Similar to the previous article, this one focuses on cluster mode.
1. Issue the command
./bin/spark-submit --class org.apache.spark.examples.JavaWordCount --deploy-mode cluster --master spark://gzsw-02:6066 lib/spark-examples-1.4.1-hadoop2.4.0.jar hdfs://host02:/user/hadoop/input.txt
note: 1) the --deploy-mode parameter must be specified as 'cluster'.
2) the --master param is the REST URL, i.e.
REST URL: spark://gzsw-02:6066 (cluster mode)
which is shown on the Spark master UI page, since Spark uses rest.RestSubmissionClient to submit the job.
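As a side note, the same submission can also be triggered from Java code instead of the shell, via the SparkLauncher API that ships with Spark 1.4+. A minimal sketch, reusing the paths and REST URL from the command above (SPARK_HOME and the jar path are taken from this cluster's layout and will differ in your setup):

import org.apache.spark.launcher.SparkLauncher;

public class SubmitWordCount {
  public static void main(String[] args) throws Exception {
    // programmatic equivalent of the spark-submit command in step 1
    Process spark = new SparkLauncher()
        .setSparkHome("/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4")
        .setAppResource("lib/spark-examples-1.4.1-hadoop2.4.0.jar")
        .setMainClass("org.apache.spark.examples.JavaWordCount")
        .setMaster("spark://gzsw-02:6066")
        .setDeployMode("cluster")
        .addAppArgs("hdfs://host02:/user/hadoop/input.txt")
        .launch();
    // launch() starts a spark-submit child process; wait for it to finish
    int exit = spark.waitFor();
    System.out.println("spark-submit exited with code " + exit);
  }
}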
2. Run logs on the user side (brief, since this is cluster mode)
Spark Command: /usr/local/jdk/jdk1.6.0_31/bin/java -cp /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/usr/local/hadoop/hadoop-2.5.2/etc/hadoop/ -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --master spark://gzsw-02:6066 --deploy-mode cluster --class org.apache.spark.examples.JavaWordCount lib/spark-examples-1.4.1-hadoop2.4.0.jar hdfs://hd02:/user/hadoop/input.txt
========================================
- executed cmd returned by Main.java: /usr/local/jdk/jdk1.6.0_31/bin/java -cp /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/usr/local/hadoop/hadoop-2.5.2/etc/hadoop/ -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --master spark://gzsw-02:6066 --deploy-mode cluster --class org.apache.spark.examples.JavaWordCount lib/spark-examples-1.4.1-hadoop2.4.0.jar hdfs://host02:/user/hadoop/input.txt
Running Spark using the REST application submission protocol.
16/09/19 11:26:06 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://gzsw-02:6066.
16/09/19 11:26:07 INFO rest.RestSubmissionClient: Submission successfully created as driver-20160919112607-0001. Polling submission state...
16/09/19 11:26:07 INFO rest.RestSubmissionClient: Submitting a request for the status of submission driver-20160919112607-0001 in spark://gzsw-02:6066.
16/09/19 11:26:07 INFO rest.RestSubmissionClient: State of driver driver-20160919112607-0001 is now RUNNING.
16/09/19 11:26:07 INFO rest.RestSubmissionClient: Driver is running on worker worker-20160914175456-192.168.100.14-36693 at 192.168.100.14:36693.
16/09/19 11:26:07 INFO rest.RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20160919112607-0001",
  "serverSparkVersion" : "1.4.1",
  "submissionId" : "driver-20160919112607-0001",
  "success" : true
}
16/09/19 11:26:07 INFO util.Utils: Shutdown hook called
So we know the driver is running on worker 192.168.100.14:36693 (not on the local host).
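Since the logs show submission goes through the REST server on port 6066, you can also poll the driver state yourself once you have the submission id. A minimal sketch, assuming the unofficial status endpoint /v1/submissions/status/<submissionId> of the REST submission server (the same protocol rest.RestSubmissionClient speaks; it is not a documented public API, so treat this as an illustration):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class DriverStatus {
  public static void main(String[] args) throws Exception {
    // the submission id is printed in the logs above
    String submissionId = "driver-20160919112607-0001";
    URL url = new URL("http://gzsw-02:6066/v1/submissions/status/" + submissionId);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    // print the JSON response (driver state, worker host:port, etc.)
    BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    String line;
    while ((line = in.readLine()) != null) {
      System.out.println(line);
    }
    in.close();
  }
}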
3. FAQ
1) In cluster mode, the driver info is shown on the Spark master UI page (but not in client mode).
(app-0000 and app-0001 were both run in cluster mode, so their corresponding drivers are shown in the 'Completed Drivers' block.)
2) The application detail UI can't be opened, i.e. when you click an app that was run in cluster mode, an error similar to this will complain:
Application history not found (app-20160919151936-0000)
No event logs found for application JavaWordCount in file:/home/hadoop/spark/spark-eventlog/. Did you specify the correct logging directory?
This message appears because, in cluster mode, the driver runs on some other worker instead of on the master's local host, so a request to the master finds nothing about this app in the local event-log directory.
Workaround: use an HDFS path instead of the local filesystem, i.e.
spark.eventLog.dir=hdfs://host02:8020/user/hadoop/spark-eventlog
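For completeness, event logging also has to be switched on, and the HDFS directory must exist before the app starts. A minimal sketch of the relevant lines in conf/spark-defaults.conf (reusing the HDFS path above):

spark.eventLog.enabled    true
spark.eventLog.dir        hdfs://host02:8020/user/hadoop/spark-eventlog

and create the directory up front:

hadoop fs -mkdir -p /user/hadoop/spark-eventlog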
3) Applications disappear after restarting Spark
Even though you set a distributed filesystem in 'spark.eventLog.dir' as mentioned above, you will still see nothing after restarting Spark. That means the Spark master only keeps application info in memory while it is alive and loses it on restart. Spark ships a history server (sbin/start-history-server.sh) to solve this problem [1].
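A minimal sketch of that fix: point the history server at the same event-log directory in conf/spark-defaults.conf, then start it; completed apps can then be browsed on its UI (port 18080 by default):

spark.history.fs.logDirectory    hdfs://host02:8020/user/hadoop/spark-eventlog

./sbin/start-history-server.sh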
ref: