This blog post explains how to submit a Spark program developed in the IDE to a cluster and run it there.
First, make sure your cluster is up and running; for cluster setup, see http://kevin12.iteye.com/blog/2273556
Develop a Spark wordcount program for testing on the cluster.
1. Prepare the data on HDFS.
First upload the README.md file to the /library/wordcount/input2 directory on HDFS:
root@master1:/usr/local/hadoop/hadoop-2.6.0/sbin# hdfs dfs -mkdir /library/wordcount/input2
root@master1:/usr/local/hadoop/hadoop-2.6.0/sbin# hdfs dfs -put /usr/local/tools/README.md /library/wordcount/input2
Check to make sure the file has been uploaded to HDFS.
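For example, listing the target directory is a quick way to confirm the upload (same paths as above); README.md should appear in the output:
root@master1:/usr/local/hadoop/hadoop-2.6.0/sbin# hdfs dfs -ls /library/wordcount/input2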
Program code:
package com.imf.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
 * A Spark wordcount program, written in Scala, for testing on the cluster.
 */
object WordCount_Cluster {
  def main(args: Array[String]): Unit = {
    /**
     * 1. Create the SparkConf object that holds the runtime configuration of the Spark program.
     * For example, setMaster sets the URL of the Spark cluster Master to connect to; if it is set
     * to "local", the program runs locally, which is handy on machines with very limited resources.
     */
    // Create the SparkConf object
    val conf = new SparkConf()
    // Set the application name; it appears in the monitoring UI while the program runs
    conf.setAppName("My First Spark App!")

    /**
     * 2. Create the SparkContext object.
     * SparkContext is the single entry point to all Spark functionality; whether you use Scala, Java,
     * Python or R, there must be a SparkContext.
     * Its core job is to initialize the components a Spark application needs to run, including the
     * DAGScheduler, TaskScheduler and SchedulerBackend, and to register the program with the Master.
     * SparkContext is the most important object in the whole application.
     */
    // Create the SparkContext, passing in the SparkConf instance to customize the runtime configuration
    val sc = new SparkContext(conf)

    /**
     * 3. Create an RDD through the SparkContext from the concrete data source (HDFS, HBase, local FS, DB, S3, etc.).
     * There are basically three ways to create an RDD: from an external source (e.g. HDFS), from a Scala
     * collection, or from another RDD. The data is split into a series of Partitions, and the data assigned
     * to each Partition is processed by one Task.
     */
    // Read the file from HDFS and split it into Partitions
    //val lines = sc.textFile("hdfs://master1:9000/library/wordcount/input2")
    val lines = sc.textFile("/library/wordcount/input2") // no explicit parallelism; Spark picks it with its own algorithm

    /**
     * 4. Apply Transformation-level processing (higher-order functions such as map and filter) to the initial RDD.
     * 4.1 Split each line into individual words.
     */
    // Split every line and merge the results of all lines into one large collection via flatMap
    val words = lines.flatMap { line => line.split(" ") }

    /**
     * 4.2 On top of the split, count each word instance as 1, i.e. word => (word, 1).
     */
    val pairs = words.map { word => (word, 1) }

    /**
     * 4.3 Sum the per-instance counts to get the total number of occurrences of each word in the file.
     */
    // Accumulate the values of identical keys (reduced locally on the map side as well as at the Reducer level)
    val wordCounts = pairs.reduceByKey(_ + _)
    // Print the result
    wordCounts.collect.foreach(pair => println(pair._1 + ":" + pair._2))

    sc.stop()
  }
}
2. Package the program.
Package the program into a jar file and copy the jar to the /usr/local/sparkApps directory on the master1 virtual machine.
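The jar in this walkthrough is exported from Eclipse; if you prefer to copy it to the cluster from the command line, here is a minimal sketch (the local jar path is an assumption, the target host and directory are the ones used above):
# create the target directory on master1 and copy the exported jar over
ssh root@master1 "mkdir -p /usr/local/sparkApps"
scp WordCount.jar root@master1:/usr/local/sparkApps/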
3. Submit the application to the cluster.
You can refer to the official documentation: http://spark.apache.org/docs/latest/submitting-applications.html
The first attempt used the command below and failed to run: the --master value was malformed (it must start with yarn, spark, mesos, or local), so the job was never submitted to the cluster. The command and error message were:
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/bin# ./spark-submit \
> --class com.imf.spark.WordCount_Cluster \
> --master master spark://master1:7077 \
> /usr/local/sparkApps/WordCount.jar
Error: Master must start with yarn, spark, mesos, or local
Run with --help for usage help or --verbose for debug output
The second attempt, with the command corrected as follows, ran successfully:
./spark-submit \
--class com.imf.spark.WordCount_Cluster \
--master yarn \
/usr/local/sparkApps/WordCount.jar
The output was as follows:
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/bin# ./spark-submit \ > --class com.imf.spark.WordCount_Cluster \ > --master yarn \ > /usr/local/sparkApps/WordCount.jar 16/01/27 07:21:55 INFO spark.SparkContext: Running Spark version 1.6.0 16/01/27 07:21:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 16/01/27 07:21:56 INFO spark.SecurityManager: Changing view acls to: root 16/01/27 07:21:56 INFO spark.SecurityManager: Changing modify acls to: root 16/01/27 07:21:56 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 16/01/27 07:21:57 INFO util.Utils: Successfully started service 'sparkDriver' on port 33500. 16/01/27 07:21:57 INFO slf4j.Slf4jLogger: Slf4jLogger started 16/01/27 07:21:57 INFO Remoting: Starting remoting 16/01/27 07:21:58 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.112.130:40488] 16/01/27 07:21:58 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 40488. 16/01/27 07:21:58 INFO spark.SparkEnv: Registering MapOutputTracker 16/01/27 07:21:58 INFO spark.SparkEnv: Registering BlockManagerMaster 16/01/27 07:21:58 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-298b85db-a14e-448b-bf12-a76e7f9d3e61 16/01/27 07:21:58 INFO storage.MemoryStore: MemoryStore started with capacity 517.4 MB 16/01/27 07:21:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator 16/01/27 07:21:59 INFO server.Server: jetty-8.y.z-SNAPSHOT 16/01/27 07:21:59 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 16/01/27 07:21:59 INFO util.Utils: Successfully started service 'SparkUI' on port 4040. 16/01/27 07:21:59 INFO ui.SparkUI: Started SparkUI at http://192.168.112.130:4040 16/01/27 07:21:59 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-1273b284-86a3-45a5-ab5f-7996b7f78a5f/httpd-c0ad1a3a-48fc-4aef-8ac1-ddc28159fe2c 16/01/27 07:21:59 INFO spark.HttpServer: Starting HTTP Server 16/01/27 07:21:59 INFO server.Server: jetty-8.y.z-SNAPSHOT 16/01/27 07:21:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:41722 16/01/27 07:21:59 INFO util.Utils: Successfully started service 'HTTP file server' on port 41722. 
16/01/27 07:21:59 INFO spark.SparkContext: Added JAR file:/usr/local/sparkApps/WordCount.jar at http://192.168.112.130:41722/jars/WordCount.jar with timestamp 1453850519749 16/01/27 07:22:00 INFO client.RMProxy: Connecting to ResourceManager at master1/192.168.112.130:8032 16/01/27 07:22:01 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers 16/01/27 07:22:01 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container) 16/01/27 07:22:01 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 16/01/27 07:22:01 INFO yarn.Client: Setting up container launch context for our AM 16/01/27 07:22:01 INFO yarn.Client: Setting up the launch environment for our AM container 16/01/27 07:22:01 INFO yarn.Client: Preparing resources for our AM container 16/01/27 07:22:02 INFO yarn.Client: Uploading resource file:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar -> hdfs://master1:9000/user/root/.sparkStaging/application_1453847555417_0001/spark-assembly-1.6.0-hadoop2.6.0.jar 16/01/27 07:22:11 INFO yarn.Client: Uploading resource file:/tmp/spark-1273b284-86a3-45a5-ab5f-7996b7f78a5f/__spark_conf__2067657392446030944.zip -> hdfs://master1:9000/user/root/.sparkStaging/application_1453847555417_0001/__spark_conf__2067657392446030944.zip 16/01/27 07:22:11 INFO spark.SecurityManager: Changing view acls to: root 16/01/27 07:22:11 INFO spark.SecurityManager: Changing modify acls to: root 16/01/27 07:22:11 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 16/01/27 07:22:12 INFO yarn.Client: Submitting application 1 to ResourceManager 16/01/27 07:22:12 INFO impl.YarnClientImpl: Submitted application application_1453847555417_0001 16/01/27 07:22:14 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:14 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1453850532502 final status: UNDEFINED tracking URL: http://master1:8088/proxy/application_1453847555417_0001/ user: root 16/01/27 07:22:15 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:16 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:17 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:18 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:19 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:20 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:21 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:22 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:23 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:24 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:25 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:26 INFO yarn.Client: Application report 
for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:27 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:28 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:29 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:30 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:31 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:32 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:32 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null) 16/01/27 07:22:32 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> master1, PROXY_URI_BASES -> http://master1:8088/proxy/application_1453847555417_0001), /proxy/application_1453847555417_0001 16/01/27 07:22:32 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 16/01/27 07:22:33 INFO yarn.Client: Application report for application_1453847555417_0001 (state: ACCEPTED) 16/01/27 07:22:34 INFO yarn.Client: Application report for application_1453847555417_0001 (state: RUNNING) 16/01/27 07:22:34 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: 192.168.112.132 ApplicationMaster RPC port: 0 queue: default start time: 1453850532502 final status: UNDEFINED tracking URL: http://master1:8088/proxy/application_1453847555417_0001/ user: root 16/01/27 07:22:34 INFO cluster.YarnClientSchedulerBackend: Application application_1453847555417_0001 has started running. 16/01/27 07:22:34 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33873. 
16/01/27 07:22:34 INFO netty.NettyBlockTransferService: Server created on 33873 16/01/27 07:22:34 INFO storage.BlockManagerMaster: Trying to register BlockManager 16/01/27 07:22:34 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.112.130:33873 with 517.4 MB RAM, BlockManagerId(driver, 192.168.112.130, 33873) 16/01/27 07:22:34 INFO storage.BlockManagerMaster: Registered BlockManager 16/01/27 07:22:37 INFO scheduler.EventLoggingListener: Logging events to hdfs://master1:9000/historyserverforSpark/application_1453847555417_0001 16/01/27 07:22:37 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms) 16/01/27 07:22:46 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 212.9 KB, free 212.9 KB) 16/01/27 07:22:46 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 19.6 KB, free 232.4 KB) 16/01/27 07:22:46 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.112.130:33873 (size: 19.6 KB, free: 517.4 MB) 16/01/27 07:22:46 INFO spark.SparkContext: Created broadcast 0 from textFile at WordCount_Cluster.scala:36 16/01/27 07:22:49 INFO mapred.FileInputFormat: Total input paths to process : 1 16/01/27 07:22:50 INFO spark.SparkContext: Starting job: collect at WordCount_Cluster.scala:55 16/01/27 07:22:51 INFO scheduler.DAGScheduler: Registering RDD 3 (map at WordCount_Cluster.scala:47) 16/01/27 07:22:51 INFO scheduler.DAGScheduler: Got job 0 (collect at WordCount_Cluster.scala:55) with 2 output partitions 16/01/27 07:22:51 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (collect at WordCount_Cluster.scala:55) 16/01/27 07:22:51 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0) 16/01/27 07:22:51 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0) 16/01/27 07:22:51 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount_Cluster.scala:47), which has no missing parents 16/01/27 07:22:52 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.1 KB, free 236.5 KB) 16/01/27 07:22:52 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 238.8 KB) 16/01/27 07:22:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.112.130:33873 (size: 2.3 KB, free: 517.4 MB) 16/01/27 07:22:52 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006 16/01/27 07:22:52 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount_Cluster.scala:47) 16/01/27 07:22:52 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks 16/01/27 07:23:08 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 16/01/27 07:23:22 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (worker3:56136) with ID 2 16/01/27 07:23:23 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, worker3, partition 0,NODE_LOCAL, 2202 bytes) 16/01/27 07:23:23 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 16/01/27 07:23:24 INFO storage.BlockManagerMasterEndpoint: Registering block manager 
worker3:41970 with 517.4 MB RAM, BlockManagerId(2, worker3, 41970) 16/01/27 07:23:33 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on worker3:41970 (size: 2.3 KB, free: 517.4 MB) 16/01/27 07:23:37 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on worker3:41970 (size: 19.6 KB, free: 517.4 MB) 16/01/27 07:23:39 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (worker2:42174) with ID 1 16/01/27 07:23:39 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, worker2, partition 1,RACK_LOCAL, 2202 bytes) 16/01/27 07:23:40 INFO storage.BlockManagerMasterEndpoint: Registering block manager worker2:44472 with 517.4 MB RAM, BlockManagerId(1, worker2, 44472) 16/01/27 07:23:47 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on worker2:44472 (size: 2.3 KB, free: 517.4 MB) 16/01/27 07:23:51 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on worker2:44472 (size: 19.6 KB, free: 517.4 MB) 16/01/27 07:24:00 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 37460 ms on worker3 (1/2) 16/01/27 07:24:09 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at WordCount_Cluster.scala:47) finished in 75.810 s 16/01/27 07:24:09 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 29446 ms on worker2 (2/2) 16/01/27 07:24:09 INFO scheduler.DAGScheduler: looking for newly runnable stages 16/01/27 07:24:09 INFO scheduler.DAGScheduler: running: Set() 16/01/27 07:24:09 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1) 16/01/27 07:24:09 INFO scheduler.DAGScheduler: failed: Set() 16/01/27 07:24:09 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 16/01/27 07:24:09 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount_Cluster.scala:53), which has no missing parents 16/01/27 07:24:09 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.6 KB, free 241.4 KB) 16/01/27 07:24:09 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1595.0 B, free 242.9 KB) 16/01/27 07:24:09 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.112.130:33873 (size: 1595.0 B, free: 517.4 MB) 16/01/27 07:24:09 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006 16/01/27 07:24:09 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount_Cluster.scala:53) 16/01/27 07:24:09 INFO cluster.YarnScheduler: Adding task set 1.0 with 2 tasks 16/01/27 07:24:09 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, worker2, partition 0,NODE_LOCAL, 1951 bytes) 16/01/27 07:24:09 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, worker3, partition 1,NODE_LOCAL, 1951 bytes) 16/01/27 07:24:09 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on worker2:44472 (size: 1595.0 B, free: 517.4 MB) 16/01/27 07:24:09 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on worker3:41970 (size: 1595.0 B, free: 517.4 MB) 16/01/27 07:24:10 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to worker2:42174 16/01/27 07:24:10 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 158 bytes 16/01/27 07:24:10 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 
worker3:56136 16/01/27 07:24:13 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 3634 ms on worker3 (1/2) 16/01/27 07:24:13 INFO scheduler.DAGScheduler: ResultStage 1 (collect at WordCount_Cluster.scala:55) finished in 3.691 s 16/01/27 07:24:13 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 3696 ms on worker2 (2/2) 16/01/27 07:24:13 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 16/01/27 07:24:13 INFO scheduler.DAGScheduler: Job 0 finished: collect at WordCount_Cluster.scala:55, took 82.513256 s package:1 this:1 Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version):1 Because:1 Python:2 cluster.:1 [run:1 its:1 YARN,:1 have:1 general:2 pre-built:1 locally:2 locally.:1 changed:1 sc.parallelize(1:1 only:1 Configuration:1 This:2 first:1 basic:1 documentation:3 learning,:1 graph:1 Hive:2 several:1 ["Specifying:1 "yarn":1 page](http://spark.apache.org/documentation.html):1 [params]`.:1 [project:2 prefer:1 SparkPi:2 <http://spark.apache.org/>:1 engine:1 version:1 file:1 documentation,:1 MASTER:1 example:3 are:1 systems.:1 params:1 scala>:1 DataFrames,:1 provides:1 refer:2 configure:1 Interactive:2 R,:1 can:6 build:3 when:1 how:2 easiest:1 Apache:1 package.:1 1000).count():1 Note:1 Data.:1 >>>:1 Scala:2 Alternatively,:1 variable:1 submit:1 Testing:1 Streaming:1 module,:1 thread,:1 rich:1 them,:1 detailed:2 stream:1 GraphX:1 distribution:1 Please:3 is:6 return:2 Thriftserver:1 same:1 start:1 Spark](#building-spark).:1 one:2 with:3 built:1 Spark"](http://spark.apache.org/docs/latest/building-spark.html).:1 data:1 wiki](https://cwiki.apache.org/confluence/display/SPARK).:1 using:2 talk:1 class:2 Shell:2 README:1 computing:1 Python,:2 example::1 ##:8 building:2 N:1 set:2 Hadoop-supported:1 other:1 Example:1 analysis.:1 from:1 runs.:1 Building:1 higher-level:1 need:1 guidance:2 Big:1 guide,:1 Java,:1 fast:1 uses:1 SQL:2 will:1 <class>:1 requires:1 :67 Documentation:1 web:1 cluster:2 using::1 MLlib:1 shell::2 Scala,:1 supports:2 built,:1 ./dev/run-tests:1 build/mvn:1 sample:1 For:2 Programs:1 Spark:13 particular:2 The:1 processing.:1 APIs:1 computation:1 Try:1 [Configuration:1 ./bin/pyspark:1 A:1 through:1 #:1 library:1 following:2 More:1 which:2 also:4 storage:1 should:2 To:2 for:11 Once:1 setup:1 mesos://:1 Maven](http://maven.apache.org/).:1 latest:1 processing,:1 the:21 your:1 not:1 different:1 distributions.:1 given.:1 About:1 if:4 instructions.:1 be:2 do:2 Tests:1 no:1 ./bin/run-example:2 programs,:1 including:3 `./bin/run-example:1 Spark.:1 Versions:1 HDFS:1 individual:1 spark://:1 It:2 an:3 programming:1 machine:1 run::1 environment:1 clean:1 1000::2 And:1 run:7 ./bin/spark-shell:1 URL,:1 "local":1 MASTER=spark://host:7077:1 on:5 You:3 threads.:1 against:1 [Apache:1 help:1 print:1 tests:2 examples:2 at:2 in:5 -DskipTests:1 optimized:1 downloaded:1 versions:1 graphs:1 Guide](http://spark.apache.org/docs/latest/configuration.html):1 online:1 usage:1 abbreviated:1 comes:1 directory.:1 overview:1 [building:1 `examples`:2 Many:1 Running:1 way:1 use:3 Online:1 site,:1 tests](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools).:1 running:1 find:1 sc.parallelize(range(1000)).count():1 contains:1 project:1 you:4 Pi:1 that:2 protocols:1 a:8 or:3 high-level:1 name:1 Hadoop,:2 to:14 available:1 (You:1 core:1 instance::1 see:1 of:5 tools:1 "local[N]":1 programs:2 package.):1 ["Building:1 must:1 and:10 command,:2 system:1 Hadoop:3 16/01/27 07:24:13 INFO 
handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null} 16/01/27 07:24:13 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null} 16/01/27 07:24:13 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.112.130:4040 16/01/27 07:24:14 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread 16/01/27 07:24:14 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors 16/01/27 07:24:14 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down 16/01/27 07:24:14 INFO cluster.YarnClientSchedulerBackend: Stopped 16/01/27 07:24:14 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 16/01/27 07:24:14 INFO storage.MemoryStore: MemoryStore cleared 16/01/27 07:24:14 INFO storage.BlockManager: BlockManager stopped 16/01/27 07:24:14 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 16/01/27 07:24:14 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 16/01/27 07:24:15 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
16/01/27 07:24:15 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 16/01/27 07:24:15 INFO spark.SparkContext: Successfully stopped SparkContext 16/01/27 07:24:15 INFO util.ShutdownHookManager: Shutdown hook called 16/01/27 07:24:16 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-1273b284-86a3-45a5-ab5f-7996b7f78a5f 16/01/27 07:24:16 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-1273b284-86a3-45a5-ab5f-7996b7f78a5f/httpd-c0ad1a3a-48fc-4aef-8ac1-ddc28159fe2c root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/bin#
The result is the same as the local run.
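Note that with --master yarn and no --deploy-mode option, spark-submit in Spark 1.6 runs the driver in yarn-client mode, which is why the word counts are printed in this console. A hedged variant that runs the driver inside the cluster instead (same class and jar as above):
./spark-submit \
  --class com.imf.spark.WordCount_Cluster \
  --master yarn \
  --deploy-mode cluster \
  /usr/local/sparkApps/WordCount.jar
In cluster mode the println output goes to the ApplicationMaster's container log rather than to this terminal.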
4. Automate the submission with a wordcount.sh script.
Create a wordcount.sh file with the contents below, save it and exit. Then run chmod +x wordcount.sh to make the script executable. Run the command below to execute it; the result is the same as above.
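The script contents were not preserved in this post; a minimal sketch of what wordcount.sh could contain, assuming the same spark-submit invocation and paths used above:
#!/bin/bash
# Submit the wordcount jar to YARN, exactly as the manual command above did
/usr/local/spark/spark-1.6.0-bin-hadoop2.6/bin/spark-submit \
  --class com.imf.spark.WordCount_Cluster \
  --master yarn \
  /usr/local/sparkApps/WordCount.jar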
root@master1:/usr/local/sparkApps# ./wordcount.sh
5. View the program's run results in the browser, for example by opening the tracking URL printed in the log (http://master1:8088/proxy/application_1453847555417_0001/) or the YARN ResourceManager UI at http://master1:8088.
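If you prefer the command line, the same information can also be pulled with the yarn CLI (a hedged example; the application id is the one from the log above):
# final status and tracking URL of the finished application
yarn application -status application_1453847555417_0001
# or list applications that have already finished
yarn application -list -appStates FINISHED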