Scala version
scala-2.10.4
Note:
Earlier attempts to set up this environment kept failing, most likely because Scala 2.11.4 was being used. The Spark website states explicitly that the prebuilt Spark 1.2.0 packages do not support Scala 2.11:
Note: Scala 2.11 users should download the Spark source package and build with Scala 2.11 support.
Spark version:
spark-1.2.0-bin-hadoop2.4.tgz
Configure the environment variables
export SCALA_HOME=/home/hadoop/spark1.2.0/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/home/hadoop/spark1.2.0/spark-1.2.0-bin-hadoop2.4
export PATH=$SPARK_HOME/bin:$PATH
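These exports typically go in ~/.bashrc so they survive new shells. A quick sanity check that the PATH actually picked up both bin directories (a sketch, assuming the article's directory layout; adjust the paths to your install):

```shell
# Paths follow the article's layout; change them to match your machine
export SCALA_HOME=/home/hadoop/spark1.2.0/scala-2.10.4
export SPARK_HOME=/home/hadoop/spark1.2.0/spark-1.2.0-bin-hadoop2.4
export PATH=$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH

# Both bin directories should now appear on PATH
echo "$PATH" | grep -q "$SCALA_HOME/bin" && echo "scala bin on PATH"
echo "$PATH" | grep -q "$SPARK_HOME/bin" && echo "spark bin on PATH"
```

If either check prints nothing, the exports were not sourced; run `source ~/.bashrc` or re-open the terminal.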
Setting up IntelliJ IDEA for Spark development
1. Download and install the Scala plugin
2. Create a non-SBT Scala project
3. Import the Spark jars from the spark-1.2.0-bin-hadoop2.4 distribution
4. Write the WordCount example
package spark.examples

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SparkWordCount {
  def main(args: Array[String]) {
    // Note setMaster("local"): Spark runs in local mode here
    // (local mode is distinct from standalone mode)
    val conf = new SparkConf().setAppName("SparkWordCount").setMaster("local")
    val sc = new SparkContext(conf)
    val rdd = sc.textFile("file:///home/hadoop/spark1.2.0/word.txt")
    rdd.flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .map(x => (x._2, x._1))  // swap to (count, word) so we can sort by count
      .sortByKey(false)        // descending
      .map(x => (x._2, x._1))  // swap back to (word, count)
      .saveAsTextFile("file:///home/hadoop/spark1.2.0/WordCountResult")
    sc.stop()
  }
}
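The RDD pipeline above can be sanity-checked without Spark at all: plain Scala collections support the same flatMap/map style, with groupBy standing in for reduceByKey. A minimal sketch (the object name and sample input are illustrative, not part of the original project):

```scala
object WordCountLocal {
  // Mirrors the RDD pipeline: split, aggregate, sort by count descending.
  // groupBy + size plays the role of reduceByKey(_ + _).
  def wordCount(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toSeq
      .sortBy(-_._2) // same effect as the swap / sortByKey(false) / swap steps

  def main(args: Array[String]): Unit = {
    println(wordCount(Seq("hello spark", "hello scala")))
  }
}
```

Running this quickly verifies the transformation logic before paying the cost of a SparkContext startup.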
Console log:
15/01/14 22:06:34 WARN Utils: Your hostname, hadoop-Inspiron-3521 resolves to a loopback address: 127.0.1.1; using 192.168.0.111 instead (on interface eth1)
15/01/14 22:06:34 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/01/14 22:06:35 INFO SecurityManager: Changing view acls to: hadoop
15/01/14 22:06:35 INFO SecurityManager: Changing modify acls to: hadoop
15/01/14 22:06:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/01/14 22:06:36 INFO Slf4jLogger: Slf4jLogger started
15/01/14 22:06:36 INFO Remoting: Starting remoting
15/01/14 22:06:36 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@hadoop-Inspiron-3521.local:53624]
15/01/14 22:06:36 INFO Utils: Successfully started service 'sparkDriver' on port 53624.
15/01/14 22:06:36 INFO SparkEnv: Registering MapOutputTracker
15/01/14 22:06:36 INFO SparkEnv: Registering BlockManagerMaster
15/01/14 22:06:36 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150114220636-4826
15/01/14 22:06:36 INFO MemoryStore: MemoryStore started with capacity 461.7 MB
15/01/14 22:06:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/14 22:06:37 INFO HttpFileServer: HTTP File server directory is /tmp/spark-19683393-0315-498c-9b72-9c6a13684f44
15/01/14 22:06:37 INFO HttpServer: Starting HTTP Server
15/01/14 22:06:38 INFO Utils: Successfully started service 'HTTP file server' on port 53231.
15/01/14 22:06:43 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/01/14 22:06:43 INFO SparkUI: Started SparkUI at http://hadoop-Inspiron-3521.local:4040
15/01/14 22:06:43 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@hadoop-Inspiron-3521.local:53624/user/HeartbeatReceiver
15/01/14 22:06:44 INFO NettyBlockTransferService: Server created on 46971
15/01/14 22:06:44 INFO BlockManagerMaster: Trying to register BlockManager
15/01/14 22:06:44 INFO BlockManagerMasterActor: Registering block manager localhost:46971 with 461.7 MB RAM, BlockManagerId(<driver>, localhost, 46971)
15/01/14 22:06:44 INFO BlockManagerMaster: Registered BlockManager
15/01/14 22:06:44 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=484127539
15/01/14 22:06:44 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 461.5 MB)
15/01/14 22:06:45 INFO MemoryStore: ensureFreeSpace(22692) called with curMem=163705, maxMem=484127539
15/01/14 22:06:45 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.2 KB, free 461.5 MB)
15/01/14 22:06:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:46971 (size: 22.2 KB, free: 461.7 MB)
15/01/14 22:06:45 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/01/14 22:06:45 INFO SparkContext: Created broadcast 0 from textFile at SparkWordCount.scala:40
15/01/14 22:06:45 INFO FileInputFormat: Total input paths to process : 1
15/01/14 22:06:45 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/01/14 22:06:45 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/01/14 22:06:45 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/01/14 22:06:45 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/01/14 22:06:45 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/01/14 22:06:46 INFO SparkContext: Starting job: saveAsTextFile at SparkWordCount.scala:43
15/01/14 22:06:46 INFO DAGScheduler: Registering RDD 3 (map at SparkWordCount.scala:43)
15/01/14 22:06:46 INFO DAGScheduler: Registering RDD 5 (map at SparkWordCount.scala:43)
15/01/14 22:06:46 INFO DAGScheduler: Got job 0 (saveAsTextFile at SparkWordCount.scala:43) with 1 output partitions (allowLocal=false)
15/01/14 22:06:46 INFO DAGScheduler: Final stage: Stage 2(saveAsTextFile at SparkWordCount.scala:43)
15/01/14 22:06:46 INFO DAGScheduler: Parents of final stage: List(Stage 1)
15/01/14 22:06:46 INFO DAGScheduler: Missing parents: List(Stage 1)
15/01/14 22:06:46 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[3] at map at SparkWordCount.scala:43), which has no missing parents
15/01/14 22:06:46 INFO MemoryStore: ensureFreeSpace(3560) called with curMem=186397, maxMem=484127539
15/01/14 22:06:46 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.5 KB, free 461.5 MB)
15/01/14 22:06:46 INFO MemoryStore: ensureFreeSpace(2528) called with curMem=189957, maxMem=484127539
15/01/14 22:06:46 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 461.5 MB)
15/01/14 22:06:46 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:46971 (size: 2.5 KB, free: 461.7 MB)
15/01/14 22:06:46 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/01/14 22:06:46 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/01/14 22:06:46 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[3] at map at SparkWordCount.scala:43)
15/01/14 22:06:46 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/01/14 22:06:46 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1292 bytes)
15/01/14 22:06:46 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/01/14 22:06:46 INFO HadoopRDD: Input split: file:/home/hadoop/spark1.2.0/word.txt:0+29
15/01/14 22:06:46 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1895 bytes result sent to driver
15/01/14 22:06:46 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 323 ms on localhost (1/1)
15/01/14 22:06:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/01/14 22:06:46 INFO DAGScheduler: Stage 0 (map at SparkWordCount.scala:43) finished in 0.350 s
15/01/14 22:06:46 INFO DAGScheduler: looking for newly runnable stages
15/01/14 22:06:46 INFO DAGScheduler: running: Set()
15/01/14 22:06:46 INFO DAGScheduler: waiting: Set(Stage 1, Stage 2)
15/01/14 22:06:46 INFO DAGScheduler: failed: Set()
15/01/14 22:06:46 INFO DAGScheduler: Missing parents for Stage 1: List()
15/01/14 22:06:46 INFO DAGScheduler: Missing parents for Stage 2: List(Stage 1)
15/01/14 22:06:46 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[5] at map at SparkWordCount.scala:43), which is now runnable
15/01/14 22:06:46 INFO MemoryStore: ensureFreeSpace(2992) called with curMem=192485, maxMem=484127539
15/01/14 22:06:46 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.9 KB, free 461.5 MB)
15/01/14 22:06:46 INFO MemoryStore: ensureFreeSpace(2158) called with curMem=195477, maxMem=484127539
15/01/14 22:06:46 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.1 KB, free 461.5 MB)
15/01/14 22:06:46 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:46971 (size: 2.1 KB, free: 461.7 MB)
15/01/14 22:06:46 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/01/14 22:06:46 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/01/14 22:06:46 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (MappedRDD[5] at map at SparkWordCount.scala:43)
15/01/14 22:06:46 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/01/14 22:06:46 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1045 bytes)
15/01/14 22:06:46 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/01/14 22:06:46 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
15/01/14 22:06:46 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 12 ms
15/01/14 22:06:46 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1000 bytes result sent to driver
15/01/14 22:06:46 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 110 ms on localhost (1/1)
15/01/14 22:06:46 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/01/14 22:06:46 INFO DAGScheduler: Stage 1 (map at SparkWordCount.scala:43) finished in 0.106 s
15/01/14 22:06:46 INFO DAGScheduler: looking for newly runnable stages
15/01/14 22:06:46 INFO DAGScheduler: running: Set()
15/01/14 22:06:46 INFO DAGScheduler: waiting: Set(Stage 2)
15/01/14 22:06:46 INFO DAGScheduler: failed: Set()
15/01/14 22:06:46 INFO DAGScheduler: Missing parents for Stage 2: List()
15/01/14 22:06:46 INFO DAGScheduler: Submitting Stage 2 (MappedRDD[8] at saveAsTextFile at SparkWordCount.scala:43), which is now runnable
15/01/14 22:06:47 INFO MemoryStore: ensureFreeSpace(112880) called with curMem=197635, maxMem=484127539
15/01/14 22:06:47 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 110.2 KB, free 461.4 MB)
15/01/14 22:06:47 INFO MemoryStore: ensureFreeSpace(67500) called with curMem=310515, maxMem=484127539
15/01/14 22:06:47 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 65.9 KB, free 461.3 MB)
15/01/14 22:06:47 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:46971 (size: 65.9 KB, free: 461.6 MB)
15/01/14 22:06:47 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/01/14 22:06:47 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:838
15/01/14 22:06:47 INFO DAGScheduler: Submitting 1 missing tasks from Stage 2 (MappedRDD[8] at saveAsTextFile at SparkWordCount.scala:43)
15/01/14 22:06:47 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/01/14 22:06:47 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, PROCESS_LOCAL, 1056 bytes)
15/01/14 22:06:47 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
15/01/14 22:06:47 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
15/01/14 22:06:47 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/01/14 22:06:47 INFO FileOutputCommitter: Saved output of task 'attempt_201501142206_0002_m_000000_2' to file:/home/hadoop/spark1.2.0/WordCountResult/_temporary/0/task_201501142206_0002_m_000000
15/01/14 22:06:47 INFO SparkHadoopWriter: attempt_201501142206_0002_m_000000_2: Committed
15/01/14 22:06:47 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 824 bytes result sent to driver
15/01/14 22:06:47 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 397 ms on localhost (1/1)
15/01/14 22:06:47 INFO DAGScheduler: Stage 2 (saveAsTextFile at SparkWordCount.scala:43) finished in 0.399 s
15/01/14 22:06:47 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/01/14 22:06:47 INFO DAGScheduler: Job 0 finished: saveAsTextFile at SparkWordCount.scala:43, took 1.241181 s
15/01/14 22:06:47 INFO SparkUI: Stopped Spark web UI at http://hadoop-Inspiron-3521.local:4040
15/01/14 22:06:47 INFO DAGScheduler: Stopping DAGScheduler
15/01/14 22:06:48 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/01/14 22:06:48 INFO MemoryStore: MemoryStore cleared
15/01/14 22:06:48 INFO BlockManager: BlockManager stopped
15/01/14 22:06:48 INFO BlockManagerMaster: BlockManagerMaster stopped
15/01/14 22:06:48 INFO SparkContext: Successfully stopped SparkContext
15/01/14 22:06:48 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/01/14 22:06:48 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

Process finished with exit code 0
Adjusting the log level
As the output above shows, Spark logs at INFO level by default. To control the log output (for example, to see all messages), create a log4j.properties file in the root of the WordCount project's source directory with the following content:
log4j.rootCategory=DEBUG, file
log4j.appender.file=org.apache.log4j.ConsoleAppender
# To write the log to a file instead, use FileAppender:
#log4j.appender.file=org.apache.log4j.FileAppender
#log4j.appender.file.file=spark.log
log4j.appender.file.append=false
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n

# Ignore messages below warning level from Jetty, because it's a bit verbose
log4j.logger.org.eclipse.jetty=WARN
org.eclipse.jetty.LEVEL=WARN
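An alternative to the properties file is to raise log levels programmatically before the SparkContext starts logging, via the log4j API that ships on Spark's classpath. A sketch (the object name is illustrative, and this requires log4j on the classpath, which importing the Spark jars provides):

```scala
import org.apache.log4j.{Level, Logger}

object QuietLogs {
  // Call this at the top of main(), before creating the SparkContext,
  // to silence the noisiest components down to WARN.
  def silenceSparkLogs(): Unit = {
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty").setLevel(Level.WARN)
  }
}
```

This is handy for quick experiments; the properties file remains the better choice for settings you want to keep across runs.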