
Spark usage notes

 
1. RDD: Resilient Distributed Dataset
http://developer.51cto.com/art/201309/410276_1.htm
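An RDD is an immutable, partitioned collection whose transformations (map, filter, ...) are lazy and are only computed when an action is called. As a rough local analogy (not Spark API, just plain Scala with a lazy view and hypothetical names):

```scala
// RddAnalogy: illustrates lazy transformations vs. an eager "action"
// using a plain Scala view; names here are hypothetical, not Spark's.
object RddAnalogy {
  def transform(xs: Seq[Int]): List[Int] =
    xs.view               // lazy, like chaining RDD transformations
      .map(_ * 2)         // nothing is computed yet
      .filter(_ > 10)
      .toList             // forces evaluation, like an RDD action

  def main(args: Array[String]): Unit =
    println(transform(1 to 10)) // List(12, 14, 16, 18, 20)
}
```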
2. Using spark-shell

./spark-shell --driver-library-path :/usr/local/hadoop-1.1.2/lib/native/Linux-i386-32:/usr/local/hadoop-1.1.2/lib/native/Linux-amd64-64:/usr/local/hadoop-1.1.2/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar
3. Word-count program
val file = sc.textFile("hdfs://192.168.100.99:9000/user/chaobo/test/tmp/2014/07/07/hive-site.xml.lzo")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
Print the result to the screen: count.collect()
Write the result to HDFS: count.saveAsTextFile("hdfs://192.168.100.99:9000/user/chaobo/result_20140707") (the last directory in the path must not already exist, or the save fails)
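The word-count pipeline above can be sketched with plain Scala collections, which share the flatMap/map vocabulary with RDDs; here groupBy plus a per-key sum stands in for reduceByKey (a local sketch, not the Spark API):

```scala
// WordCountSketch: the same flatMap -> map -> reduce-by-key pipeline
// as the spark-shell example, run on a local Seq instead of an RDD.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))            // line => words   (RDD.flatMap)
      .map(word => (word, 1))           // word => (word, 1) (RDD.map)
      .groupBy(_._1)                    // group pairs by key
      .map { case (w, ps) => (w, ps.map(_._2).sum) } // sum counts (reduceByKey(_+_))

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("a b a", "b c"))
    println(counts.toSeq.sortBy(_._1)) // Vector((a,2), (b,2), (c,1))
  }
}
```

In the real RDD version, reduceByKey also shuffles pairs with the same key to one partition before summing; the local groupBy hides that distribution step.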
4. Start the master node
../sbin/start-master.sh
5. Start a worker node
../sbin/start-slave.sh --webui-port 8081