Spark Streaming + Flume Integration Experiment

superlxw1234


Source: http://lxw1234.com/?p=217

Software environment:

flume-ng-core-1.4.0-cdh5.0.0

spark-1.2.0-bin-hadoop2.3

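Workflow:

Spark Streaming: uses the spark-streaming-flume_2.10-1.2.0 library to start an Avro receiver that accepts incoming data and processes it.
Flume agent: the source monitors a directory on the local file system; when a file changes, the avro sink sends the new data to the port that Spark Streaming is listening on.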

Flume configuration:

flume-lxw-conf.properties

# Source name
agent_lxw.sources = sources1
# Channel name
agent_lxw.channels = fileChannel
# Sink name
agent_lxw.sinks = sink1
 
# Source configuration
## A custom source that behaves like tail -f; more reliable than the exec source
agent_lxw.sources.sources1.type = org.apache.flume.source.taildirectory.DirectoryTailSource
agent_lxw.sources.sources1.dirs = lxwlog
## Directory to monitor
agent_lxw.sources.sources1.dirs.lxwlog.path = file:///tmp/lxw-source
# Regular expression (Java regex syntax) that selects the files to monitor
agent_lxw.sources.sources1.dirs.lxwlog.file-pattern = ^lxw_.*log$
agent_lxw.sources.sources1.first-line-pattern = ^(.*)$
agent_lxw.sources.sources1.channels = fileChannel
 
 
# Sink 1 configuration: send data to port 44444 on slave004.lxw1234.com
agent_lxw.sinks.sink1.type = avro
agent_lxw.sinks.sink1.hostname = slave004.lxw1234.com
agent_lxw.sinks.sink1.port = 44444
agent_lxw.sinks.sink1.channel = fileChannel
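# Note: batch-size (500) is larger than the channel's transactionCapacity (100) set below.
# A channel hands a sink at most transactionCapacity events per transaction, so a batch
# of this size can fail; consider keeping batch-size <= transactionCapacity.
agent_lxw.sinks.sink1.batch-size = 500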
agent_lxw.sinks.sink1.connect-timeout = 40000
agent_lxw.sinks.sink1.request-timeout = 40000
 
agent_lxw.channels.fileChannel.type = file
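# Directory where checkpoint files are stored
agent_lxw.channels.fileChannel.checkpointDir = /tmp/flume/checkpoint/site
# Directory where the channel's data files are stored
agent_lxw.channels.fileChannel.dataDirs = /tmp/flume/data/site
# Maximum number of events the channel can hold
agent_lxw.channels.fileChannel.capacity = 10000
# Maximum number of events per transaction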
agent_lxw.channels.fileChannel.transactionCapacity = 100
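
Because file-pattern is a Java regular expression, it can be checked against candidate file names before the agent is started. The following is a minimal sketch: the object and method names are hypothetical, and it assumes the source matches the pattern against the bare file name, which the configuration itself does not confirm.

```scala
import java.util.regex.Pattern

// Hypothetical helper (not part of Flume): checks which file names the
// file-pattern from flume-lxw-conf.properties would select.
object FilePatternCheck {
  // Same expression as in the configuration above.
  val filePattern: Pattern = Pattern.compile("^lxw_.*log$")

  // Assumption: the pattern is applied to the file name only, as a full match.
  def isMonitored(fileName: String): Boolean =
    filePattern.matcher(fileName).matches()
}
```

Note that the pattern does not require a dot before "log", so a name such as lxw_catalog also matches, while a rotated file such as lxw_20150518.log.1 does not.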

Spark Streaming program:

Spark_Flume.scala

package com.lxw.test

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.flume.FlumeUtils

object Spark_Flume {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      println("Usage: Spark_Flume <hostname> <port>")
      System.exit(1)
    }
    // Host and port the Avro receiver binds to; must match the Flume avro sink settings
    val hostname = args(0)
    val port = Integer.parseInt(args(1))
    val sc = new SparkContext(new SparkConf().setAppName("Spark_Flume"))
    // Process received events in 10-second batches
    val ssc = new StreamingContext(sc, Seconds(10))
    // Push-based receiver: the Flume avro sink sends events to hostname:port
    val flumeStream = FlumeUtils.createStream(ssc, hostname, port, StorageLevel.MEMORY_AND_DISK)
    // Print the headers (field 0 of the Avro event) and the body of each event
    flumeStream.map(e => "Event:header:" + e.event.get(0).toString + "body: " + new String(e.event.getBody.array)).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
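
The event body is a java.nio.ByteBuffer. Calling array on it returns the whole backing array regardless of the buffer's position and limit, and new String without a charset uses the platform default. A more defensive way to decode the body could look like the sketch below; BodyDecoder and decodeBody are hypothetical names, and UTF-8 is an assumption about how the log files are encoded.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical helper: decodes only the readable region of an event body.
object BodyDecoder {
  def decodeBody(body: ByteBuffer): String = {
    // duplicate() shares the content but has its own position and limit,
    // so the caller's buffer is left untouched.
    val view = body.duplicate()
    val bytes = new Array[Byte](view.remaining())
    view.get(bytes)
    // Assumption: the log lines are UTF-8 encoded.
    new String(bytes, StandardCharsets.UTF_8)
  }
}
```

In the program above it would be used as BodyDecoder.decodeBody(e.event.getBody) in place of new String(e.event.getBody.array).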

Startup:

First, start the Spark Streaming program:

./spark-submit \
--name "spark-flume" \
--master spark://192.168.1.130:7077 \
--executor-memory 1G \
--class com.lxw.test.Spark_Flume \
/home/liuxiaowen/spark-flume.jar slave004.lxw1234.com 44444

Then start the Flume agent:

flume-ng agent -n agent_lxw --conf . -f flume-lxw-conf.properties
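
Once both processes are running, the pipeline can be exercised by appending lines to a file in the monitored directory. The sketch below is one way to generate such test data; LogAppender and appendLines are hypothetical names, and the file name only needs to match the file-pattern in the Flume configuration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}

// Hypothetical test-data generator: appends lines to a log file so that
// the tailing source has new content to pick up.
object LogAppender {
  def appendLines(dir: Path, fileName: String, lines: Seq[String]): Path = {
    Files.createDirectories(dir)
    val target = dir.resolve(fileName)
    val payload = lines.map(_ + "\n").mkString.getBytes(StandardCharsets.UTF_8)
    // CREATE makes the file on first use; APPEND keeps the existing content.
    Files.write(target, payload, StandardOpenOption.CREATE, StandardOpenOption.APPEND)
    target
  }
}
```

For example, LogAppender.appendLines(java.nio.file.Paths.get("/tmp/lxw-source"), "lxw_test.log", Seq("hello flume", "hello spark")) writes two lines that should show up in the next 10-second batch, assuming the agent and the receiver are connected.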

Sample output and notes:

See the original post: http://lxw1234.com/?p=217

2015-05-18 15:54