Spark Streaming 2020(1)Investigation
On my local machine I have a Spark cluster with a Zeppelin Notebook, and a 3-node Kafka cluster on rancher-home, rancher-worker1 and rancher-worker2.
Start Kafka Cluster
Start Zookeeper Cluster on 3 machines
> cd /opt/zookeeper
> /opt/zookeeper/bin/zkServer.sh start /opt/zookeeper/conf/zoo.cfg
Check status
> zkServer.sh status conf/zoo.cfg
1 leader, 2 followers
Start Kafka on all 3 machines
> cd /opt/kafka
> nohup /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties &
Check the Producer and Consumer
> bin/kafka-console-producer.sh --broker-list rancher-home:9092,rancher-worker1:9092,rancher-worker2:9092 --topic cluster1
> bin/kafka-console-consumer.sh --bootstrap-server rancher-home:9092,rancher-worker1:9092,rancher-worker2:9092 --topic cluster1 --from-beginning
Go to sparkmaster_service and start it on rancher-home; go to sparkslave_service and start it on rancher-worker1 and rancher-worker2.
Check web console
http://rancher-home:8088/
Check who is using port 8080 on CentOS 7
> netstat -nlp | grep 8080
tcp6 0 0 :::8080 :::* LISTEN 6206/java
> ps -ef | grep 6206
It seems that ZooKeeper is using port 8080
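Besides netstat, the same check can be done programmatically: trying to bind the port tells you whether something is already listening on it. A minimal sketch (the class and method names here are my own, not from the project):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {

    // Try to bind the port; if the bind fails, something is already listening on it.
    public static boolean isPortFree(int port) {
        try (ServerSocket ignored = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hold a port open ourselves, then confirm the check reports it as busy.
        try (ServerSocket busy = new ServerSocket(0)) {
            System.out.println("port " + busy.getLocalPort()
                    + " free? " + isPortFree(busy.getLocalPort()));
        }
    }
}
```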
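Since ZooKeeper 3.5 the embedded AdminServer listens on port 8080 by default. If that clashes with the Spark web UI, the AdminServer can be moved to another port, or disabled, in zoo.cfg (a config sketch assuming ZooKeeper 3.5+; 8081 is just an example value):

```properties
# conf/zoo.cfg
# Move the embedded AdminServer off the default port 8080 ...
admin.serverPort=8081
# ... or disable it entirely:
# admin.enableServer=false
```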
Package
https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10_2.12/2.4.4
https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients/2.4.0
Kafka Version
https://spark.apache.org/docs/latest/streaming-kafka-integration.html
One updated example
https://community.cloudera.com/t5/Support-Questions/Scala-Spark-Streaming-with-Kafka-Integration-in-Zeppelin-not/td-p/233507
It seems hard to make streaming work well on Zeppelin, so I will try to make it work in Java and Scala projects instead.
https://github.com/luohuazju/kiko-spark-java
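For the Java project, a minimal sketch of the direct-stream Kafka consumer, assuming the spark-streaming-kafka-0-10_2.12 and kafka-clients artifacts listed above are on the classpath (the group id, class name and 5-second batch interval are example values of mine, not from the project):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class StreamingApp {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kiko-spark-java");
        // Micro-batches every 5 seconds
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers",
                "rancher-home:9092,rancher-worker1:9092,rancher-worker2:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "spark-streaming-group");  // example group id
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", false);

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("cluster1"), kafkaParams));

        // Print the message values of each batch to the driver log
        stream.map(ConsumerRecord::value).print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Submit with spark-submit against the cluster; messages typed into the console producer on topic cluster1 should then show up in the driver output every batch.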
Issue: HDFS is not a recognized FileSystem type; the error message is:
No FileSystem for scheme: hdfs
Solution:
https://www.cnblogs.com/justinzhang/p/4983673.html
https://www.edureka.co/community/3320/java-mapreduce-error-saying-no-filesystem-for-scheme-hdfs
https://brucebcampbell.wordpress.com/2014/12/11/fix-hadoop-hdfs-error-java-io-ioexception-no-filesystem-for-scheme-hdfs-at-org-apache-hadoop-fs-filesystem-getfilesystemclassfilesystem-java2385/
https://www.codelast.com/%E5%8E%9F%E5%88%9B-%E8%A7%A3%E5%86%B3%E8%AF%BB%E5%86%99hdfs%E6%96%87%E4%BB%B6%E7%9A%84%E9%94%99%E8%AF%AF%EF%BC%9Ano-filesystem-for-scheme-hdfs/
https://stackoverflow.com/questions/17265002/hadoop-no-filesystem-for-scheme-file
My changes in pom.xml and settings are as follows:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<properties>
    <spark.version>2.4.4</spark.version>
    <hadoop.version>3.2.1</hadoop.version>
</properties>
SparkConf conf = this.getSparkConf();
SparkContext sc = new SparkContext(conf);
// Explicitly register the FileSystem implementations, so hdfs:// and file://
// resolve even when META-INF/services entries were lost during packaging
Configuration hadoopConf = sc.hadoopConfiguration();
hadoopConf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
hadoopConf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");
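The usual root cause of "No FileSystem for scheme: hdfs" in an uber jar is that several Hadoop artifacts each ship a META-INF/services/org.apache.hadoop.fs.FileSystem file, and whichever copy is packed last wins. If the project builds its jar with maven-shade-plugin, merging those service files at build time is an alternative to setting fs.hdfs.impl in code (a sketch; plugin version omitted):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- concatenate META-INF/services entries instead of overwriting them -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```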
Going to solve the HDFS issue and upgrade the project libraries.
But after I solved the HDFS problem, I found that because I run Hadoop in Docker, the Spark cluster has trouble talking to port 9866, which is dfs.datanode.address.
https://kontext.tech/column/hadoop/265/default-ports-used-by-hadoop-services-hdfs-mapreduce-yarn
https://www.stefaanlippens.net/hadoop-3-default-ports.html
I may need to set up an HDFS cluster outside of Docker
https://acadgild.com/blog/hadoop-3-x-installation-guide
https://www.linode.com/docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/
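Before moving HDFS out of Docker, one workaround worth trying is to make HDFS clients connect to datanodes by hostname rather than the container-internal IP, so that a port published on the Docker host can be reached (a client-side hdfs-site.xml sketch; the datanode hostnames must still resolve to the Docker host and port 9866 must be published):

```xml
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```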
References:
https://github.com/luohuazju/sillycat-spark
https://www.iteye.com/blog/sillycat-2215237
https://www.iteye.com/blog/sillycat-2406572
https://www.iteye.com/blog/sillycat-2370527
Third parties
https://www.cnblogs.com/luweiseu/p/8045863.html
Zeppelin streaming
https://henning.kropponline.de/2016/12/25/simple-spark-streaming-kafka-example-in-a-zeppelin-notebook/
https://juejin.im/post/5c997d9e5188252da22514e6
https://blog.csdn.net/qwemicheal/article/details/71082663