Data Solution 2019(10)Spark Cluster Solution with Zeppelin
Spark Single Cluster
https://spark.apache.org/docs/latest/spark-standalone.html
Mesos Cluster
https://spark.apache.org/docs/latest/running-on-mesos.html
Hadoop2 YARN
https://spark.apache.org/docs/latest/running-on-yarn.html
K8S
https://spark.apache.org/docs/latest/running-on-kubernetes.html
Zeppelin with Cluster
https://zeppelin.apache.org/docs/latest/interpreter/spark.html
Decide to Set Up Spark Standalone Cluster and Zeppelin
Start the Spark Master Machine
Prepare Spark
> wget http://apache.mirrors.ionfish.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
> tar zxvf spark-2.4.4-bin-hadoop2.7.tgz
> mv spark-2.4.4-bin-hadoop2.7 ~/tool/spark-2.4.4
> sudo ln -s /home/carl/tool/spark-2.4.4 /opt/spark-2.4.4
> sudo ln -s /opt/spark-2.4.4 /opt/spark
> cd /opt/spark
> cp conf/spark-env.sh.template conf/spark-env.sh
The template contains a lot of sample configuration:
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_DAEMON_CLASSPATH, to set the classpath for all daemons
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
https://spark.apache.org/docs/latest/spark-standalone.html
Make some changes according to my environment:
> vi conf/spark-env.sh
SPARK_MASTER_HOST=rancher-home
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8088
SPARK_WORKER_PORT=7177
SPARK_WORKER_WEBUI_PORT=8188
Start the master service
> sbin/start-master.sh
Start the Slave on rancher-worker1
> wget http://apache.mirrors.ionfish.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
> tar zxvf spark-2.4.4-bin-hadoop2.7.tgz
> mv spark-2.4.4-bin-hadoop2.7 ~/tool/spark-2.4.4
> sudo ln -s /home/carl/tool/spark-2.4.4 /opt/spark-2.4.4
> sudo ln -s /opt/spark-2.4.4 /opt/spark
Prepare Configuration
> cp conf/spark-env.sh.template conf/spark-env.sh
SPARK_MASTER_HOST=rancher-home
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8088
SPARK_WORKER_PORT=7177
SPARK_WORKER_WEBUI_PORT=8188
Start the slave and connect it to the master
> sbin/start-slave.sh spark://rancher-home:7077
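The URL passed to start-slave.sh follows directly from the master's spark-env.sh values (`spark://<SPARK_MASTER_HOST>:<SPARK_MASTER_PORT>`). A tiny sketch of that convention, with defaults mirroring the configuration above:

```shell
#!/bin/sh
# Sketch: compose the standalone master URL from the spark-env.sh values
# used in this post. Defaults match the master configuration above.
master_url() {
  echo "spark://${SPARK_MASTER_HOST:-rancher-home}:${SPARK_MASTER_PORT:-7077}"
}
```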
Stop the slave (stop-slave.sh stops the local worker process, so the master URL argument is not actually needed)
> sbin/stop-slave.sh
Make Spark Cluster in Docker
# - SPARK_NO_DAEMONIZE Run the proposed command in the foreground. It will not output a PID file.
SPARK_NO_DAEMONIZE=true
At first, starting the services inside the container failed — the master could not bind its port:
2019-10-28T00:41:42.502359700Z 19/10/28 00:41:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2019-10-28T00:41:43.110823900Z 19/10/28 00:41:43 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
The root cause relates to hostname resolution inside the container (the hosts file); this article pointed the way:
https://cloud.tencent.com/developer/article/1175087
Finally, the master configuration ends up close to this:
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8088
SPARK_LOCAL_HOSTNAME=rancher-home
SPARK_IDENT_STRING=rancher-home
SPARK_PUBLIC_DNS=rancher-home
SPARK_NO_DAEMONIZE=true
SPARK_DAEMON_MEMORY=1g
The Dockerfile is as follows:
#Set up spark master in Docker
#Prepare the OS
FROM centos:7
MAINTAINER Yiyi Kang <yiyikangrachel@gmail.com>
RUN yum -y update
RUN yum install -y wget
#install jdk
RUN yum -y install java-1.8.0-openjdk.x86_64
RUN echo 'export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk' | tee -a /etc/profile
RUN mkdir /tool/
WORKDIR /tool/
#add the software spark
RUN wget --no-verbose http://apache.mirrors.ionfish.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
RUN tar -xvzf spark-2.4.4-bin-hadoop2.7.tgz
RUN ln -s /tool/spark-2.4.4-bin-hadoop2.7 /tool/spark
ADD conf/spark-env.sh /tool/spark/conf/
#set up the app
EXPOSE 8088 7077
RUN mkdir -p /app/
ADD start.sh /app/
WORKDIR /app/
CMD [ "./start.sh" ]
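The start.sh that the Dockerfile ADDs is not shown in the post; a minimal sketch of what it could contain (the path assumes the layout above):

```shell
#!/bin/sh
# Hypothetical start.sh for the master image (not shown in the original post).
# With SPARK_NO_DAEMONIZE=true in spark-env.sh, start-master.sh stays in the
# foreground, so exec-ing it makes it the container's main process (PID 1).
exec /tool/spark/sbin/start-master.sh
```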
The important parts of the Makefile are as follows:
run:
docker run -d -p 7077:7077 -p 8088:8088 \
--hostname rancher-home \
--name $(NAME) $(IMAGE):$(TAG)
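The surrounding Makefile targets might look like this (a sketch; the IMAGE/TAG/NAME values are assumptions, not from the post):

```makefile
# Hypothetical full Makefile around the run target shown above.
IMAGE = sillycat/spark-master
TAG   = 2.4.4
NAME  = spark-master

build:
	docker build -t $(IMAGE):$(TAG) .

run:
	docker run -d -p 7077:7077 -p 8088:8088 \
	--hostname rancher-home \
	--name $(NAME) $(IMAGE):$(TAG)

stop:
	docker stop $(NAME) && docker rm $(NAME)

logs:
	docker logs -f $(NAME)
```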
The slave machine configuration is as follows:
SPARK_WORKER_PORT=7177
SPARK_WORKER_WEBUI_PORT=8188
SPARK_PUBLIC_DNS=rancher-worker1
SPARK_LOCAL_HOSTNAME=rancher-worker1
SPARK_IDENT_STRING=rancher-worker1
SPARK_NO_DAEMONIZE=true
The slave Dockerfile is as follows:
#Set up spark slave in Docker
#Prepare the OS
FROM centos:7
MAINTAINER Yiyi Kang <yiyikangrachel@gmail.com>
RUN yum -y update
RUN yum install -y wget
#install jdk
RUN yum -y install java-1.8.0-openjdk.x86_64
RUN echo 'export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk' | tee -a /etc/profile
RUN mkdir /tool/
WORKDIR /tool/
#add the software spark
RUN wget --no-verbose http://apache.mirrors.ionfish.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
RUN tar -xvzf spark-2.4.4-bin-hadoop2.7.tgz
RUN ln -s /tool/spark-2.4.4-bin-hadoop2.7 /tool/spark
ADD conf/spark-env.sh /tool/spark/conf/
#set up the app
EXPOSE 8188 7177
RUN mkdir -p /app/
ADD start.sh /app/
WORKDIR /app/
CMD [ "./start.sh" ]
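As on the master, the slave image's start.sh is not shown; a minimal sketch, assuming the same layout and the master URL used earlier:

```shell
#!/bin/sh
# Hypothetical start.sh for the slave image (not shown in the original post).
# start-slave.sh needs the master URL; SPARK_NO_DAEMONIZE=true in spark-env.sh
# keeps it in the foreground as the container's main process.
exec /tool/spark/sbin/start-slave.sh spark://rancher-home:7077
```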
Add a host entry pointing to the master machine:
run:
docker run -d -p 7177:7177 -p 8188:8188 \
--name $(NAME) \
--hostname rancher-worker1 \
--add-host=rancher-home:192.168.56.110 $(IMAGE):$(TAG)
The next step is to move more of this configuration into parameters.
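For instance, the entrypoint could render conf/spark-env.sh from environment variables passed at `docker run -e ...` time, so one image serves both roles. A sketch — the variable handling here is my assumption, not from the post:

```shell
#!/bin/sh
# Sketch: build spark-env.sh content from environment variables supplied to
# the container, instead of baking a fixed file into each image.
gen_spark_env() {
  # Emit only the variables that were actually provided.
  for var in SPARK_MASTER_HOST SPARK_MASTER_PORT SPARK_MASTER_WEBUI_PORT \
             SPARK_WORKER_PORT SPARK_WORKER_WEBUI_PORT \
             SPARK_LOCAL_HOSTNAME SPARK_IDENT_STRING SPARK_PUBLIC_DNS; do
    eval "val=\${$var}"
    if [ -n "$val" ]; then
      echo "$var=$val"
    fi
  done
  # Always run in the foreground inside a container.
  echo "SPARK_NO_DAEMONIZE=true"
}

# In the real entrypoint this would be followed by, e.g.:
# gen_spark_env > /tool/spark/conf/spark-env.sh
# exec /tool/spark/sbin/start-master.sh
```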
References:
https://spark.apache.org/docs/latest/cluster-overview.html
https://stackoverflow.com/questions/28664834/which-cluster-type-should-i-choose-for-spark
https://stackoverflow.com/questions/39671117/docker-container-with-apache-spark-in-standalone-cluster-mode
https://github.com/shuaicj/docker-spark-master
https://stackoverflow.com/questions/32719007/spark-spark-public-dns-and-spark-local-ip-on-stand-alone-cluster-with-docker-con