Data Solution 2019(10)Spark Cluster Solution with Zeppelin

 

Spark Standalone Cluster
https://spark.apache.org/docs/latest/spark-standalone.html
Mesos Cluster
https://spark.apache.org/docs/latest/running-on-mesos.html
Hadoop2 YARN
https://spark.apache.org/docs/latest/running-on-yarn.html
K8S
https://spark.apache.org/docs/latest/running-on-kubernetes.html

Zeppelin with Cluster
https://zeppelin.apache.org/docs/latest/interpreter/spark.html
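
Zeppelin talks to a standalone cluster through its Spark interpreter, whose master must point at the cluster URL. A minimal sketch of conf/zeppelin-env.sh, assuming Zeppelin runs on a host that can reach rancher-home and that Spark is installed under /opt/spark as set up below:
> vi conf/zeppelin-env.sh

export MASTER=spark://rancher-home:7077
export SPARK_HOME=/opt/spark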

Decide to Set Up a Spark Standalone Cluster with Zeppelin
Start the Spark Master Machine
Prepare Spark
> wget http://apache.mirrors.ionfish.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
> tar zxvf spark-2.4.4-bin-hadoop2.7.tgz
> mv spark-2.4.4-bin-hadoop2.7 ~/tool/spark-2.4.4
> sudo ln -s /home/carl/tool/spark-2.4.4 /opt/spark-2.4.4
> sudo ln -s /opt/spark-2.4.4 /opt/spark
> cd /opt/spark
> cp conf/spark-env.sh.template conf/spark-env.sh

The template contains a lot of sample configuration:
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_DAEMON_CLASSPATH, to set the classpath for all daemons
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers

https://spark.apache.org/docs/latest/spark-standalone.html
Make some changes according to my environment:
> vi conf/spark-env.sh

SPARK_MASTER_HOST=rancher-home
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8088
SPARK_WORKER_PORT=7177
SPARK_WORKER_WEBUI_PORT=8188

Start the master service
> sbin/start-master.sh
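
To verify the master came up, check for the Master Java process and open the web UI on the port configured above (http://rancher-home:8088):
> jps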

Start the Slave on rancher-worker1
> wget http://apache.mirrors.ionfish.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
> tar zxvf spark-2.4.4-bin-hadoop2.7.tgz
> mv spark-2.4.4-bin-hadoop2.7 ~/tool/spark-2.4.4
> sudo ln -s /home/carl/tool/spark-2.4.4 /opt/spark-2.4.4
> sudo ln -s /opt/spark-2.4.4 /opt/spark

Prepare Configuration
> cp conf/spark-env.sh.template conf/spark-env.sh
SPARK_MASTER_HOST=rancher-home
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8088
SPARK_WORKER_PORT=7177
SPARK_WORKER_WEBUI_PORT=8188

Start the slave and connect it to the master
> sbin/start-slave.sh spark://rancher-home:7077
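
If the worker registered successfully, it should appear as ALIVE in the workers list of the master web UI, or in its JSON status view:
> curl http://rancher-home:8088/json/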

Stop the slave (stop-slave.sh takes no master URL argument)
> sbin/stop-slave.sh

Make the Spark Cluster Run in Docker
Under Docker, the daemons must run in the foreground, or the container exits as soon as the launch script returns. The spark-env.sh template documents the switch for this:
# - SPARK_NO_DAEMONIZE  Run the proposed command in the foreground. It will not output a PID file.
SPARK_NO_DAEMONIZE=true

Starting the services this way fails at first; the master cannot bind to its port inside the container:
2019-10-28T00:41:42.502359700Z 19/10/28 00:41:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2019-10-28T00:41:43.110823900Z 19/10/28 00:41:43 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.

The root cause is hostname resolution inside the container (the hosts file). This article describes the issue:
https://cloud.tencent.com/developer/article/1175087

Finally, the master configuration ends up close to this:
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8088
SPARK_LOCAL_HOSTNAME=rancher-home
SPARK_IDENT_STRING=rancher-home
SPARK_PUBLIC_DNS=rancher-home
SPARK_NO_DAEMONIZE=true
SPARK_DAEMON_MEMORY=1g

The Dockerfile is as follows:
#Set up spark master in Docker

#Prepare the OS
FROM    centos:7
MAINTAINER Yiyi Kang <yiyikangrachel@gmail.com>

RUN     yum -y update
RUN     yum install -y wget

#install jdk
RUN yum -y install java-1.8.0-openjdk.x86_64
RUN echo 'export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk' | tee -a /etc/profile

RUN            mkdir /tool/
WORKDIR        /tool/

#add the software spark
RUN  wget --no-verbose http://apache.mirrors.ionfish.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
RUN  tar -xvzf spark-2.4.4-bin-hadoop2.7.tgz
RUN  ln -s /tool/spark-2.4.4-bin-hadoop2.7 /tool/spark

ADD  conf/spark-env.sh /tool/spark/conf/

#set up the app
EXPOSE  8088 7077
RUN     mkdir -p /app/
ADD     start.sh /app/
WORKDIR /app/
CMD    [ "./start.sh" ]
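
The Dockerfile adds a start.sh that is not shown in the post. A minimal sketch of what it could contain, assuming the paths above; with SPARK_NO_DAEMONIZE=true in spark-env.sh the master stays in the foreground, which keeps the container alive:

#!/bin/bash
# Launch the standalone master in the foreground (SPARK_NO_DAEMONIZE=true
# is set in conf/spark-env.sh, so this call does not daemonize).
exec /tool/spark/sbin/start-master.sh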

The important parts of the Makefile are as follows:
run:
    docker run -d -p 7077:7077 -p 8088:8088 \
    --hostname rancher-home \
    --name $(NAME) $(IMAGE):$(TAG)

The slave machine configuration ends up as follows:
SPARK_WORKER_PORT=7177
SPARK_WORKER_WEBUI_PORT=8188
SPARK_PUBLIC_DNS=rancher-worker1
SPARK_LOCAL_HOSTNAME=rancher-worker1
SPARK_IDENT_STRING=rancher-worker1
SPARK_NO_DAEMONIZE=true

The slave Dockerfile is as follows:
#Set up spark slave in Docker

#Prepare the OS
FROM    centos:7
MAINTAINER Yiyi Kang <yiyikangrachel@gmail.com>

RUN     yum -y update
RUN     yum install -y wget

#install jdk
RUN yum -y install java-1.8.0-openjdk.x86_64
RUN echo 'export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk' | tee -a /etc/profile

RUN            mkdir /tool/
WORKDIR        /tool/

#add the software spark
RUN  wget --no-verbose http://apache.mirrors.ionfish.org/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
RUN  tar -xvzf spark-2.4.4-bin-hadoop2.7.tgz
RUN  ln -s /tool/spark-2.4.4-bin-hadoop2.7 /tool/spark

ADD  conf/spark-env.sh /tool/spark/conf/

#set up the app
EXPOSE  8188 7177
RUN     mkdir -p /app/
ADD     start.sh /app/
WORKDIR /app/
CMD    [ "./start.sh" ]
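
The worker's start.sh (also not shown in the post) would differ only in the launch command, registering with the master. A sketch under the same assumptions:

#!/bin/bash
# Launch the worker in the foreground and register with the master.
exec /tool/spark/sbin/start-slave.sh spark://rancher-home:7077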

Add a host entry so the slave container can resolve the master machine; the relevant Makefile run target:
run:
    docker run -d -p 7177:7177 -p 8188:8188 \
    --name $(NAME) \
    --hostname rancher-worker1 \
    --add-host=rancher-home:192.168.56.110 $(IMAGE):$(TAG)

The next step is to move more of this configuration into parameters instead of hard-coding it in the images.
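
For example, the master URL could be passed in at docker run time instead of being baked into the image. A sketch, using a hypothetical MASTER_URL variable (my naming, not from the post):

run:
    docker run -d -p 7177:7177 -p 8188:8188 \
    --hostname rancher-worker1 \
    -e MASTER_URL=spark://rancher-home:7077 \
    --add-host=rancher-home:192.168.56.110 $(IMAGE):$(TAG)

And in the worker's start.sh:

#!/bin/bash
# Fall back to the hard-coded master URL if MASTER_URL is not set.
exec /tool/spark/sbin/start-slave.sh ${MASTER_URL:-spark://rancher-home:7077}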

References:
https://spark.apache.org/docs/latest/cluster-overview.html
https://stackoverflow.com/questions/28664834/which-cluster-type-should-i-choose-for-spark
https://stackoverflow.com/questions/39671117/docker-container-with-apache-spark-in-standalone-cluster-mode
https://github.com/shuaicj/docker-spark-master
https://stackoverflow.com/questions/32719007/spark-spark-public-dns-and-spark-local-ip-on-stand-alone-cluster-with-docker-con
