Kevin12

Spark Cluster HA Setup

The Spark cluster HA diagram:



Setting up Spark HA requires a ZooKeeper ensemble, so here is a brief walkthrough of installing one.
I install ZooKeeper on master1, worker1, and worker2: first install and configure it on master1, then copy the result to worker1 and worker2.
Software version: zookeeper-3.4.6
1. Unpack ZooKeeper and configure environment variables
Location on the VM: /usr/local/zookeeper/zookeeper-3.4.6
Environment variable configuration:
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_60
export ZOOKEEPER_HOME=/usr/local/zookeeper/zookeeper-3.4.6
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${SPARK_HOME}/bin:${ZOOKEEPER_HOME}/bin:$PATH

Then run source ~/.bashrc to make the configuration take effect.
On master1, run the following commands to copy the .bashrc that now contains the ZooKeeper settings to worker1 and worker2:
scp ~/.bashrc root@worker1:~/
scp ~/.bashrc root@worker2:~/

Then ssh into worker1 and worker2 and run source ~/.bashrc there as well.
2. Configure ZooKeeper on master1
Enter /usr/local/zookeeper/zookeeper-3.4.6 and create a logs directory with mkdir logs and a data directory with mkdir data.
Then go to /usr/local/zookeeper/zookeeper-3.4.6/conf, copy zoo_sample.cfg to a file named zoo.cfg, and edit it:
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/conf# cp zoo_sample.cfg zoo.cfg 
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/conf# vim zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/usr/local/zookeeper/zookeeper-3.4.6/data
dataLogDir=/usr/local/zookeeper/zookeeper-3.4.6/logs
server.0=master1:2888:3888
server.1=worker1:2888:3888
server.2=worker2:2888:3888
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
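The three server.N lines above follow a fixed pattern (N = server id, 2888 = quorum port, 3888 = leader-election port), so for larger ensembles they can be generated instead of typed by hand. A small sketch using the host names from this setup:

```shell
# Generate the server.N lines of zoo.cfg from an ordered host list.
# 2888 is the quorum (follower-to-leader) port, 3888 the election port.
i=0
for h in master1 worker1 worker2; do
  echo "server.$i=$h:2888:3888"
  i=$((i+1))
done
# prints:
#   server.0=master1:2888:3888
#   server.1=worker1:2888:3888
#   server.2=worker2:2888:3888
```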

On master1, go to /usr/local/zookeeper/zookeeper-3.4.6/data and create a file named myid containing the single digit 0; this 0 (the digit zero) matches the number in server.0.
3. Copy the ZooKeeper installation from master1 to worker1 and worker2 and configure them.
scp -r /usr/local/zookeeper/zookeeper-3.4.6 root@worker1:/usr/local/zookeeper/zookeeper-3.4.6/
scp -r /usr/local/zookeeper/zookeeper-3.4.6 root@worker2:/usr/local/zookeeper/zookeeper-3.4.6/

Note: if the zookeeper directory does not exist on the target machine, create it first.
Log in to worker1 with ssh worker1, edit /usr/local/zookeeper/zookeeper-3.4.6/data/myid, and change its content to 1.
Likewise, change the content of /usr/local/zookeeper/zookeeper-3.4.6/data/myid on worker2 to 2.
The myid contents correspond to server.0, server.1, and server.2 in the configuration file.
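Editing myid by hand on each node is easy to get wrong; since the id is already encoded in the server.N lines of zoo.cfg, it can also be derived mechanically. A sketch (myid_for is a hypothetical helper, not part of ZooKeeper):

```shell
# myid_for: given a hostname, print the N of the matching server.N line.
# Reads zoo.cfg-style lines on stdin.
myid_for() {  # $1 = hostname
  grep "^server\." | grep "=$1:" | cut -d. -f2 | cut -d= -f1
}

# Demo on the three server lines from the zoo.cfg above:
printf 'server.0=master1:2888:3888\nserver.1=worker1:2888:3888\nserver.2=worker2:2888:3888\n' \
  | myid_for worker1    # prints 1
```

On each node you could then run myid_for "$(hostname)" < $ZOOKEEPER_HOME/conf/zoo.cfg > $ZOOKEEPER_HOME/data/myid instead of editing the file manually.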
4. Start ZooKeeper and test leader election
Start ZooKeeper on master1, worker1, and worker2:
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# ./zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# ssh worker1
Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

121 packages can be updated.
79 updates are security updates.

Last login: Sat Jan 30 19:56:23 2016 from 192.168.112.130
root@worker1:~# cd /usr/local/zookeeper/zookeeper-3.4.6/bin/
root@worker2:/usr/local/zookeeper/zookeeper-3.4.6/bin# ./zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
# jps on each of the three VMs now shows an extra QuorumPeerMain background process.
root@worker2:/usr/local/zookeeper/zookeeper-3.4.6/bin# jps
4433 NodeManager
5889 QuorumPeerMain
4343 DataNode
5918 Jps
root@worker2:/usr/local/zookeeper/zookeeper-3.4.6/bin# exit
logout
Connection to worker2 closed.
root@worker1:/usr/local/zookeeper/zookeeper-3.4.6/bin# jps
6006 Jps
4454 NodeManager
4364 DataNode
5964 QuorumPeerMain
root@worker1:/usr/local/zookeeper/zookeeper-3.4.6/bin# exit
logout
Connection to worker1 closed.
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# jps
6629 QuorumPeerMain
4471 NameNode
6681 Jps
4825 ResourceManager
4685 SecondaryNameNode
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# 
# Check the Mode on each VM with zkServer.sh status: there is exactly one leader and two followers. Stop ZooKeeper on the leader VM with zkServer.sh stop, then run zkServer.sh status on the other two VMs: one of the remaining two is now the leader and the other a follower. ZooKeeper has automatically elected a new leader, and this automatic election is what keeps the ensemble highly available. The concrete steps:
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
root@worker1:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
root@worker2:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
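A quick way to confirm the ensemble is healthy is to count the leaders in the combined status output; exactly one is expected. A sketch (leader_count is a hypothetical helper; the ssh loop in the comment assumes the three hosts above):

```shell
# leader_count: count "Mode: leader" lines in concatenated zkServer.sh output.
leader_count() {
  grep -c '^Mode: leader'
}

# Real use (requires the running cluster):
#   for h in master1 worker1 worker2; do
#     ssh root@$h '/usr/local/zookeeper/zookeeper-3.4.6/bin/zkServer.sh status'
#   done | leader_count
# Demo on captured output:
printf 'Mode: leader\nMode: follower\nMode: follower\n' | leader_count   # prints 1
```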
So master1 runs as the leader and the other two VMs as followers. On master1, stop ZooKeeper with zkServer.sh stop and then check the ZooKeeper status on worker1 and worker2 again.
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh stop
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# jps
8376 Jps
4825 ResourceManager
7933 SecondaryNameNode
7806 NameNode
root@worker2:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
root@worker1:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
As you can see, after ZooKeeper was stopped on master1, a new leader was elected on worker2. After ZooKeeper on master1 is started again, master1 runs as a follower.
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# 


One thing to note: on the first VM on which ZooKeeper is started, the zookeeper.out file under $ZOOKEEPER_HOME/bin will initially contain error messages, because the other two ZooKeeper servers are not yet up and cannot be reached. Once the other two start, everything returns to normal, so these errors can be ignored.
2016-01-31 07:09:37,261 [myid:0] - WARN  [WorkerSender[myid=0]:QuorumCnxManager@382] - Cannot open channel to 1 at election address worker1/192.168.112.131:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
    at java.lang.Thread.run(Thread.java:745)
2016-01-31 07:09:37,264 [myid:0] - WARN  [WorkerSender[myid=0]:QuorumCnxManager@382] - Cannot open channel to 2 at election address worker2/192.168.112.132:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
    at java.lang.Thread.run(Thread.java:745)

5. Configure ZooKeeper support in spark-env.sh
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_60
export SCALA_HOME=/usr/local/scala/scala-2.10.4
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
#export SPARK_MASTER_IP=master1
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master1:2181,worker1:2181,worker2:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_WORKER_MEMORY=1g
export SPARK_EXECUTOR_MEMORY=1g
export SPARK_DRIVER_MEMORY=1g
export SPARK_WORKER_CORES=4

Note: the line export SPARK_MASTER_IP=master1 must be commented out.
The Spark parameters here may differ from the ones used when the cluster was first set up; they are scaled down for a personal machine, because if the memory settings are too large the VMs run very slowly.
Explanation:
-Dspark.deploy.recoveryMode=ZOOKEEPER    # The entire cluster state is maintained, and recovered, through ZooKeeper. In other words, ZooKeeper provides Spark's HA: if the active Master dies, a standby Master must first read the full cluster state from ZooKeeper and restore the state of all Workers, Drivers, and Applications before it can become the new active Master.
-Dspark.deploy.zookeeper.url=master1:2181,worker1:2181,worker2:2181 # List every machine that runs ZooKeeper and could host an active Master. (I used three machines, so three entries.)
-Dspark.deploy.zookeeper.dir=/spark
How does this dir differ from dataDir in ZooKeeper's zoo.cfg? dataDir is a local filesystem path where ZooKeeper stores its own snapshots, whereas spark.deploy.zookeeper.dir is a znode path inside ZooKeeper under which Spark stores its recovery metadata: the state of all Workers, all Applications, and all Drivers in the cluster.
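To double-check what was configured, the three -Dspark.deploy settings can be pulled back out of SPARK_DAEMON_JAVA_OPTS; a small sketch (the value is copied from the spark-env.sh above):

```shell
# Extract the spark.deploy.* recovery settings from the JVM options string.
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master1:2181,worker1:2181,worker2:2181 -Dspark.deploy.zookeeper.dir=/spark"

# One token per line, keep the -Dspark.deploy ones, strip the leading "-D":
echo "$SPARK_DAEMON_JAVA_OPTS" | tr ' ' '\n' | grep '^-Dspark\.deploy' | cut -c3-
# prints:
#   spark.deploy.recoveryMode=ZOOKEEPER
#   spark.deploy.zookeeper.url=master1:2181,worker1:2181,worker2:2181
#   spark.deploy.zookeeper.dir=/spark
```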
Then use scp to copy spark-env.sh from master1 to the corresponding directory on worker1 and worker2:
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# scp spark-env.sh root@worker1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# scp spark-env.sh root@worker2:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/

After copying, verify that spark-env.sh on worker1 and worker2 really matches the one on master1.
Test Spark's HA while the ZooKeeper ensemble has master1 and worker2 as followers and worker1 as the leader.
Start the Spark cluster from master1 with start-all.sh. This does not start a Master on worker1 or worker2, so run start-master.sh on each of them as well, then use jps to confirm that all three ZooKeeper nodes are running a Master process. The cluster state can then be viewed by entering master1:8080, worker1:8080, and worker2:8080 in a browser.







Testing the cluster's HA

Start spark-shell on master1 with the command below. Note that not one but three Masters are listed; ZooKeeper manages them, and the shell locates the active Master through ZooKeeper at startup.
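The multi-master URL passed to --master is just the quorum hosts joined with commas on port 7077; a small sketch that assembles it (master_url is a hypothetical helper):

```shell
# master_url: join Master hostnames into a spark://h1:7077,h2:7077,... URL.
master_url() {  # $@ = Master hostnames
  local out="spark://" sep=""
  for h in "$@"; do
    out="$out$sep$h:7077"
    sep=","
  done
  echo "$out"
}

master_url master1 worker1 worker2
# prints spark://master1:7077,worker1:7077,worker2:7077
```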
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/bin#  spark-shell --master spark://master1:7077,worker1:7077,worker2:7077
16/01/31 07:49:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/31 07:49:15 INFO spark.SecurityManager: Changing view acls to: root
16/01/31 07:49:15 INFO spark.SecurityManager: Changing modify acls to: root
16/01/31 07:49:15 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/01/31 07:49:15 INFO spark.HttpServer: Starting HTTP Server
16/01/31 07:49:16 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/31 07:49:16 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:38357
16/01/31 07:49:16 INFO util.Utils: Successfully started service 'HTTP class server' on port 38357.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
16/01/31 07:49:22 INFO spark.SparkContext: Running Spark version 1.6.0
16/01/31 07:49:22 INFO spark.SecurityManager: Changing view acls to: root
16/01/31 07:49:22 INFO spark.SecurityManager: Changing modify acls to: root
16/01/31 07:49:22 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/01/31 07:49:22 INFO util.Utils: Successfully started service 'sparkDriver' on port 45379.
16/01/31 07:49:23 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/01/31 07:49:23 INFO Remoting: Starting remoting
16/01/31 07:49:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.112.130:42792]
16/01/31 07:49:23 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 42792.
16/01/31 07:49:23 INFO spark.SparkEnv: Registering MapOutputTracker
16/01/31 07:49:23 INFO spark.SparkEnv: Registering BlockManagerMaster
16/01/31 07:49:23 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-09321e00-bbe5-4452-aa15-f02530b1f53f
16/01/31 07:49:23 INFO storage.MemoryStore: MemoryStore started with capacity 517.4 MB
16/01/31 07:49:23 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/01/31 07:49:24 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/31 07:49:24 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/01/31 07:49:24 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/01/31 07:49:24 INFO ui.SparkUI: Started SparkUI at http://192.168.112.130:4040
16/01/31 07:49:24 INFO client.AppClient$ClientEndpoint: Connecting to master spark://master1:7077...
16/01/31 07:49:24 INFO client.AppClient$ClientEndpoint: Connecting to master spark://worker1:7077...
16/01/31 07:49:24 INFO client.AppClient$ClientEndpoint: Connecting to master spark://worker2:7077...
16/01/31 07:49:25 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160131074925-0000
16/01/31 07:49:25 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46202.
16/01/31 07:49:25 INFO netty.NettyBlockTransferService: Server created on 46202
16/01/31 07:49:25 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/01/31 07:49:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.112.130:46202 with 517.4 MB RAM, BlockManagerId(driver, 192.168.112.130, 46202)
16/01/31 07:49:25 INFO storage.BlockManagerMaster: Registered BlockManager
16/01/31 07:49:26 INFO client.AppClient$ClientEndpoint: Executor added: app-20160131074925-0000/0 on worker-20160131071148-192.168.112.132-41059 (192.168.112.132:41059) with 1 cores
16/01/31 07:49:26 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160131074925-0000/0 on hostPort 192.168.112.132:41059 with 1 cores, 1024.0 MB RAM
16/01/31 07:49:26 INFO client.AppClient$ClientEndpoint: Executor added: app-20160131074925-0000/1 on worker-20160131071148-192.168.112.133-43458 (192.168.112.133:43458) with 1 cores
16/01/31 07:49:26 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160131074925-0000/1 on hostPort 192.168.112.133:43458 with 1 cores, 1024.0 MB RAM
16/01/31 07:49:27 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160131074925-0000/1 is now RUNNING
16/01/31 07:49:30 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160131074925-0000/0 is now RUNNING
16/01/31 07:49:33 INFO scheduler.EventLoggingListener: Logging events to hdfs://master1:9000/historyserverforSpark/app-20160131074925-0000
16/01/31 07:49:33 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/01/31 07:49:33 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
16/01/31 07:49:43 INFO hive.HiveContext: Initializing execution hive, version 1.2.1
16/01/31 07:49:43 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
16/01/31 07:49:43 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
16/01/31 07:49:49 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/01/31 07:49:49 INFO metastore.ObjectStore: ObjectStore, initialize called
16/01/31 07:49:51 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/01/31 07:49:51 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/01/31 07:49:52 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/31 07:49:58 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/31 07:50:03 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/01/31 07:50:05 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:05 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:10 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:10 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:10 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (worker3:58956) with ID 1
16/01/31 07:50:10 INFO storage.BlockManagerMasterEndpoint: Registering block manager worker3:34011 with 517.4 MB RAM, BlockManagerId(1, worker3, 34011)
16/01/31 07:50:11 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/01/31 07:50:11 INFO metastore.ObjectStore: Initialized ObjectStore
16/01/31 07:50:13 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/01/31 07:50:14 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
16/01/31 07:50:14 INFO metastore.HiveMetaStore: Added admin role in metastore
16/01/31 07:50:14 INFO metastore.HiveMetaStore: Added public role in metastore
16/01/31 07:50:15 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/01/31 07:50:15 INFO metastore.HiveMetaStore: 0: get_all_databases
16/01/31 07:50:15 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr    cmd=get_all_databases    
16/01/31 07:50:15 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
16/01/31 07:50:15 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr    cmd=get_functions: db=default pat=*    
16/01/31 07:50:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:16 INFO session.SessionState: Created local directory: /tmp/root
16/01/31 07:50:16 INFO session.SessionState: Created local directory: /tmp/d79d8d0f-b021-4443-97aa-e9da5f65f9fe_resources
16/01/31 07:50:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/d79d8d0f-b021-4443-97aa-e9da5f65f9fe
16/01/31 07:50:16 INFO session.SessionState: Created local directory: /tmp/root/d79d8d0f-b021-4443-97aa-e9da5f65f9fe
16/01/31 07:50:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/d79d8d0f-b021-4443-97aa-e9da5f65f9fe/_tmp_space.db
16/01/31 07:50:17 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
16/01/31 07:50:17 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/01/31 07:50:17 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
16/01/31 07:50:17 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
16/01/31 07:50:20 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/01/31 07:50:20 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (worker2:46076) with ID 0
16/01/31 07:50:20 INFO metastore.ObjectStore: ObjectStore, initialize called
16/01/31 07:50:20 INFO storage.BlockManagerMasterEndpoint: Registering block manager worker2:41924 with 517.4 MB RAM, BlockManagerId(0, worker2, 41924)
16/01/31 07:50:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/01/31 07:50:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/01/31 07:50:21 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/31 07:50:21 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/31 07:50:23 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/01/31 07:50:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:26 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
16/01/31 07:50:26 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/01/31 07:50:26 INFO metastore.ObjectStore: Initialized ObjectStore
16/01/31 07:50:26 INFO metastore.HiveMetaStore: Added admin role in metastore
16/01/31 07:50:26 INFO metastore.HiveMetaStore: Added public role in metastore
16/01/31 07:50:26 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/01/31 07:50:26 INFO metastore.HiveMetaStore: 0: get_all_databases
16/01/31 07:50:26 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr    cmd=get_all_databases    
16/01/31 07:50:26 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
16/01/31 07:50:26 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr    cmd=get_functions: db=default pat=*    
16/01/31 07:50:26 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:26 INFO session.SessionState: Created local directory: /tmp/83189bde-0f10-427f-8825-e634e5d0e1ff_resources
16/01/31 07:50:26 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/83189bde-0f10-427f-8825-e634e5d0e1ff
16/01/31 07:50:26 INFO session.SessionState: Created local directory: /tmp/root/83189bde-0f10-427f-8825-e634e5d0e1ff
16/01/31 07:50:26 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/83189bde-0f10-427f-8825-e634e5d0e1ff/_tmp_space.db
16/01/31 07:50:27 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> 


To test Spark's HA, stop the Spark Master process on worker1, then go back to the spark-shell window on master1. The messages below show that the Master has switched over to worker2. The switchover is not instantaneous, because the newly elected Master must first recover the Worker, Application, and Driver state.
scala> 16/01/31 08:08:35 WARN client.AppClient$ClientEndpoint: Connection to worker1:7077 failed; waiting for master to reconnect...
16/01/31 08:08:35 WARN cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
16/01/31 08:08:35 WARN client.AppClient$ClientEndpoint: Connection to worker1:7077 failed; waiting for master to reconnect...
16/01/31 08:09:14 INFO client.AppClient$ClientEndpoint: Master has changed, new master is at spark://worker2:7077


Looking at worker2:8080, the active Master has been handed over to worker2.



Note that even if the Spark Master process on worker1 is started again, the active Master role does not move back to worker1. Because the cluster state is kept entirely in ZooKeeper, whichever standby Master is elected active recovers exactly the same state. How long a switchover takes varies with the size of the cluster.

After the Spark Master process on worker2 was stopped, the active Master switched over to master1. We then stopped all Workers with stop-slaves.sh and started them again with start-slaves.sh; the browser still showed the previous node information, which confirms that ZooKeeper keeps all the Worker, Application, and Driver information for the cluster.

This completes the Spark HA setup!

Success belongs to those who work hard, persist, and persevere. Keep going!


