Setting Up HA for a Spark Cluster

[Figure: HA architecture diagram of the Spark cluster]



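Setting up Spark HA requires a ZooKeeper cluster, so here is a brief description of how to install one.
I install the ZooKeeper ensemble on master1, worker1, and worker2.
ZooKeeper is installed on master1 first, and the configured installation is then copied to worker1 and worker2.
Software version: zookeeper-3.4.6
1. Unpack ZooKeeper and configure its environment variables
Location in the virtual machine: /usr/local/zookeeper/zookeeper-3.4.6
Environment variables: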
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_60
export ZOOKEEPER_HOME=/usr/local/zookeeper/zookeeper-3.4.6
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${SPARK_HOME}/bin:${ZOOKEEPER_HOME}/bin:$PATH

Then run source ~/.bashrc so that the configuration takes effect.
On master1, run the following commands to copy the .bashrc that now contains the ZooKeeper settings to worker1 and worker2:
scp ~/.bashrc root@worker1:~/
scp ~/.bashrc root@worker2:~/

Then ssh into worker1 and worker2 and run source ~/.bashrc on each so that the configuration takes effect.
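A quick way to confirm that the new settings took effect on a node is to check whether the ZooKeeper bin directory really appears in PATH. The sketch below is one possible check; the helper name path_contains is made up for this example and is not part of ZooKeeper.

```shell
# Hypothetical helper: succeed if directory $1 is one of the
# colon-separated entries of the path string $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Install location taken from the article.
ZOOKEEPER_HOME=/usr/local/zookeeper/zookeeper-3.4.6

if path_contains "$ZOOKEEPER_HOME/bin" "$PATH"; then
  echo "zookeeper bin directory is on PATH"
else
  echo "zookeeper bin directory is missing from PATH"
fi
```

Wrapping both sides in colons lets the first and last PATH entries match the same pattern as the ones in the middle.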
2. Configure ZooKeeper on master1
Go to /usr/local/zookeeper/zookeeper-3.4.6 and create a logs directory with mkdir logs and a data directory with mkdir data.
Then go to /usr/local/zookeeper/zookeeper-3.4.6/conf, copy zoo_sample.cfg to a new file named zoo.cfg, and edit it:
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/conf# cp zoo_sample.cfg zoo.cfg 
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/conf# vim zoo.cfg
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/usr/local/zookeeper/zookeeper-3.4.6/data
dataLogDir=/usr/local/zookeeper/zookeeper-3.4.6/logs
server.0=master1:2888:3888
server.1=worker1:2888:3888
server.2=worker2:2888:3888
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

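On master1, go to /usr/local/zookeeper/zookeeper-3.4.6/data, create a file named myid, and write 0 into it. This 0 (the digit zero) corresponds to the number in server.0.
3. Copy ZooKeeper from master1 to worker1 and worker2, and configure it there.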
scp -r /usr/local/zookeeper/zookeeper-3.4.6 root@worker1:/usr/local/zookeeper/zookeeper-3.4.6/
scp -r /usr/local/zookeeper/zookeeper-3.4.6 root@worker2:/usr/local/zookeeper/zookeeper-3.4.6/

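Note: if the zookeeper directory does not exist on the target machine, create it beforehand.
Log in to worker1 with ssh worker1, edit /usr/local/zookeeper/zookeeper-3.4.6/data/myid, and change its content to 1.
In the same way, change the content of /usr/local/zookeeper/zookeeper-3.4.6/data/myid on worker2 to 2.
The myid values correspond to server.0, server.1, and server.2 in the configuration file.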
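Because each myid has to match its server.N entry, it can help to derive both from one host list. The following is a minimal sketch under that assumption; ENSEMBLE, server_lines, and myid_for are names invented for this example, and the ports 2888 and 3888 are the ones used in the zoo.cfg above.

```shell
# Ensemble members in id order (0, 1, 2), as in the article's zoo.cfg.
ENSEMBLE="master1 worker1 worker2"

# Print one "server.<id>=<host>:2888:3888" line per host.
server_lines() {
  id=0
  for host in $ENSEMBLE; do
    printf 'server.%d=%s:2888:3888\n' "$id" "$host"
    id=$((id + 1))
  done
}

# Print the myid value for host $1, or fail if the host is not listed.
myid_for() {
  id=0
  for host in $ENSEMBLE; do
    if [ "$host" = "$1" ]; then
      printf '%d\n' "$id"
      return 0
    fi
    id=$((id + 1))
  done
  return 1
}

server_lines
myid_for worker2
```

On each node, something like myid_for "$(hostname)" > /usr/local/zookeeper/zookeeper-3.4.6/data/myid would then write the matching id, assuming hostname returns the same name that appears in zoo.cfg.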
4. Start ZooKeeper and test leader election
Start ZooKeeper on master1, worker1, and worker2 in turn:
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# ./zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# ssh worker1
Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

121 packages can be updated.
79 updates are security updates.

Last login: Sat Jan 30 19:56:23 2016 from 192.168.112.130
root@worker1:~# cd /usr/local/zookeeper/zookeeper-3.4.6/bin/
root@worker2:/usr/local/zookeeper/zookeeper-3.4.6/bin# ./zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
# jps shows that each of the three virtual machines now has one additional background process, QuorumPeerMain.
root@worker2:/usr/local/zookeeper/zookeeper-3.4.6/bin# jps
4433 NodeManager
5889 QuorumPeerMain
4343 DataNode
5918 Jps
root@worker2:/usr/local/zookeeper/zookeeper-3.4.6/bin# exit
logout
Connection to worker2 closed.
root@worker1:/usr/local/zookeeper/zookeeper-3.4.6/bin# jps
6006 Jps
4454 NodeManager
4364 DataNode
5964 QuorumPeerMain
root@worker1:/usr/local/zookeeper/zookeeper-3.4.6/bin# exit
logout
Connection to worker1 closed.
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# jps
6629 QuorumPeerMain
4471 NameNode
6681 Jps
4825 ResourceManager
4685 SecondaryNameNode
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# 
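# Use zkServer.sh status to check the Mode of each virtual machine: there is exactly one leader and two followers. Stop ZooKeeper on the leader's machine with zkServer.sh stop, then run zkServer.sh status on the other two: one of them is now the leader and the other is a follower. ZooKeeper has held an automatic election, and this automatic re-election is what keeps the ensemble highly available. The concrete steps follow: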
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
root@worker1:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
root@worker2:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
As shown, master1 is running as the leader and the other two virtual machines as followers. Now stop ZooKeeper on master1 with zkServer.sh stop and check the ZooKeeper status on worker1 and worker2 again.
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh stop
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# jps
8376 Jps
4825 ResourceManager
7933 SecondaryNameNode
7806 NameNode
root@worker2:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
root@worker1:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
After ZooKeeper is stopped on master1, worker2 is elected as the new leader. When ZooKeeper on master1 is started again, master1 runs as a follower.
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
root@master1:/usr/local/zookeeper/zookeeper-3.4.6/bin# 
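The manual status checks above can be condensed into a small script. This is only a sketch: parse_mode and exactly_one_leader are hypothetical helper names, and the sample text is the zkServer.sh status output shown in the transcript.

```shell
# Hypothetical helper: extract the role from `zkServer.sh status`
# output read on stdin.
parse_mode() {
  sed -n 's/^Mode: *//p'
}

# Hypothetical helper: succeed only if exactly one of the given
# roles is "leader".
exactly_one_leader() {
  leaders=0
  for mode in "$@"; do
    if [ "$mode" = "leader" ]; then
      leaders=$((leaders + 1))
    fi
  done
  [ "$leaders" -eq 1 ]
}

# Sample status output copied from the transcript.
sample='JMX enabled by default
Using config: /usr/local/zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader'

printf '%s\n' "$sample" | parse_mode
exactly_one_leader leader follower follower && echo "ensemble looks healthy"
```

In practice the roles would be collected per host, for example with ssh root@worker1 zkServer.sh status 2>&1 | parse_mode, and then passed to exactly_one_leader.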


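One point to note: on the first virtual machine that starts ZooKeeper, zookeeper.out in $ZOOKEEPER_HOME/bin initially contains error messages. The other two ZooKeeper servers have not started yet, so connections to them are refused. Once they are up, everything is normal and the messages can be ignored.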
2016-01-31 07:09:37,261 [myid:0] - WARN  [WorkerSender[myid=0]:QuorumCnxManager@382] - Cannot open channel to 1 at election address worker1/192.168.112.131:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
    at java.lang.Thread.run(Thread.java:745)
2016-01-31 07:09:37,264 [myid:0] - WARN  [WorkerSender[myid=0]:QuorumCnxManager@382] - Cannot open channel to 2 at election address worker2/192.168.112.132:3888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
    at java.lang.Thread.run(Thread.java:745)
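These warnings follow from how ZooKeeper elects a leader: an election only completes once a majority of the configured servers can reach each other, so a single running server out of three keeps retrying. The arithmetic can be sketched as below; quorum_size and tolerated_failures are names chosen for this example.

```shell
# Smallest number of servers that forms a majority of an ensemble of size $1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that may be down while a majority still remains.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

quorum_size 3          # prints 2
tolerated_failures 3   # prints 1
```

For the three-node ensemble in this article, two servers are enough to elect a leader, which is why stopping one node still leaves a working ensemble.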

5. Configure ZooKeeper support in spark-env.sh
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_60
export SCALA_HOME=/usr/local/scala/scala-2.10.4
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
#export SPARK_MASTER_IP=master1
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master1:2181,worker1:2181,worker2:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_WORKER_MEMORY=1g
export SPARK_EXECUTOR_MEMORY=1g
export SPARK_DRIVER_MEMORY=1g
export SPARK_WORKER_CORES=4

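Note: the line #export SPARK_MASTER_IP=master1 must stay commented out.
The Spark parameters used when the cluster was first built may differ from the ones shown here. They were chosen with the limits of a personal computer in mind: if the memory settings are too large, the machine becomes very slow.
Explanation:
-Dspark.deploy.recoveryMode=ZOOKEEPER    # The state of the whole cluster is maintained through ZooKeeper, and so is its recovery. In other words, ZooKeeper provides the HA for Spark: if the active Master dies, a standby Master that is to become active must read the full cluster state from ZooKeeper and then restore the state of all Workers, Drivers, and Applications.
-Dspark.deploy.zookeeper.url=master1:2181,worker1:2181,worker2:2181 # The host:port list of the ZooKeeper servers that the Masters connect to. In this setup the three ZooKeeper machines are also the three machines that may become the active Master, so all three are listed.
-Dspark.deploy.zookeeper.dir=/spark
How does this dir differ from dataDir in ZooKeeper's configuration file zoo.cfg?
-Dspark.deploy.zookeeper.dir is the path inside ZooKeeper under which Spark keeps its metadata, that is, the running state of Spark jobs. dataDir is a directory on the local file system where ZooKeeper itself stores its snapshots.
ZooKeeper holds all state of the Spark cluster, including the information for all Workers, all Applications, and all Drivers.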
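To keep the ZooKeeper address list consistent with zoo.cfg, the value of SPARK_DAEMON_JAVA_OPTS can be assembled from a host list instead of being typed by hand. The following is a sketch only; ZK_HOSTS, ZK_PORT, zk_url, and spark_daemon_opts are names invented here, while the three -D properties are the ones from the spark-env.sh above.

```shell
# ZooKeeper servers and client port from the article's zoo.cfg.
ZK_HOSTS="master1 worker1 worker2"
ZK_PORT=2181

# Join the hosts into "host:port,host:port,...".
zk_url() {
  url=""
  for host in $ZK_HOSTS; do
    url="${url:+$url,}$host:$ZK_PORT"
  done
  printf '%s\n' "$url"
}

# Assemble the three recovery properties into one option string.
spark_daemon_opts() {
  printf '%s %s %s\n' \
    "-Dspark.deploy.recoveryMode=ZOOKEEPER" \
    "-Dspark.deploy.zookeeper.url=$(zk_url)" \
    "-Dspark.deploy.zookeeper.dir=/spark"
}

spark_daemon_opts
```

In spark-env.sh the result could be used as export SPARK_DAEMON_JAVA_OPTS="$(spark_daemon_opts)", provided the helper definitions are placed in the same file.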
Then use scp to copy spark-env.sh from master1 to the corresponding directory on worker1 and worker2:
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# scp spark-env.sh root@worker1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# scp spark-env.sh root@worker2:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/

After copying, be sure to check that spark-env.sh on worker1 and worker2 has the same content as the one on master1.
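Rather than comparing the files by eye, the remote copy can be compared byte for byte with the local one. The sketch below shows the idea with a temporary file standing in for spark-env.sh; matches_local is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the bytes on stdin equal the local file $1.
matches_local() {
  cmp -s - "$1"
}

# Local demonstration with a temporary file standing in for spark-env.sh.
demo=$(mktemp)
printf 'export SPARK_WORKER_MEMORY=1g\n' > "$demo"

if cat "$demo" | matches_local "$demo"; then
  echo "copies are identical"
fi
rm -f "$demo"
```

For the real check, the remote file could be streamed over ssh, for example ssh root@worker1 cat /usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh | matches_local spark-env.sh, run from the conf directory on master1.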
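Spark HA is tested with the ZooKeeper ensemble in the following state: master1 is a follower, worker1 is the leader, and worker2 is a follower.
Start the Spark cluster on master1 with start-all.sh. At this point the Master processes on worker1 and worker2 are not running, so start them on worker1 and worker2 with start-master.sh. Then use jps to confirm that all three nodes where ZooKeeper is installed are running a Master process. Entering master1:8080, worker1:8080, and worker2:8080 in a browser's address bar shows the cluster state.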







Testing the cluster's HA

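Start spark-shell on master1 with the command below. Note that there is not one master but three. They are managed by ZooKeeper, and at startup the master to use is likewise determined through ZooKeeper.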
root@master1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/bin#  spark-shell --master spark://master1:7077,worker1:7077,worker2:7077
16/01/31 07:49:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/31 07:49:15 INFO spark.SecurityManager: Changing view acls to: root
16/01/31 07:49:15 INFO spark.SecurityManager: Changing modify acls to: root
16/01/31 07:49:15 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/01/31 07:49:15 INFO spark.HttpServer: Starting HTTP Server
16/01/31 07:49:16 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/31 07:49:16 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:38357
16/01/31 07:49:16 INFO util.Utils: Successfully started service 'HTTP class server' on port 38357.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
16/01/31 07:49:22 INFO spark.SparkContext: Running Spark version 1.6.0
16/01/31 07:49:22 INFO spark.SecurityManager: Changing view acls to: root
16/01/31 07:49:22 INFO spark.SecurityManager: Changing modify acls to: root
16/01/31 07:49:22 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/01/31 07:49:22 INFO util.Utils: Successfully started service 'sparkDriver' on port 45379.
16/01/31 07:49:23 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/01/31 07:49:23 INFO Remoting: Starting remoting
16/01/31 07:49:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.112.130:42792]
16/01/31 07:49:23 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 42792.
16/01/31 07:49:23 INFO spark.SparkEnv: Registering MapOutputTracker
16/01/31 07:49:23 INFO spark.SparkEnv: Registering BlockManagerMaster
16/01/31 07:49:23 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-09321e00-bbe5-4452-aa15-f02530b1f53f
16/01/31 07:49:23 INFO storage.MemoryStore: MemoryStore started with capacity 517.4 MB
16/01/31 07:49:23 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/01/31 07:49:24 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/31 07:49:24 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/01/31 07:49:24 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/01/31 07:49:24 INFO ui.SparkUI: Started SparkUI at http://192.168.112.130:4040
16/01/31 07:49:24 INFO client.AppClient$ClientEndpoint: Connecting to master spark://master1:7077...
16/01/31 07:49:24 INFO client.AppClient$ClientEndpoint: Connecting to master spark://worker1:7077...
16/01/31 07:49:24 INFO client.AppClient$ClientEndpoint: Connecting to master spark://worker2:7077...
16/01/31 07:49:25 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160131074925-0000
16/01/31 07:49:25 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46202.
16/01/31 07:49:25 INFO netty.NettyBlockTransferService: Server created on 46202
16/01/31 07:49:25 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/01/31 07:49:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.112.130:46202 with 517.4 MB RAM, BlockManagerId(driver, 192.168.112.130, 46202)
16/01/31 07:49:25 INFO storage.BlockManagerMaster: Registered BlockManager
16/01/31 07:49:26 INFO client.AppClient$ClientEndpoint: Executor added: app-20160131074925-0000/0 on worker-20160131071148-192.168.112.132-41059 (192.168.112.132:41059) with 1 cores
16/01/31 07:49:26 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160131074925-0000/0 on hostPort 192.168.112.132:41059 with 1 cores, 1024.0 MB RAM
16/01/31 07:49:26 INFO client.AppClient$ClientEndpoint: Executor added: app-20160131074925-0000/1 on worker-20160131071148-192.168.112.133-43458 (192.168.112.133:43458) with 1 cores
16/01/31 07:49:26 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160131074925-0000/1 on hostPort 192.168.112.133:43458 with 1 cores, 1024.0 MB RAM
16/01/31 07:49:27 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160131074925-0000/1 is now RUNNING
16/01/31 07:49:30 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160131074925-0000/0 is now RUNNING
16/01/31 07:49:33 INFO scheduler.EventLoggingListener: Logging events to hdfs://master1:9000/historyserverforSpark/app-20160131074925-0000
16/01/31 07:49:33 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/01/31 07:49:33 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
16/01/31 07:49:43 INFO hive.HiveContext: Initializing execution hive, version 1.2.1
16/01/31 07:49:43 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
16/01/31 07:49:43 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
16/01/31 07:49:49 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/01/31 07:49:49 INFO metastore.ObjectStore: ObjectStore, initialize called
16/01/31 07:49:51 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/01/31 07:49:51 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/01/31 07:49:52 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/31 07:49:58 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/31 07:50:03 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/01/31 07:50:05 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:05 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:10 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:10 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:10 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (worker3:58956) with ID 1
16/01/31 07:50:10 INFO storage.BlockManagerMasterEndpoint: Registering block manager worker3:34011 with 517.4 MB RAM, BlockManagerId(1, worker3, 34011)
16/01/31 07:50:11 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/01/31 07:50:11 INFO metastore.ObjectStore: Initialized ObjectStore
16/01/31 07:50:13 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/01/31 07:50:14 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
16/01/31 07:50:14 INFO metastore.HiveMetaStore: Added admin role in metastore
16/01/31 07:50:14 INFO metastore.HiveMetaStore: Added public role in metastore
16/01/31 07:50:15 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/01/31 07:50:15 INFO metastore.HiveMetaStore: 0: get_all_databases
16/01/31 07:50:15 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr    cmd=get_all_databases    
16/01/31 07:50:15 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
16/01/31 07:50:15 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr    cmd=get_functions: db=default pat=*    
16/01/31 07:50:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:16 INFO session.SessionState: Created local directory: /tmp/root
16/01/31 07:50:16 INFO session.SessionState: Created local directory: /tmp/d79d8d0f-b021-4443-97aa-e9da5f65f9fe_resources
16/01/31 07:50:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/d79d8d0f-b021-4443-97aa-e9da5f65f9fe
16/01/31 07:50:16 INFO session.SessionState: Created local directory: /tmp/root/d79d8d0f-b021-4443-97aa-e9da5f65f9fe
16/01/31 07:50:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/d79d8d0f-b021-4443-97aa-e9da5f65f9fe/_tmp_space.db
16/01/31 07:50:17 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
16/01/31 07:50:17 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/01/31 07:50:17 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
16/01/31 07:50:17 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
16/01/31 07:50:20 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/01/31 07:50:20 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (worker2:46076) with ID 0
16/01/31 07:50:20 INFO metastore.ObjectStore: ObjectStore, initialize called
16/01/31 07:50:20 INFO storage.BlockManagerMasterEndpoint: Registering block manager worker2:41924 with 517.4 MB RAM, BlockManagerId(0, worker2, 41924)
16/01/31 07:50:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/01/31 07:50:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/01/31 07:50:21 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/31 07:50:21 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/31 07:50:23 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/01/31 07:50:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:26 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
16/01/31 07:50:26 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/01/31 07:50:26 INFO metastore.ObjectStore: Initialized ObjectStore
16/01/31 07:50:26 INFO metastore.HiveMetaStore: Added admin role in metastore
16/01/31 07:50:26 INFO metastore.HiveMetaStore: Added public role in metastore
16/01/31 07:50:26 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/01/31 07:50:26 INFO metastore.HiveMetaStore: 0: get_all_databases
16/01/31 07:50:26 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr    cmd=get_all_databases    
16/01/31 07:50:26 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
16/01/31 07:50:26 INFO HiveMetaStore.audit: ugi=root    ip=unknown-ip-addr    cmd=get_functions: db=default pat=*    
16/01/31 07:50:26 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/01/31 07:50:26 INFO session.SessionState: Created local directory: /tmp/83189bde-0f10-427f-8825-e634e5d0e1ff_resources
16/01/31 07:50:26 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/83189bde-0f10-427f-8825-e634e5d0e1ff
16/01/31 07:50:26 INFO session.SessionState: Created local directory: /tmp/root/83189bde-0f10-427f-8825-e634e5d0e1ff
16/01/31 07:50:26 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/83189bde-0f10-427f-8825-e634e5d0e1ff/_tmp_space.db
16/01/31 07:50:27 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> 
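The --master value used to launch spark-shell lists every candidate Master, comma-separated, after a single spark:// prefix. A small helper can build that string from the host names; spark_master_url is a name made up for this sketch, and 7077 is the port that appears in the command and logs of this article.

```shell
# Hypothetical helper: build "spark://h1:7077,h2:7077,..." from the
# Master host names given as arguments.
spark_master_url() {
  port=7077
  list=""
  for host in "$@"; do
    list="${list:+$list,}$host:$port"
  done
  printf 'spark://%s\n' "$list"
}

spark_master_url master1 worker1 worker2
```

The shell could then be started with spark-shell --master "$(spark_master_url master1 worker1 worker2)".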


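To test Spark HA, stop the Spark Master process on worker1 and return to the spark-shell window on master1. The messages below show that the Master role has switched to worker2. The switch is not instantaneous: the new Master first has to recover the Worker, Application, and Driver information.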
scala> 16/01/31 08:08:35 WARN client.AppClient$ClientEndpoint: Connection to worker1:7077 failed; waiting for master to reconnect...
16/01/31 08:08:35 WARN cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
16/01/31 08:08:35 WARN client.AppClient$ClientEndpoint: Connection to worker1:7077 failed; waiting for master to reconnect...
16/01/31 08:09:14 INFO client.AppClient$ClientEndpoint: Master has changed, new master is at spark://worker2:7077
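The length of the switch can be read off the log timestamps: the first disconnect warning and the "Master has changed" line. As a rough sketch, the helpers below (to_seconds and elapsed, both invented for this example) compute the gap for two timestamps on the same day.

```shell
# Convert HH:MM:SS to seconds since midnight (awk reads "08" as decimal 8).
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds elapsed between two same-day timestamps $1 and $2.
elapsed() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Timestamps from the log: first disconnect, then new master announced.
elapsed 08:08:35 08:09:14   # prints 39
```

In this run the driver therefore waited about 39 seconds before it learned of the new active Master.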


Checking worker2:8080 shows that the active Master role has passed to worker2.



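Note that if the Master process on worker1 is started again, the active Master role is not handed back to worker1. The state of the Spark cluster is managed by ZooKeeper, so whichever standby Master is elected active recovers the same cluster state. The time a switch takes varies with the size of the cluster.

After the Master process on worker2 is stopped, the active Master role moves to master1. Stop all Workers with stop-slaves.sh, start them again with start-slaves.sh, and check the cluster state in the browser: the cluster has kept the earlier node information, which shows that ZooKeeper stores the information for all Workers, all Applications, and all Drivers.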
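The stored state can also be inspected directly in ZooKeeper. The sketch below only counts entries by name prefix in a znode listing. It rests on several assumptions that should be verified against your Spark version: that the recovery data sits under /spark/master_status, and that entries are named with worker_, app_, and driver_ prefixes. The helper count_prefix and the sample listing are made up for illustration; the ids in the sample are borrowed from the logs in this article, but the listing itself was not captured from the cluster.

```shell
# Hypothetical helper: count entries whose name starts with prefix $1
# in a "[a, b, c]" style listing read from stdin.
count_prefix() {
  sed 's/[][ ]//g' | tr ',' '\n' | grep -c "^$1"
}

# Made-up sample listing (assumed format, not real cluster output).
sample='[worker_worker-20160131071148-192.168.112.132-41059, worker_worker-20160131071148-192.168.112.133-43458, app_app-20160131074925-0000]'

echo "workers: $(echo "$sample" | count_prefix worker_)"
echo "apps:    $(echo "$sample" | count_prefix app_)"
echo "drivers: $(echo "$sample" | count_prefix driver_)"
```

Against a live ensemble, the listing would come from something like zkCli.sh -server master1:2181 ls /spark/master_status, whose output also contains log lines that would need to be filtered out first.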

This completes the Spark HA setup!

Success belongs to those who are diligent, persistent, and determined. Keep going!


