Ip |
主机名 |
程序 |
进程 |
192.168.128.11 |
h1 |
Jdk Hadoop hbase |
Namenode DFSZKFailoverController Hamster |
192.168.128.12 |
h2 |
Jdk Hadoop hbase |
Namenode DFSZKFailoverController Hamster |
192.168.128.13 |
h3 |
Jdk Hadoop |
resourceManager |
192.168.128.14 |
h4 |
Jdk Hadoop
|
resourceManager |
192.168.128.15 |
h5 |
Jdk Hadoop Zookeeper Hbase |
Datanode nodeManager JournalNode QuorumPeerMain HRegionServer |
192.168.128.16 |
h6 |
Jdk Hadoop Zookeeper Hbase |
Datanode nodeManager JournalNode QuorumPeerMain HRegionServer |
192.168.128.17 |
h7 |
Jdk Hadoop Zookeeper hbase |
Datanode nodeManager JournalNode QuorumPeerMain HRegionServer |
关于准备工作 我这里就不一一写出来了,总结一下有主机名,ip,主机名和ip的映射关系,防火墙,ssh免密码,jdk的安装及环境变量的设置。
安装zookeeper到 h5、h6、h7上面
修改 /home/zookeeper-3.4.8/conf的zoo_sample.cfg
cp zoo_sample.cfg zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/zookeeper-3.4.8/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=h5:2888:3888
server.2=h6:2888:3888
server.3=h7:2888:3888
创建 data文件夹 和在里面 创建文件myid 并写入数字1
touch data/myid
echo 1 > data/myid
拷贝整个zookeeper到另外两个节点上
scp -r /home/zookeeper-3.4.8 h6:/home/
scp -r /home/zookeeper-3.4.8 h7:/home/
其他两个节点的myid 修改为 2 3
安装hadoop
/home/hadoop-2.7.2/etc/Hadoop
hadoop-env.sh:
export JAVA_HOME=/home/jdk
core-site.xml:
<configuration>
<!-- 指定hdfs的nameservice为masters -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://masters</value>
</property>
<!-- 指定hadoop临时目录 -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop-2.7.2/tmp</value>
</property>
<!-- 指定zookeeper地址 -->
<property>
<name>ha.zookeeper.quorum</name>
<value>h5:2181,h6:2181,h7:2181</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<!--指定hdfs的nameservice为masters,需要和core-site.xml中的保持一致 -->
<property>
<name>dfs.nameservices</name>
<value>masters</value>
</property>
<!-- h1下面有两个NameNode,分别是h1,h2 -->
<property>
<name>dfs.ha.namenodes.masters</name>
<value>h1,h2</value>
</property>
<!-- h1的RPC通信地址 -->
<property>
<name>dfs.namenode.rpc-address.masters.h1</name>
<value>h1:9000</value>
</property>
<!-- h1的http通信地址 -->
<property>
<name>dfs.namenode.http-address.masters.h1</name>
<value>h1:50070</value>
</property>
<!-- h2的RPC通信地址 -->
<property>
<name>dfs.namenode.rpc-address.masters.h2</name>
<value>h2:9000</value>
</property>
<!-- h2的http通信地址 -->
<property>
<name>dfs.namenode.http-address.masters.h2</name>
<value>h2:50070</value>
</property>
<!-- 指定NameNode的元数据在JournalNode上的存放位置 -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://h5:8485;h6:8485;h7:8485/masters</value>
</property>
<!-- 指定JournalNode在本地磁盘存放数据的位置 -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop-2.7.2/journal</value>
</property>
<!-- 开启NameNode失败自动切换 -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- 配置失败自动切换实现方式 -->
<property>
<name>dfs.client.failover.proxy.provider.masters</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- 配置隔离机制方法,多个机制用换行分割,即每个机制暂用一行-->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- 使用sshfence隔离机制时需要ssh免登陆 -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<!-- 配置sshfence隔离机制超时时间 -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<!-- 指定mr框架为yarn方式 -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<!-- 开启RM高可靠 -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- 指定RM的cluster id -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>RM_HA_ID</value>
</property>
<!-- 指定RM的名字 -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- 分别指定RM的地址 -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>h3</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>h4</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- 指定zk集群地址 -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>h5:2181,h6:2181,h7:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Slaves:
h5
h6
h7
然后 拷贝到其他节点
scp -r hadoop-2.7.2 h2:/home/ 等等
这个地方说明一下 yarn 的HA 是在 h3和h4 上面
启动顺序
###注意:严格按照下面的步骤
1. 启动zookeeper集群
[root@h6 ~]# cd /home/zookeeper-3.4.8/bin/
[root@h6 bin]# ./zkServer.sh start
H5 h6 h7 都一样
[root@h6 bin]# ./zkServer.sh status
查看状态
2. 启动journalnode
[root@h5 bin]# cd /home/hadoop-2.7.2/sbin/
[root@h5 sbin]# ./hadoop-daemons.sh start journalnode
h5: starting journalnode, logging to /home/hadoop-2.7.2/logs/hadoop-root-journalnode-h5.out
h7: starting journalnode, logging to /home/hadoop-2.7.2/logs/hadoop-root-journalnode-h7.out
h6: starting journalnode, logging to /home/hadoop-2.7.2/logs/hadoop-root-journalnode-h6.out
[root@h5 sbin]# jps
2420 JournalNode
2309 QuorumPeerMain
2461 Jps
[root@h5 sbin]# ^C
3. 格式化HDFS
在h1上执行命令:
hdfs namenode -format
格式化后会在根据core-site.xml中的hadoop.tmp.dir配置生成个文件
拷贝tmp 到 h2
[root@h1 hadoop-2.7.2]# scp -r tmp/ h2:/home/hadoop-2.7.2/
4. 格式化ZK(在h1上执行即可)
[root@h1 hadoop-2.7.2]# hdfs zkfc -formatZK
5. 启动HDFS(在h1上执行)
[root@h1 hadoop-2.7.2]# sbin/start-dfs.sh
16/02/25 05:01:14 WARN hdfs.DFSUtil: Namenode for ns1 remains unresolved for ID null. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/02/25 05:01:14 WARN hdfs.DFSUtil: Namenode for ns2 remains unresolved for ID null. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/02/25 05:01:14 WARN hdfs.DFSUtil: Namenode for ns3 remains unresolved for ID null. Check your hdfs-site.xml file to ensure namenodes are configured properly.
Starting namenodes on [h1 h2 masters masters masters]
masters: ssh: Could not resolve hostname masters: Name or service not known
masters: ssh: Could not resolve hostname masters: Name or service not known
masters: ssh: Could not resolve hostname masters: Name or service not known
h2: starting namenode, logging to /home/hadoop-2.7.2/logs/hadoop-root-namenode-h2.out
h1: starting namenode, logging to /home/hadoop-2.7.2/logs/hadoop-root-namenode-h1.out
h5: starting datanode, logging to /home/hadoop-2.7.2/logs/hadoop-root-datanode-h5.out
h7: starting datanode, logging to /home/hadoop-2.7.2/logs/hadoop-root-datanode-h7.out
h6: starting datanode, logging to /home/hadoop-2.7.2/logs/hadoop-root-datanode-h6.out
Starting journal nodes [h5 h6 h7]
h5: journalnode running as process 2420. Stop it first.
h6: journalnode running as process 2885. Stop it first.
h7: journalnode running as process 2896. Stop it first.
Starting ZK Failover Controllers on NN hosts [h1 h2 masters masters masters]
masters: ssh: Could not resolve hostname masters: Name or service not known
masters: ssh: Could not resolve hostname masters: Name or service not known
masters: ssh: Could not resolve hostname masters: Name or service not known
h2: starting zkfc, logging to /home/hadoop-2.7.2/logs/hadoop-root-zkfc-h2.out
h1: starting zkfc, logging to /home/hadoop-2.7.2/logs/hadoop-root-zkfc-h1.out
[root@h1 hadoop-2.7.2]#
6. 启动YARN(是在h3上执行start-yarn.sh,把namenode和resourcemanager分开是因为性能问题,因为他们都要占用大量资源,所以把他们分开了,他们分开了就要分别在不同的机器上启动)
[root@h3 sbin]# ./start-yarn.sh
[root@h4 sbin]# ./yarn-daemons.sh start resourcemanager
验证:
http://192.168.128.11:50070
Overview 'h1:9000' (active)
http://192.168.128.12:50070
Overview 'h2:9000' (standby)
上传文件
[root@h4 bin]# hadoop fs -put /etc/profile /profile
[root@h4 bin]# hadoop fs -ls
ls: `.': No such file or directory
[root@h4 bin]# hadoop fs -ls /
Found 1 items
-rw-r--r-- 3 root supergroup 1814 2016-02-26 19:08 /profile
[root@h4 bin]#
杀死h1
[root@h1 sbin]# jps
2480 NameNode
2868 Jps
2775 DFSZKFailoverController
[root@h1 sbin]# kill -9 2480
[root@h1 sbin]# jps
2880 Jps
2775 DFSZKFailoverController
[root@h1 sbin]# hadoop fs -ls /
Found 1 items
-rw-r--r-- 3 root supergroup 1814 2016-02-26 19:08 /profile
此时 h2 变为active
手动启动 h1的 namenode
[root@h1 sbin]# ./hadoop-daemon.sh start namenode
starting namenode, logging to /home/hadoop-2.7.2/logs/hadoop-root-namenode-h1.out
[root@h1 sbin]# hadoop jar /home/hadoop-2.7.2/s
观察 h1 状态为standby
验证yarn
[root@h1 sbin]# hadoop jar /home/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /profile /out
16/02/26 19:14:23 INFO input.FileInputFormat: Total input paths to process : 1
16/02/26 19:14:23 INFO mapreduce.JobSubmitter: number of splits:1
16/02/26 19:14:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456484773347_0001
16/02/26 19:14:24 INFO impl.YarnClientImpl: Submitted application application_1456484773347_0001
16/02/26 19:14:24 INFO mapreduce.Job: The url to track the job: http://h3:8088/proxy/application_1456484773347_0001/
16/02/26 19:14:24 INFO mapreduce.Job: Running job: job_1456484773347_0001
16/02/26 19:14:49 INFO mapreduce.Job: Job job_1456484773347_0001 running in uber mode : false
16/02/26 19:14:49 INFO mapreduce.Job: map 0% reduce 0%
16/02/26 19:15:05 INFO mapreduce.Job: map 100% reduce 0%
16/02/26 19:15:22 INFO mapreduce.Job: map 100% reduce 100%
16/02/26 19:15:23 INFO mapreduce.Job: Job job_1456484773347_0001 completed successfully
16/02/26 19:15:23 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=2099
FILE: Number of bytes written=243781
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1901
HDFS: Number of bytes written=1470
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=13014
Total time spent by all reduces in occupied slots (ms)=13470
Total time spent by all map tasks (ms)=13014
Total time spent by all reduce tasks (ms)=13470
Total vcore-milliseconds taken by all map tasks=13014
Total vcore-milliseconds taken by all reduce tasks=13470
Total megabyte-milliseconds taken by all map tasks=13326336
Total megabyte-milliseconds taken by all reduce tasks=13793280
Map-Reduce Framework
Map input records=80
Map output records=256
Map output bytes=2588
Map output materialized bytes=2099
Input split bytes=87
Combine input records=256
Combine output records=156
Reduce input groups=156
Reduce shuffle bytes=2099
Reduce input records=156
Reduce output records=156
Spilled Records=312
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=395
CPU time spent (ms)=4100
Physical memory (bytes) snapshot=298807296
Virtual memory (bytes) snapshot=4201771008
Total committed heap usage (bytes)=138964992
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1814
File Output Format Counters
Bytes Written=1470
[root@h1 sbin]# hadoop fs -ls /
Found 3 items
drwxr-xr-x - root supergroup 0 2016-02-26 19:15 /out
-rw-r--r-- 3 root supergroup 1814 2016-02-26 19:08 /profile
drwx------ - root supergroup 0 2016-02-26 19:14 /tmp
[root@h1 sbin]#
Hadoop ha 集群搭建完成
安装hbase
hbase-env.sh:
export JAVA_HOME=/home/jdk
export HBASE_MANAGES_ZK=false
hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://h1:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>h1:60000</value>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
<description>The port master should bind to.</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>h5,h6,h7</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
注意:$HBASE_HOME/conf/hbase-site.xml的hbase.rootdir的主机和端口号与$HADOOP_HOME/conf/core-site.xml的fs.default.name的主机和端口号一致
Regionservers:内容为:
h5
h6
h7
复制到h2 h5,h6,h7上面
整个启动顺序
按照上面启动hadoop ha 的顺序 先启动好
然后在h1,h2上启动hbase
./start-hbase.sh
测试进入 hbase
[root@h1 bin]# hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hbase-1.2.0/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0, r25b281972df2f5b15c426c8963cbf77dd853a5ad, Thu Feb 18 23:01:49 CST 2016
hbase(main):001:0> esit
NameError: undefined local variable or method `esit' for #<Object:0x7ad1caa2>
hbase(main):002:0> exit
至此全部结束。
相关推荐
Docker(Hadoop_3.3.1+HBase_2.4.16+Zookeeper_3.7.1+Hive_3.1.3 )配置文件 搭建集群环境
《Hadoop 2.7.2与HBase的集成——深入理解hadoop-2.7.2-hbase-jar.tar.gz》 Hadoop是Apache软件基金会的一个开源项目,它为大规模数据处理提供了一个分布式计算框架。Hadoop的核心包括HDFS(Hadoop Distributed ...
用户可以在这里下载到二进制和源码两种形式的包,用于安装、配置和开发基于Hadoop的应用。 压缩包子文件的文件名称列表中: 1. "hadoop-2.7.2 (1).tar.gz" 这是Hadoop 2.7.2的预编译二进制版本,包含了运行Hadoop所...
Hadoop-2.2.0+Hbase-0.96.2+Hive-0.13.1分布式整合,Hadoop-2.X使用HA方式
标题 "hadoop-2.7.2-hbase-jar.zip" 暗示这是一个与Hadoop和HBase相关的归档文件,其中包含了HBase的JAR文件。Hadoop是Apache软件基金会开发的一个开源分布式计算框架,它使得在大规模数据集上进行计算成为可能。...
6. **Linux环境下的配置**:在Linux上部署HBase,需要安装Java环境,配置Hadoop和HBase的环境变量,以及正确设置HBase的配置文件如`hbase-site.xml`。 7. **HBase的应用场景**:HBase常用于实时分析、日志处理、...
提供的文档`hadoop_zookeeper_hbase集群配置.docx`应包含详细的步骤和配置示例,而`配置文件.rar`则可能包含了预设的配置模板,可以作为配置参考。在实际操作时,务必根据具体环境调整配置,确保所有节点之间的网络...
jdk1.8.0_131、apache-zookeeper-3.8.0、hadoop-3.3.2、hbase-2.4.12 mysql5.7.38、mysql jdbc驱动mysql-connector-java-8.0.8-dmr-bin.jar、 apache-hive-3.1.3 2.本文软件均安装在自建的目录/export/server/下 ...
此文以命令行+截图的形式详细的记录了Hadoop-2.6.4+Zookeeper-3.4.9+Hbase-1.2.4分布式开发平台的环境配置过程,希望能对大家有所帮助。
Hadoop-eclipse-plugin-2.7.2正是为了解决这个问题,它为Eclipse提供了与Hadoop集群无缝对接的功能,使得开发者可以在熟悉的Eclipse环境中编写、调试和运行Hadoop MapReduce程序。 首先,让我们深入了解Hadoop-...
hadoop-2.7.2-connon.jar,重新编译了其中的NativeIO,可以用在windows下,不会报UnsatisfiedLinkedError了
Hadoop HA高可用集群搭建(Hadoop+Zookeeper+HBase) 一、Hadoop HA高可用集群概述 在大数据处理中,高可用集群是非常重要的,Hadoop HA高可用集群可以提供高可靠性和高可用性,确保数据处理不中断。该集群由...
解压hbase-1.2.0-bin.tar.gz后,需要修改hbase-site.xml,设置HBase的主节点地址、Zookeeper地址以及数据存储路径。同时,需要确保Hadoop和HBase的版本兼容,并且在Hadoop环境正常运行的基础上进行HBase的安装和配置...
为方便网络不畅通的同学下载,故上传至此无需积分,由于资源冲突于是又重新压缩,解压后就是hadoop-2.7.2.tar.gz。Hadoop是一个由Apache基金会所开发的分布式系统基础架构。用户可以在不了解分布式底层细节的情况下...
hadoop2.7.2gz文件
hadoop-eclipse-plugin-2.7.2.jar,编译环境win10-64,ant-1.9.6,eclipse-4.5.2(4.5.0可用,其他未测),hadoop-2.7.2
在本文中,我们将深入探讨如何在CentOS-6.4 64位操作系统上配置一个基于Hadoop 2.2.0、HBase 0.96和Zookeeper 3.4.5的分布式环境。这个过程涉及到多个步骤,包括系统设置、软件安装、配置以及服务启动。 首先,为了...
在搭建Hadoop-2.4.0集群时,首先需要确保系统已安装必要的依赖包和工具,包括Maven 3.0或更高版本,Findbugs 1.3.9(如果要运行findbugs),ProtocolBuffer 2.5.0以及CMake 2.6或更新版本(如果要编译本地代码)。...
hadoop-common-2.7.2.jar