This article took effort to write; please credit the source when reposting (http://shihlei.iteye.com/blog/2066627)!
1 Overview
Our company runs CDH4 with MRV1 as the job execution framework, and there is very little material online on building an HA setup for CDH4.4 HDFS together with MRV1. This article records my attempt at such a setup; the process was also captured in 《Hadoop_CDH4.4.0_MRV1_CDH4.2.2_安装手册_v0.2》.
2 Planning
Environment:
Component | Version | Notes
JRE | java version "1.7.0_25", Java(TM) SE Runtime Environment (build 1.7.0_25-b15) |
Hadoop | hadoop-2.0.0-cdh4.4.0.tar.gz | Main distribution (http://archive.cloudera.com/cdh4/cdh/4/)
MRV1 | mr1-2.0.0-mr1-cdh4.2.2.tar.gz (corresponding version: hadoop-2.0.0-mr1-cdh4.2.2) | MRV1 package
Zookeeper | zookeeper-3.4.5.tar.gz | Coordination service used for automatic NameNode/JobTracker HA failover
Hosts:
IP | Host | Deployed modules | Processes
8.8.8.11 | Hadoop-NN-01 | NameNode, JobTracker | NameNode, DFSZKFailoverController, JobTrackerHADaemon, MRZKFailoverController
8.8.8.12 | Hadoop-NN-02 | NameNode, JobTracker | NameNode, DFSZKFailoverController, JobTrackerHADaemon, MRZKFailoverController
8.8.8.13 | Hadoop-DN-01 Zookeeper-01 | DataNode, TaskTracker, Zookeeper | DataNode, TaskTracker, JournalNode, QuorumPeerMain
8.8.8.14 | Hadoop-DN-02 Zookeeper-02 | DataNode, TaskTracker, Zookeeper | DataNode, TaskTracker, JournalNode, QuorumPeerMain
8.8.8.15 | Hadoop-DN-03 Zookeeper-03 | DataNode, TaskTracker, Zookeeper | DataNode, TaskTracker, JournalNode, QuorumPeerMain
Notes:
- NameNode
- JobTracker
- DFSZKFC: DFS Zookeeper Failover Controller, activates the standby NameNode
- MRZKFC: MR Zookeeper Failover Controller, activates the standby JobTracker
- DataNode
- TaskTracker
- JournalNode: hosts the NameNode shared edit log (if NFS shared storage is used instead, this process and all of its related configuration can be omitted)
- QuorumPeerMain: the main Zookeeper process
Directories:
Name | Path
(MRV1, $HADOOP_HOME) |
MRV1 Data | $HADOOP_HOME/data
MRV1 Log | $HADOOP_HOME/logs
(HDFS, $HADOOP_PREFIX) |
HDFS Data | $HADOOP_PREFIX/data
HDFS Log | $HADOOP_PREFIX/logs
3 Detailed Installation
(1) Environment Preparation
1) Disable the firewall: service iptables stop
2) Install the JRE: omitted
3) Install Zookeeper: (see the appendix)
4) Configure the hosts file (as root): vi /etc/hosts
Contents:
8.8.8.11 Hadoop-NN-01
8.8.8.12 Hadoop-NN-02
8.8.8.13 Hadoop-DN-01 Zookeeper-01
8.8.8.14 Hadoop-DN-02 Zookeeper-02
8.8.8.15 Hadoop-DN-03 Zookeeper-03
5) Configure passwordless SSH (puppet user): generate a key pair with ssh-keygen on Hadoop-NN-01, then copy the public key to the other nodes:
ssh-copy-id -i ~/.ssh/id_rsa.pub puppet@Hadoop-NN-02
ssh-copy-id -i ~/.ssh/id_rsa.pub puppet@Hadoop-DN-01
ssh-copy-id -i ~/.ssh/id_rsa.pub puppet@Hadoop-DN-02
ssh-copy-id -i ~/.ssh/id_rsa.pub puppet@Hadoop-DN-03
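To confirm that passwordless login works from Hadoop-NN-01 before continuing, a quick loop such as the following can be used (a minimal sketch; the host list matches the planning table above):

for host in Hadoop-NN-02 Hadoop-DN-01 Hadoop-DN-02 Hadoop-DN-03; do
  # should print the remote hostname without prompting for a password
  ssh "puppet@$host" hostname
done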
6) Configure environment variables: vi ~/.bashrc
Contents:
#Hadoop CDH4
export HADOOP_HOME=/home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2
export HADOOP_PREFIX=/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$HADOOP_HOME/bin
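After editing ~/.bashrc, the variables can be loaded and sanity-checked as follows (a small sketch; it only confirms that the two installation directories resolve and that both builds run):

source ~/.bashrc
# HDFS (CDH4.4.0) install
echo "$HADOOP_PREFIX" && "$HADOOP_PREFIX/bin/hadoop" version
# MRV1 (CDH4.2.2) install
echo "$HADOOP_HOME" && "$HADOOP_HOME/bin/hadoop" version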
(2) HDFS
1) Unpack: tar -xvf hadoop-2.0.0-cdh4.4.0.tar.gz
2) Configure hadoop-env.sh: vi $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
Add: export JAVA_HOME=/usr/java/jdk1.7.0_25
3) Configure core-site.xml: vi $HADOOP_PREFIX/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- Basic directory settings -->
  <property>
    <!-- Default FS; with HA this is the logical hdfs nameservice name -->
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <!-- Temporary directory -->
    <name>hadoop.tmp.dir</name>
    <value>/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/data/tmp</value>
  </property>
  <!-- ============================== Trash ======================================= -->
  <property>
    <!-- How often the checkpointer running on the NameNode turns the Current folder into a checkpoint; default 0 means use the value of fs.trash.interval -->
    <name>fs.trash.checkpoint.interval</name>
    <value>0</value>
  </property>
  <property>
    <!-- Minutes after which checkpoint directories under .Trash are deleted; the server-side setting takes precedence over the client; default 0 means never delete -->
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
</configuration>
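With fs.trash.interval set to 1440 minutes, files removed through the shell are kept in the per-user trash for a day before being purged. A quick way to see the behaviour once HDFS is up (a sketch; /tmp/demo.txt is just a throwaway test file and the trash path assumes the puppet user):

hdfs dfs -put /etc/hosts /tmp/demo.txt
hdfs dfs -rm /tmp/demo.txt                    # moved into trash, not deleted immediately
hdfs dfs -ls /user/puppet/.Trash/Current/tmp  # the file stays here for up to 1440 minutes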
4) Configure hdfs-site.xml: vi $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- Data directories -->
  <property>
    <!-- Where the NameNode stores metadata and edit logs -->
    <name>dfs.name.dir</name>
    <value>/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/data/dfs/name</value>
  </property>
  <property>
    <name>dfs.name.edits.dir</name>
    <value>/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/data/editlog</value>
  </property>
  <property>
    <!-- Where the DataNode stores blocks -->
    <name>dfs.data.dir</name>
    <value>/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/data/dfs/dn</value>
  </property>
  <property>
    <!-- Block replication factor -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <!-- =============================== HDFS HA ======================================= -->
  <!-- MR1 still uses the name fs.default.name -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://mycluster</value>
  </property>
  <!-- Logical nameservice name -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <!-- NameNode IDs; this version supports at most two NameNodes -->
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID] RPC addresses -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>Hadoop-NN-01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>Hadoop-NN-02:8020</value>
  </property>
  <!-- HDFS HA: dfs.namenode.http-address.[nameservice ID] HTTP addresses -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>Hadoop-NN-01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>Hadoop-NN-02:50070</value>
  </property>

  <!-- ================== NameNode automatic failover via ZKFC and Zookeeper ====================== -->
  <!-- Enable automatic failover based on Zookeeper and the ZKFC processes, which watch for a dead NameNode -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>Zookeeper-01:2181,Zookeeper-02:2181,Zookeeper-03:2181</value>
  </property>
  <property>
    <!-- ZooKeeper session timeout, in milliseconds -->
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>2000</value>
  </property>

  <!-- ================== NameNode fencing =============================================== -->
  <!-- Prevents the NameNode that was failed over from coming back up and creating two active services -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <!-- Value was missing in the original; the same key is used for JobTracker fencing in mapred-site.xml -->
    <value>/home/puppet/.ssh/id_rsa</value>
  </property>

  <!-- ================== NameNode edit log sharing ============================================ -->
  <!-- Guarantees edit log availability for recovery -->
  <property>
    <name>dfs.journalnode.http-address</name>
    <value>0.0.0.0:8480</value>
  </property>
  <property>
    <name>dfs.journalnode.rpc-address</name>
    <value>0.0.0.0:8485</value>
  </property>
  <property>
    <!-- JournalNode addresses used by the QuorumJournalManager to store the edit log -->
    <!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>; the port matches dfs.journalnode.rpc-address -->
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://Hadoop-DN-01:8485;Hadoop-DN-02:8485;Hadoop-DN-03:8485/mycluster</value>
  </property>
  <property>
    <!-- Where each JournalNode stores its data -->
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/data/dfs/jn</value>
  </property>

  <!-- ================== Client failover ============================================ -->
  <property>
    <!-- Strategy used by DataNodes and clients to locate the active NameNode -->
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
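Once both NameNodes are running (step 10 below), the active/standby roles can be checked from the command line. A sketch using the standard haadmin tool and the nn1/nn2 IDs defined above:

# query each NameNode's current HA state
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2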
5) Configure slaves: vi $HADOOP_PREFIX/etc/hadoop/slaves
Hadoop-DN-01
Hadoop-DN-02
Hadoop-DN-03
6) Distribute the program:
scp -r /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0 puppet@Hadoop-NN-02:/home/puppet/hadoop/cdh4.4.0/
scp -r /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0 puppet@Hadoop-DN-01:/home/puppet/hadoop/cdh4.4.0/
scp -r /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0 puppet@Hadoop-DN-02:/home/puppet/hadoop/cdh4.4.0/
scp -r /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0 puppet@Hadoop-DN-03:/home/puppet/hadoop/cdh4.4.0/
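The four copies above differ only in the target host, so a small loop (a sketch, same paths as above) keeps them in sync and avoids the copy-paste mistakes that are easy to make here:

SRC=/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0
for host in Hadoop-NN-02 Hadoop-DN-01 Hadoop-DN-02 Hadoop-DN-03; do
  scp -r "$SRC" "puppet@$host:/home/puppet/hadoop/cdh4.4.0/"
done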
7) Start the JournalNodes:
On each JournalNode host (Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03): hadoop-daemon.sh start journalnode
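Rather than logging in to each DataNode host, the three JournalNodes can also be started from Hadoop-NN-01 over SSH (a sketch relying on the passwordless SSH set up earlier; the full script path is used because a non-login remote shell may not load ~/.bashrc):

for host in Hadoop-DN-01 Hadoop-DN-02 Hadoop-DN-03; do
  # hadoop-daemon.sh lives under the HDFS install's sbin directory
  ssh "puppet@$host" /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/sbin/hadoop-daemon.sh start journalnode
done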
8) Format the NameNode:
On Hadoop-NN-01: hdfs namenode -format
9) Initialize the ZKFC:
On Hadoop-NN-01: hdfs zkfc -formatZK
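A step that is easy to miss between formatting and starting the cluster, and not shown in the original write-up: the standby NameNode (Hadoop-NN-02) has no copy of the freshly formatted metadata yet. A common way to initialize it, assuming the active NameNode on Hadoop-NN-01 has already been started (for example with hadoop-daemon.sh start namenode), is:

# on Hadoop-NN-02: pull the formatted namespace from the running active NameNode
hdfs namenode -bootstrapStandby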
10) Start HDFS
Command: /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/sbin/start-dfs.sh
11) Verify:
Processes:
a) NameNode
[puppet@BigData-01 ~]$ jps
4001 NameNode
4290 DFSZKFailoverController
4415 Jps
b) DataNode
[puppet@BigData-03 ~]$ jps
25918 QuorumPeerMain
19217 JournalNode
19143 DataNode
19351 Jps
Web UI:
a) Active NameNode: Hadoop-NN-01:50070
b) Standby NameNode: Hadoop-NN-02:50070
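At this point automatic failover can be exercised end to end. A sketch of a simple test (kill the active NameNode process and watch the standby take over; the PID comes from the jps output above):

# on Hadoop-NN-01: stop the active NameNode abruptly
kill -9 4001                        # PID of the NameNode shown by jps
# on Hadoop-NN-02: within a few seconds the ZKFC should promote this node
hdfs haadmin -getServiceState nn2   # expected output: active
# restart the old NameNode afterwards; it comes back as standby
hadoop-daemon.sh start namenode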
(3) MapReduce
1) Unpack: tar -xvf mr1-2.0.0-mr1-cdh4.2.2.tar.gz
2) Configure hadoop-env.sh: vi $HADOOP_HOME/conf/hadoop-env.sh
Add: export JAVA_HOME=/usr/java/jdk1.7.0_25
3) Configure mapred-site.xml: vi $HADOOP_HOME/conf/mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Data directories -->
  <property>
    <!-- Where the TaskTracker stores temporary data and intermediate map output; JBOD layout recommended -->
    <name>mapred.local.dir</name>
    <value>/home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2/data/mapred/local</value>
  </property>

  <!-- =============================== JobTracker HA ======================================= -->
  <property>
    <!-- Logical JobTracker name, used here for the HA configuration -->
    <name>mapred.job.tracker</name>
    <value>logicaljt</value>
  </property>
  <property>
    <!-- Whether to recover jobs that were running on the last active JobTracker; default false; must be true for HA -->
    <name>mapred.jobtracker.restart.recover</name>
    <value>true</value>
  </property>
  <property>
    <!-- Whether job status is persisted to HDFS; default false; must be true for HA -->
    <name>mapred.job.tracker.persist.jobstatus.active</name>
    <value>true</value>
  </property>
  <property>
    <!-- How many hours job status is kept on HDFS; default 0; must be > 0 for HA -->
    <name>mapred.job.tracker.persist.jobstatus.hours</name>
    <value>1</value>
  </property>
  <property>
    <!-- HDFS location where job status is stored; must exist and be owned by the mapred user -->
    <name>mapred.job.tracker.persist.jobstatus.dir</name>
    <value>/home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2/data/jobsInfo</value>
  </property>
  <property>
    <name>mapred.jobtrackers.logicaljt</name>
    <value>jt1,jt2</value>
  </property>
  <!-- JobTracker HA: mapred.jobtracker.rpc-address.[nameservice ID] RPC addresses -->
  <property>
    <name>mapred.jobtracker.rpc-address.logicaljt.jt1</name>
    <value>Hadoop-NN-01:8021</value>
  </property>
  <property>
    <name>mapred.jobtracker.rpc-address.logicaljt.jt2</name>
    <value>Hadoop-NN-02:8021</value>
  </property>
  <!-- JobTracker HA: mapred.job.tracker.http.address.[nameservice ID] HTTP addresses -->
  <property>
    <name>mapred.job.tracker.http.address.logicaljt.jt1</name>
    <value>Hadoop-NN-01:50030</value>
  </property>
  <property>
    <name>mapred.job.tracker.http.address.logicaljt.jt2</name>
    <value>Hadoop-NN-02:50030</value>
  </property>
  <!-- JobTracker HA: mapred.ha.jobtracker.rpc-address.[nameservice ID] HA daemon RPC addresses -->
  <property>
    <name>mapred.ha.jobtracker.rpc-address.logicaljt.jt1</name>
    <value>Hadoop-NN-01:8023</value>
  </property>
  <property>
    <name>mapred.ha.jobtracker.rpc-address.logicaljt.jt2</name>
    <value>Hadoop-NN-02:8023</value>
  </property>
  <!-- JobTracker HA: mapred.ha.jobtracker.http-redirect-address.[nameservice ID] HTTP redirect addresses -->
  <property>
    <name>mapred.ha.jobtracker.http-redirect-address.logicaljt.jt1</name>
    <value>Hadoop-NN-01:50030</value>
  </property>
  <property>
    <name>mapred.ha.jobtracker.http-redirect-address.logicaljt.jt2</name>
    <value>Hadoop-NN-02:50030</value>
  </property>

  <!-- ================== TaskTracker / client settings ====================== -->
  <property>
    <!-- Strategy used by TaskTrackers and clients to locate the active JobTracker -->
    <name>mapred.client.failover.proxy.provider.logicaljt</name>
    <value>org.apache.hadoop.mapred.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <!-- Maximum number of failover attempts for TaskTrackers and clients -->
    <name>mapred.client.failover.max.attempts</name>
    <value>15</value>
  </property>
  <property>
    <!-- Wait time before the first failover attempt -->
    <name>mapred.client.failover.sleep.base.millis</name>
    <value>500</value>
  </property>
  <property>
    <!-- Maximum wait time between two failover attempts -->
    <name>mapred.client.failover.sleep.max.millis</name>
    <value>1500</value>
  </property>
  <property>
    <!-- Maximum connection retries between two failover attempts -->
    <name>mapred.client.failover.connection.retries</name>
    <value>0</value>
  </property>
  <property>
    <!-- Retries on connection timeouts between two failover attempts -->
    <name>mapred.client.failover.connection.retries.on.timeouts</name>
    <value>0</value>
  </property>

  <!-- ================== JobTracker automatic failover via ZKFC and Zookeeper ====================== -->
  <!-- Enable automatic failover based on Zookeeper and the MRZKFC processes, which watch for a dead JobTracker -->
  <property>
    <name>mapred.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.ha.zkfc.port</name>
    <value>8018</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>Zookeeper-01:2181,Zookeeper-02:2181,Zookeeper-03:2181</value>
  </property>
  <property>
    <!-- ZooKeeper session timeout, in milliseconds -->
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>2000</value>
  </property>

  <!-- ================== JobTracker fencing =============================================== -->
  <!-- Prevents the JobTracker that was failed over from coming back up and creating two active services -->
  <property>
    <name>mapred.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>mapred.ha.fencing.ssh.private-key-files</name>
    <value>/home/puppet/.ssh/id_rsa</value>
  </property>

  <!-- ================== TaskTracker HTTP address =============================================== -->
  <property>
    <name>mapreduce.tasktracker.http.address</name>
    <value>0.0.0.0:50033</value>
  </property>
</configuration>
4) Configure core-site.xml and hdfs-site.xml:
Copy $HADOOP_PREFIX/etc/hadoop/core-site.xml and $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml into $HADOOP_HOME/conf, as shown below.
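Concretely, on each node this is just (a sketch using the paths defined in ~/.bashrc):

cp "$HADOOP_PREFIX/etc/hadoop/core-site.xml" \
   "$HADOOP_PREFIX/etc/hadoop/hdfs-site.xml" \
   "$HADOOP_HOME/conf/"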
5) Configure slaves: vi $HADOOP_HOME/conf/slaves
Hadoop-DN-01
Hadoop-DN-02
Hadoop-DN-03
6) Distribute the program
scp -r /home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2 puppet@Hadoop-NN-02:/home/puppet/hadoop/cdh4.2.2/
scp -r /home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2 puppet@Hadoop-DN-01:/home/puppet/hadoop/cdh4.2.2/
scp -r /home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2 puppet@Hadoop-DN-02:/home/puppet/hadoop/cdh4.2.2/
scp -r /home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2 puppet@Hadoop-DN-03:/home/puppet/hadoop/cdh4.2.2/
7) Initialize the MRZKFC:
On Hadoop-NN-01:
Command: $HADOOP_HOME/bin/hadoop mrzkfc -formatZK
8) Start MRV1:
Command location: $HADOOP_HOME/bin
On Hadoop-NN-01 and Hadoop-NN-02, start jobtrackerha: bin/hadoop-daemon.sh start jobtrackerha
On Hadoop-NN-01 and Hadoop-NN-02, start mrzkfc: bin/hadoop-daemon.sh start mrzkfc
On Hadoop-NN-01, run: bin/hadoop-daemons.sh start tasktracker
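Before declaring the MRV1 layer done, it is worth submitting a small job. A sketch using the examples jar that ships with the MR1 tarball (the exact jar name under $HADOOP_HOME may differ):

# estimate pi with 2 maps of 100 samples each; a successful run confirms
# that the active JobTracker and the TaskTrackers are wired up correctly
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar pi 2 100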
9) Verify:
Processes:
1) JobTracker: Hadoop-NN-01, Hadoop-NN-02
[puppet@BigData-01 hadoop-2.0.0-mr1-cdh4.2.2]$ jps
27071 Jps
27051 MRZKFailoverController
26968 JobTrackerHADaemon
24707 NameNode
24993 DFSZKFailoverController
2) TaskTracker: Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03
[puppet@BigData-03 bin]$ jps
26497 JournalNode
25918 QuorumPeerMain
27173 TaskTracker
27218 Jps
26423 DataNode
Web UI: the active JobTracker is served at Hadoop-NN-01:50030 and the standby at Hadoop-NN-02:50030 (per mapred.job.tracker.http.address in mapred-site.xml above).
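The JobTracker HA roles can also be checked from the shell. A sketch using the mrhaadmin tool provided by CDH4's MR1 HA package, with the jt1/jt2 IDs from mapred-site.xml:

# query the HA state of each JobTracker
$HADOOP_HOME/bin/hadoop mrhaadmin -getServiceState jt1
$HADOOP_HOME/bin/hadoop mrhaadmin -getServiceState jt2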