Hadoop 2.5.0 cluster installation (with YARN)
[Steps]
1. Prerequisites
(1) Cluster plan
Role | IP address | Hostname (FQDN) |
master | 192.168.86.10 | master.hadoop.com |
slave1 | 192.168.86.11 | slave1.hadoop.com |
slave2 | 192.168.86.12 | slave2.hadoop.com |
slave3 | 192.168.86.13 | slave3.hadoop.com |
(2) Log in to the operating system as root.
(3) On every host in the cluster, set the hostname to match its entry in the cluster plan, e.g. on the master:
hostname master.hadoop.com
Then edit /etc/sysconfig/network so the name persists across reboots:
HOSTNAME=master.hadoop.com
(4) Edit /etc/hosts as follows:
192.168.86.10 master.hadoop.com
192.168.86.11 slave1.hadoop.com
192.168.86.12 slave2.hadoop.com
192.168.86.13 slave3.hadoop.com
Run the following command to copy the hosts file to every other host in the cluster (substitute each slave's address for the asterisk):
scp /etc/hosts 192.168.86.*:/etc/hosts
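The asterisk above is a placeholder, not valid scp syntax. A minimal sketch that pushes the file to each slave in one pass, assuming root SSH access and the addresses from the cluster plan:

for ip in 192.168.86.11 192.168.86.12 192.168.86.13; do
  scp /etc/hosts root@$ip:/etc/hosts
done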
(5) Install the JDK
rpm -ivh jdk-7u67-linux-x64.rpm
Create an environment file and source it:
echo -e "JAVA_HOME=/usr/java/default\nexport PATH=\$JAVA_HOME/bin:\$PATH" > /etc/profile.d/java-env.sh
. /etc/profile.d/java-env.sh
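A quick check, assuming the RPM installed to the usual /usr/java/default location:

java -version        # should report version 1.7.0_67
echo $JAVA_HOME      # should print /usr/java/default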
(6) Disable iptables
service iptables stop
chkconfig iptables off
(7) Disable SELinux. Edit /etc/selinux/config as below, then reboot the operating system.
SELINUX=disabled
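After the reboot, both changes can be verified:

service iptables status   # should report that the firewall is not running
getenforce                # should print Disabled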
2. Installation (with YARN)
(1) On master.hadoop.com, run:
yum install hadoop-yarn-resourcemanager hadoop-mapreduce-historyserver hadoop-yarn-proxyserver hadoop-hdfs-namenode
yum install hadoop-hdfs-secondarynamenode   # optional; do not install this package if you plan to use HA
(2) On every slave*.hadoop.com host, run:
yum install hadoop-yarn-nodemanager hadoop-mapreduce hadoop-hdfs-datanode
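To confirm which role packages ended up on a given host, a simple query works on both masters and slaves:

rpm -qa | grep -E 'hadoop-(hdfs|yarn|mapreduce)'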
3. Configuration. After editing the files below, copy them to every host in the cluster with scp.
(1) Create the configuration directory

cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
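The alternatives commands register conf.my_cluster as the active configuration behind the /etc/hadoop/conf symlink, so every daemon picks it up without further changes. To verify:

alternatives --display hadoop-conf   # the current link should point at /etc/hadoop/conf.my_cluster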
(2) Create the necessary local directories
The paths below come from hdfs-site.xml and yarn-site.xml in step (3).

# NameNode metadata directory (master only; dfs.namenode.name.dir)
mkdir -p /data/1/dfs/nn
chown -R hdfs:hdfs /data/1/dfs/nn
chmod 700 /data/1/dfs/nn
# DataNode data directories (slaves; dfs.datanode.data.dir)
mkdir -p /data/{1,2,3,4}/dfs/dn
chown -R hdfs:hdfs /data/{1,2,3,4}/dfs/dn
# NodeManager local and log directories (slaves; yarn.nodemanager.local-dirs / log-dirs)
mkdir -p /data/{1,2,3,4}/yarn/local /data/{1,2,3,4}/yarn/logs
chown -R yarn:yarn /data/{1,2,3,4}/yarn
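The DataNode and NodeManager directories are needed on every slave. A sketch that creates them remotely in one pass, assuming root SSH access to the slaves:

for ip in 192.168.86.11 192.168.86.12 192.168.86.13; do
  ssh root@$ip 'mkdir -p /data/{1,2,3,4}/dfs/dn /data/{1,2,3,4}/yarn/{local,logs} &&
    chown -R hdfs:hdfs /data/{1,2,3,4}/dfs/dn &&
    chown -R yarn:yarn /data/{1,2,3,4}/yarn'
done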
(3) Edit the configuration files
1) core-site.xml

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master.hadoop.com:8020</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>720</value>
</property>
<property>
  <name>hadoop.proxyuser.mapred.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.mapred.hosts</name>
  <value>*</value>
</property>
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
2) hdfs-site.xml

<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hadoop</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/1/dfs/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
</property>
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>3</value>
</property>
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.webhdfs.user.provider.user.pattern</name>
  <value>^[A-Za-z0-9_][A-Za-z0-9._-]*[$]?$</value>
</property>
3) yarn-site.xml

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master.hadoop.com</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <description>List of directories to store localized files in.</description>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/1/yarn/local,/data/2/yarn/local,/data/3/yarn/local,/data/4/yarn/local</value>
</property>
<property>
  <description>Where to store container logs.</description>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/1/yarn/logs,/data/2/yarn/logs,/data/3/yarn/logs,/data/4/yarn/logs</value>
</property>
<property>
  <description>Where to aggregate logs to.</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>hdfs://master.hadoop.com:8020/var/log/hadoop-yarn/apps</value>
</property>
<property>
  <description>Classpath for typical applications.</description>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
    $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
    $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
    $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
  </value>
</property>
<property>
  <!-- note: this value normally takes the form host:port -->
  <name>yarn.web-proxy.address</name>
  <value>master.hadoop.com</value>
</property>
<property>
  <description>The memory made available to containers on this node, not the total physical memory of the machine.</description>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>5120</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>10240</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.increment-allocation-mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.scheduler.increment-allocation-vcores</name>
  <value>1</value>
</property>
4) mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master.hadoop.com:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master.hadoop.com:19888</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/user/history</value>
</property>
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/user/history/intermediate-done-dir</value>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/user/history/done-dir</value>
</property>
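Two sanity checks on the values above. First, once the files are deployed, individual keys can be spot-checked from the command line:

hdfs getconf -confKey fs.defaultFS                     # should print hdfs://master.hadoop.com:8020
hdfs getconf -confKey dfs.permissions.superusergroup   # should print hadoop

Second, the memory arithmetic: each NodeManager advertises 5120 MB (yarn.nodemanager.resource.memory-mb) and allocations grow in 512 MB steps, so a node can host at most 5120 / 512 = 10 minimum-size containers, fewer in practice since the MR ApplicationMaster itself takes 512 MB. Note that yarn.scheduler.maximum-allocation-mb (10240) exceeds any single node's capacity here, so the effective per-container ceiling is the node's 5120 MB.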
(4) Copy the configuration files to every host in the cluster (as before, substitute each slave's address for the asterisk):
scp /etc/hadoop/conf.my_cluster/*-site.xml 192.168.86.*:/etc/hadoop/conf.my_cluster/
4. Format HDFS (on master.hadoop.com only)
sudo -u hdfs hdfs namenode -format
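Formatting is destructive and must be done exactly once. A quick way to confirm it succeeded, using the dfs.namenode.name.dir path from hdfs-site.xml:

ls /data/1/dfs/nn/current   # a VERSION file and an fsimage file should be present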
5. Start HDFS. On every host, run (the loop starts only the HDFS services installed on that host):
for x in `cd /etc/init.d ; ls hadoop-hdfs-*`; do service $x start; done
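Once the services are up on every host, HDFS health can be checked from the master:

sudo -u hdfs hdfs dfsadmin -report   # every DataNode should appear as a live node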
6. Create the necessary directories in HDFS. Run the following once, from any host in the cluster (e.g. the master):

sudo -u hdfs hadoop fs -mkdir -p /tmp && sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn && sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn
sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate && sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging && sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir -p /var
sudo -u hdfs hadoop fs -mkdir -p /var/log && sudo -u hdfs hadoop fs -chmod -R 1775 /var/log && sudo -u hdfs hadoop fs -chown yarn:mapred /var/log
sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn/apps && sudo -u hdfs hadoop fs -chmod -R 1777 /var/log/hadoop-yarn/apps && sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn/apps
sudo -u hdfs hadoop fs -mkdir -p /user
sudo -u hdfs hadoop fs -mkdir -p /user/history && sudo -u hdfs hadoop fs -chown mapred /user/history
sudo -u hdfs hadoop fs -mkdir -p /user/test && sudo -u hdfs hadoop fs -chmod -R 777 /user/test && sudo -u hdfs hadoop fs -chown test /user/test
sudo -u hdfs hadoop fs -mkdir -p /user/root && sudo -u hdfs hadoop fs -chmod -R 777 /user/root && sudo -u hdfs hadoop fs -chown root /user/root
7. Operating YARN
Run each command on the host where the corresponding service is installed: ResourceManager, history server, and proxy server on master.hadoop.com; NodeManager on every slave. A quick health check follows the restart commands in (4).
(1) Start

service hadoop-yarn-resourcemanager start      # master
service hadoop-mapreduce-historyserver start   # master
service hadoop-yarn-proxyserver start          # master
service hadoop-yarn-nodemanager start          # slaves
(2) Check status

service hadoop-yarn-resourcemanager status
service hadoop-mapreduce-historyserver status
service hadoop-yarn-proxyserver status
service hadoop-yarn-nodemanager status
(3) Stop

service hadoop-yarn-resourcemanager stop
service hadoop-mapreduce-historyserver stop
service hadoop-yarn-proxyserver stop
service hadoop-yarn-nodemanager stop
(4) Restart

service hadoop-yarn-resourcemanager restart
service hadoop-mapreduce-historyserver restart
service hadoop-yarn-proxyserver restart
service hadoop-yarn-nodemanager restart
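After the start commands in (1), a quick health check from the master (the web UIs are on the ResourceManager's default port 8088 and the configured history-server port 19888):

yarn node -list   # every NodeManager should be listed in RUNNING state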
8. Install a Hadoop client
(1) Install CentOS 6.5
(2) Log in as root and run the following commands:

rpm -ivh jdk-7u67-linux-x64.rpm
yum install hadoop-client
cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
scp 192.168.86.10:/etc/hadoop/conf.my_cluster/*-site.xml /etc/hadoop/conf.my_cluster/
scp 192.168.86.10:/etc/hosts /etc/
scp 192.168.86.10:/etc/profile.d/java-env.sh /etc/profile.d/
. /etc/profile.d/java-env.sh
useradd -u 700 -g hadoop test
passwd test   # enter the test user's password when prompted
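A minimal smoke test that the client can reach the cluster, run as the new account:

su - test -c 'hadoop fs -ls /'   # should list the /tmp, /user, and /var directories created in step 6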
9. Test Hadoop with YARN

su - test
# Prepare input data
hadoop fs -mkdir input
hadoop fs -put /etc/hadoop/conf/*.xml input
# Run wordcount
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input output
# Estimate Pi
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 100
# Run a grep job (remove the previous output first; a job fails if its output directory already exists)
hadoop fs -rm -r output
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output 'dfs[a-z.]+'
hadoop fs -ls output
hadoop fs -cat output/part-r-00000 | head