hadoop(4) Cluster Setup on Hadoop 2.4.1
1. Virtual Machine
Delete the Ubuntu User
>sudo userdel -r carl
List the Logged-in Users
>users
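The users command only shows accounts with active sessions; to list every account on the machine, one simple way is to read /etc/passwd:
>cut -d: -f1 /etc/passwd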
Add a User
>sudo useradd carl
Change the computer name
>sudo vi /etc/hostname
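To apply a new name without rebooting, it can also be set at runtime (placeholder name shown):
>sudo hostname <new-name>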
Upgrade the Client Machines
>sudo apt-get update
>sudo apt-get dist-upgrade
2. Install Environments on Clients
Install Java
>sudo add-apt-repository ppa:webupd8team/java
>sudo apt-get update
>sudo apt-get install oracle-java6-installer
Check the Java environment
>java -version
java version "1.6.0_45"
Install the SSH
>sudo apt-get install ssh
>cd ~/.ssh/
>ssh-keygen -t rsa
It generates two files there:
>ls -l
total 12
-rw------- 1 carl carl 1675 Jun 30 22:45 id_rsa
-rw-r--r-- 1 carl carl  399 Jun 30 22:45 id_rsa.pub
>cat id_rsa.pub >> authorized_keys
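If passwordless login still prompts for a password, the usual culprit is permissions; sshd ignores an authorized_keys file that is group- or world-writable:
>chmod 700 ~/.ssh
>chmod 600 ~/.ssh/authorized_keys
>ssh localhost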
Restart the SSH server after adding the key to authorized_keys
>sudo service ssh restart
Disable the Firewall
>sudo ufw disable
Firewall stopped and disabled on system startup
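Double-check that it is really off:
>sudo ufw status
Status: inactive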
Prepare protoc
>sudo apt-get install protobuf-compiler
>protoc --version
libprotoc 2.5.0
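Building the native bits also needs a compiler toolchain and Maven. A typical package set on Ubuntu (this list is an assumption, adjust as needed):
>sudo apt-get install maven build-essential cmake zlib1g-dev libssl-dev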
Download the Source
>wget http://apache.arvixe.com/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz
Unzip the file and build with Maven from the source directory
>mvn package -Pdist -DskipTests -Dtar
The build produces hadoop-2.4.1.tar.gz under hadoop-dist/target; unpack it into the working directory (/opt/hadoop here) and add these environment variables:
export HADOOP_PREFIX=/opt/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
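It also helps to put the Hadoop scripts on the PATH and confirm the install works; a minimal sketch, assuming the tarball was unpacked to /opt/hadoop:
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
>hadoop version
Hadoop 2.4.1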
Testing with standalone operation: everything works fine (the smoke test from the official guide is sketched below).
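For reference, the standalone smoke test from the official single-node guide (run inside the Hadoop directory):
>mkdir input
>cp etc/hadoop/*.xml input
>bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
>cat output/*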
Pseudo Distributed Mode
Almost the same; follow this blog: http://sillycat.iteye.com/blog/2084169
Run MapReduce Job on YARN
Follow the same blog again: http://sillycat.iteye.com/blog/2084169
3. Set up the Hadoop Cluster
Change the name of the machine
>sudo vi /etc/hostname
change ubuntu140401 to ubuntu-master
change ubuntu140402 to ubuntu-client1
Add entries like these to /etc/hosts so that each server can resolve the others:
127.0.0.1 ubuntu-client1
10.190.191.242 ubuntu-master
Configuration file on client1 - core-site.xml
>cat /opt/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ubuntu-master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/temp</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
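With fs.defaultFS pointing at hdfs://ubuntu-master:9000, bare paths in hadoop fs commands resolve against that NameNode, so these two commands are equivalent:
>hadoop fs -ls /
>hadoop fs -ls hdfs://ubuntu-master:9000/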
Configuration file on client1 - hdfs-site.xml
>cat etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>ubuntu-master:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Configuration file on client1 - mapred-site.xml
>cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
>cat etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>ubuntu-master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>ubuntu-master:19888</value>
</property>
</configuration>
Configuration file on client1 - yarn-site.xml
>cat etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>ubuntu-master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>ubuntu-master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>ubuntu-master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>ubuntu-master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>ubuntu-master:8088</value>
</property>
</configuration>
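If the configuration changes later, these four files have to stay identical on every machine; one way to push an update out (the target shown is just an example):
>scp /opt/hadoop/etc/hadoop/*.xml carl@ubuntu-master:/opt/hadoop/etc/hadoop/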
Clone two more clients. Then I have three slave machines (ubuntu-client1, ubuntu-client2, ubuntu-client3) and one master machine (ubuntu-master).
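After cloning, every machine's /etc/hosts should map all four hostnames to their LAN IPs; the client addresses below are placeholders (and mapping a machine's own name to 127.0.0.1 can make its DataNode register on the loopback interface):
10.190.191.242 ubuntu-master
10.190.191.243 ubuntu-client1
10.190.191.244 ubuntu-client2
10.190.191.245 ubuntu-client3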
Prepare the SSH connection
>cd ~/.ssh/
>cp id_rsa.pub ~/download/
scp this file to all the clients, then append it to authorized_keys on each of them:
>cat id_rsa.pub >> ./authorized_keys
Then ubuntu-master can SSH to all the clients without a password.
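Two things are worth doing on ubuntu-master before the first start. The slaves file tells start-dfs.sh and start-yarn.sh where to launch DataNodes and NodeManagers:
>cat /opt/hadoop/etc/hadoop/slaves
ubuntu-client1
ubuntu-client2
ubuntu-client3
And the NameNode has to be formatted once before HDFS can come up:
>bin/hdfs namenode -format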
Start the HDFS cluster on ubuntu-master
>sbin/start-dfs.sh
Visit this URL http://ubuntu-master:50070/dfshealth.html#tab-overview
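The same health information is available from the command line:
>bin/hdfs dfsadmin -report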
Start the YARN cluster on ubuntu-master
>sbin/start-yarn.sh
Visit the resource manager
http://ubuntu-master:8088/cluster
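The registered NodeManagers can also be listed from the shell:
>bin/yarn node -list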
Node Manager on all the clients
http://ubuntu-client1:8042/node
Start the JobHistory Server
>sbin/mr-jobhistory-daemon.sh start historyserver
http://ubuntu-master:19888/jobhistory
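A quick jps on every machine confirms the daemons: on ubuntu-master expect NameNode, SecondaryNameNode, ResourceManager, and JobHistoryServer; on each client, DataNode and NodeManager.
>jps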
Verify with WordCount
Create the directories on HDFS
>hadoop fs -mkdir -p /data/worldcount
>hadoop fs -mkdir -p /output/
Put the XML files there
>hadoop fs -put /opt/hadoop/etc/hadoop/*.xml /data/worldcount/
>hadoop fs -ls /data/worldcount
14/07/09 13:25:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 7 items
-rw-r--r--   3 carl supergroup       3589 2014-07-09 13:24 /data/worldcount/capacity-scheduler.xml
-rw-r--r--   3 carl supergroup       1250 2014-07-09 13:24 /data/worldcount/core-site.xml
-rw-r--r--   3 carl supergroup       9257 2014-07-09 13:24 /data/worldcount/hadoop-policy.xml
-rw-r--r--   3 carl supergroup       1286 2014-07-09 13:24 /data/worldcount/hdfs-site.xml
-rw-r--r--   3 carl supergroup        620 2014-07-09 13:24 /data/worldcount/httpfs-site.xml
-rw-r--r--   3 carl supergroup       1063 2014-07-09 13:24 /data/worldcount/mapred-site.xml
-rw-r--r--   3 carl supergroup       1456 2014-07-09 13:24 /data/worldcount/yarn-site.xml
>hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /data/worldcount /output/worldcount
Successfully got the results:
>hadoop fs -cat /output/worldcount/*
"*" 17 "AS 7 "License”); 7 "alice,bob 17 (ASF) 1 (root 1
References:
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
http://hadoop.apache.org/docs/r2.4.0/hadoop-yarn/hadoop-yarn-site/YARN.html
http://blog.sina.com.cn/s/blog_5c5d5cdf0101dvgq.html
http://www.haogongju.net/art/2707216
http://blog.csdn.net/hadoop_/article/details/24196193
http://blog.csdn.net/hadoop_/article/details/17716945
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/ClusterSetup.html
http://blog.huangchaosuper.cn/work/tech/2014/04/24/hadoop-install.html