hadoop(4)Cluster Setup on hadoop 2.4.1

 

1. Virtual Machine

Delete Ubuntu User
>userdel -r carl

List the currently logged-in users
>users

Add User
>useradd carl

Change the computer name
>sudo vi /etc/hostname

Upgrade the client machines
>apt-get update
>apt-get dist-upgrade

2. Install the Environment on the Clients
Install Java
>sudo add-apt-repository ppa:webupd8team/java
>sudo apt-get update
>sudo apt-get install oracle-java6-installer

Check java environment
>java -version
java version "1.6.0_45"

Install SSH
>sudo apt-get install ssh
>cd ~/.ssh/
>ssh-keygen -t rsa

This generates two files:
>ls -l
total 12
-rw------- 1 carl carl 1675 Jun 30 22:45 id_rsa
-rw-r--r-- 1 carl carl  399 Jun 30 22:45 id_rsa.pub

>cat id_rsa.pub >> authorized_keys

Restart the SSH server after adding the key to authorized_keys
>sudo service ssh restart
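The key steps above can also be done non-interactively; one detail the notes skip is that sshd silently ignores an authorized_keys file with loose permissions. A minimal sketch, assuming the same hadoop user (e.g. carl) on each node:

```shell
# Generate a password-less RSA key pair (only if one does not exist yet)
# and authorize it for logins to this machine.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa -q
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys   # sshd ignores group/world-writable key files
```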

Disable the Firewall
>sudo ufw disable
Firewall stopped and disabled on system startup

Prepare protoc
>sudo apt-get install protobuf-compiler
>protoc --version
libprotoc 2.5.0

Download the Source
>wget http://apache.arvixe.com/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz
Unzip the file and prepare to build with Maven

>mvn package -Pdist -DskipTests -Dtar

The build produces hadoop-2.4.1.tar.gz; unzip it, put it in the working directory, and add the environment variables:
export HADOOP_PREFIX=/opt/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
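To make these variables survive new shells, they can be appended to ~/.bashrc. The paths below are the ones used in this walkthrough; the PATH line is an added convenience (not in the original notes) so the bin/ and sbin/ tools resolve directly:

```shell
# Persist the Hadoop environment across shells (paths from this walkthrough;
# the PATH addition is my own convenience, adjust if your layout differs).
cat >> ~/.bashrc <<'EOF'
export HADOOP_PREFIX=/opt/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
EOF
```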

Tested with standalone operation; everything works fine.

Pseudo Distributed Mode
Almost the same; follow this earlier post: http://sillycat.iteye.com/blog/2084169

Run MapReduce Job on YARN
Follow the same post again: http://sillycat.iteye.com/blog/2084169

3. Set up the Hadoop Cluster
Change the name of the machine
>sudo vi /etc/hostname

change ubuntu140401 to ubuntu-master
change ubuntu140402 to ubuntu-client1

Add entries like these to /etc/hosts so that the servers can resolve each other:
127.0.0.1       ubuntu-client1

10.190.191.242  ubuntu-master
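Beyond the two lines above, every node needs the full map of the cluster. A sketch, staged in /tmp for review: only ubuntu-master's IP appears in the notes, so the client addresses below are placeholders. Also note that leaving a node's own hostname mapped to 127.0.0.1 (as in the first entry above) can make its Hadoop daemons bind to loopback; mapping every hostname to its LAN IP is safer.

```shell
# Staged host map; merge into /etc/hosts on every node after fixing the IPs.
# Only ubuntu-master's address comes from the notes; the rest are placeholders.
cat > /tmp/hosts.cluster <<'EOF'
10.190.191.242   ubuntu-master
10.190.191.243   ubuntu-client1
10.190.191.244   ubuntu-client2
10.190.191.245   ubuntu-client3
EOF
# sudo sh -c 'cat /tmp/hosts.cluster >> /etc/hosts'
```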

Configuration File on Client1 - core-site.xml
>cat /opt/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ubuntu-master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop/temp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>

Configuration file on client1 - hdfs-site.xml
>cat etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>ubuntu-master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

Configuration file on client1 - mapred-site.xml
>cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
>cat etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>ubuntu-master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>ubuntu-master:19888</value>
  </property>
</configuration>

Configuration file on client1 - yarn-site.xml
>cat etc/hadoop/yarn-site.xml 
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ubuntu-master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ubuntu-master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ubuntu-master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>ubuntu-master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>ubuntu-master:8088</value>
  </property>
</configuration>

Clone two more clients. Then I have three slave machines (ubuntu-client1, ubuntu-client2, ubuntu-client3) and one master machine (ubuntu-master).
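One file the notes do not show is etc/hadoop/slaves on ubuntu-master: start-dfs.sh and start-yarn.sh read it to decide where to launch the DataNodes and NodeManagers. A sketch, staged in /tmp for review before copying into place:

```shell
# One slave hostname per line; these match the three clients cloned above.
cat > /tmp/slaves <<'EOF'
ubuntu-client1
ubuntu-client2
ubuntu-client3
EOF
# After reviewing, install it on the master:
# cp /tmp/slaves /opt/hadoop/etc/hadoop/slaves
```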

Prepare the SSH connection
>cd ~/.ssh/
>cp id_rsa.pub ~/download/

scp this file to all the clients, then append it on each client:
>cat id_rsa.pub >> ./authorized_keys

After that, ubuntu-master can SSH into all the clients without a password.

Start the HDFS cluster on ubuntu-master. Before the first start, format the NameNode once (re-running this later would wipe the HDFS metadata):
>bin/hdfs namenode -format

>sbin/start-dfs.sh

jps should now list NameNode and SecondaryNameNode on the master and DataNode on each client. Visit this URL: http://ubuntu-master:50070/dfshealth.html#tab-overview

Start the YARN cluster on ubuntu-master
>sbin/start-yarn.sh

Visit the resource manager
http://ubuntu-master:8088/cluster

Node Manager on all the clients
http://ubuntu-client1:8042/node

Start the JobHistory Server
>sbin/mr-jobhistory-daemon.sh start historyserver

http://ubuntu-master:19888/jobhistory

Verify with WordCount
Create the directories on HDFS
>hadoop fs -mkdir -p /data/worldcount
>hadoop fs -mkdir -p /output/

Put the XML files there
>hadoop fs -put /opt/hadoop/etc/hadoop/*.xml /data/worldcount/
>hadoop fs -ls /data/worldcount
14/07/09 13:25:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 7 items
-rw-r--r--   3 carl supergroup       3589 2014-07-09 13:24 /data/worldcount/capacity-scheduler.xml
-rw-r--r--   3 carl supergroup       1250 2014-07-09 13:24 /data/worldcount/core-site.xml
-rw-r--r--   3 carl supergroup       9257 2014-07-09 13:24 /data/worldcount/hadoop-policy.xml
-rw-r--r--   3 carl supergroup       1286 2014-07-09 13:24 /data/worldcount/hdfs-site.xml
-rw-r--r--   3 carl supergroup        620 2014-07-09 13:24 /data/worldcount/httpfs-site.xml
-rw-r--r--   3 carl supergroup       1063 2014-07-09 13:24 /data/worldcount/mapred-site.xml
-rw-r--r--   3 carl supergroup       1456 2014-07-09 13:24 /data/worldcount/yarn-site.xml

>hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /data/worldcount /output/worldcount

Successfully got the results:
>hadoop fs -cat /output/worldcount/*
"*" 17
"AS 7
"License"); 7
"alice,bob 17
(ASF) 1
(root 1


References:
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
http://hadoop.apache.org/docs/r2.4.0/hadoop-yarn/hadoop-yarn-site/YARN.html

http://blog.sina.com.cn/s/blog_5c5d5cdf0101dvgq.html
http://www.haogongju.net/art/2707216
http://blog.csdn.net/hadoop_/article/details/24196193
http://blog.csdn.net/hadoop_/article/details/17716945

http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/ClusterSetup.html

http://blog.huangchaosuper.cn/work/tech/2014/04/24/hadoop-install.html
