hadoop(3)Upgrade to YARN and Installation

 

1. Introduction
Hadoop 1.0
HDFS and MapReduce: HDFS consists of a NameNode and multiple DataNodes; MapReduce consists of a JobTracker and multiple TaskTrackers.
Releases: Hadoop 1.x, 0.21.x, 0.22.x

Hadoop 2.0
YARN (Yet Another Resource Negotiator)
The JobTracker is split into a ResourceManager and per-application ApplicationMasters.
Releases: Hadoop 0.23.x, 2.x

2. Installation

Try to build it myself.
Check the Protocol Buffers version
>protoc --version
libprotoc 2.5.0

Check the Java Version
>java -version
java version "1.6.0_65"

Create binary distribution without native code and without documentation:
>mvn package -Pdist -DskipTests -Dtar

If you hit memory issues during the build, set the options below; so far I have not needed them.
>export MAVEN_OPTS="-Xms256m -Xmx512m"

After the build, I get this file in the dist directory: hadoop-2.4.0.tar.gz

Unzip the file
>tar zxvf hadoop-2.4.0.tar.gz
>sudo ln -s /Users/carl/tool/hadoop-2.4.0 /opt/hadoop-2.4.0
>sudo ln -s /opt/hadoop-2.4.0 /opt/hadoop
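The two links give a stable /opt/hadoop path while keeping the versioned directory, so a later upgrade only means repointing one link. A small sketch of the pattern (run in a hypothetical scratch directory, not the real install paths):

```shell
# Demonstrate the version-symlink pattern in a scratch directory.
tmp=$(mktemp -d)
mkdir "$tmp/hadoop-2.4.0"                  # the unpacked release
ln -s "$tmp/hadoop-2.4.0" "$tmp/hadoop"    # stable name -> current version
# Upgrading later only means repointing the stable link:
mkdir "$tmp/hadoop-2.5.0"
ln -sfn "$tmp/hadoop-2.5.0" "$tmp/hadoop"  # -n: replace the link itself
current=$(readlink "$tmp/hadoop")
echo "$current"
```

Everything that references /opt/hadoop (PATH, scripts, configs) keeps working across upgrades.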

Edit the environment
export HADOOP_PREFIX=/opt/hadoop
export PATH=/opt/hadoop/bin:$PATH
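To keep these settings across shells, they can go in ~/.bash_profile. Adding sbin to the PATH is my own convenience for the start/stop scripts, not part of the original steps:

```shell
# ~/.bash_profile -- Hadoop environment (paths assume the symlink above)
export HADOOP_PREFIX=/opt/hadoop
export PATH=/opt/hadoop/bin:/opt/hadoop/sbin:$PATH
```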

>hadoop version
Hadoop 2.4.0 Subversion Unknown -r Unknown Compiled by carl on 2014-06-20T21:21Z

3. Standalone Operation
>mkdir input
>cp /opt/hadoop/etc/hadoop/*.xml input/
>hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'
>cat output/*

That is non-distributed mode: everything runs in a single Java process.
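The example is a distributed grep: it scans the input *.xml files for matches of the regex dfs[a-z.]+ and counts each distinct match. You can preview what the regex picks up with plain grep (a local approximation, not the MapReduce job itself):

```shell
# A line like the ones in the Hadoop config files.
sample='<name>dfs.replication</name>'
# -o prints each match on its own line; -E enables the extended regex.
match=$(printf '%s\n' "$sample" | grep -oE 'dfs[a-z.]+')
echo "$match"   # dfs.replication
```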

4. Pseudo-Distributed Mode
Each Hadoop daemon runs in a separate Java process.

Set up ssh
I can already ssh to localhost without a password, so this was done before. If not:
>ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
>cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Configuration
>vi etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

>vi etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Run the MapReduce job locally.

Format the filesystem
>hdfs namenode -format

Start the HDFS
>sbin/start-dfs.sh

Visit the overview pages from WEB UI
http://localhost:50070/dfshealth.html#tab-overview

Create the directories on HDFS
>hdfs dfs -mkdir /user
>hdfs dfs -mkdir /user/sillycat

Warning Message
2014-06-23 11:05:44.531 java[10276:1003] Unable to load realm info from SCDynamicStore
14/06/23 11:05:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

I know I did not compile the native-hadoop library, so never mind.

Put my local files onto HDFS
>hdfs dfs -put etc/hadoop /user/sillycat/input

Run the hadoop command to start the jobs on HDFS
>hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep /user/sillycat/input /user/sillycat/output 'dfs[a-z.]+'

Fetch the files from HDFS to local
>hdfs dfs -get /user/sillycat/output /Users/carl/work/hadoop/output2

Check the results from local disk
>cat /Users/carl/work/hadoop/output2/*

Check the results from HDFS
>hdfs dfs -cat /user/sillycat/output/*

Stop the HDFS
>sbin/stop-dfs.sh

Run MapReduce Job on YARN
Prepare the Configuration
>cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

>vi etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>


>vi etc/hadoop/yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
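On some 2.x setups the shuffle handler class also has to be named explicitly. This extra property (an optional addition on my part; the default covered it for me on 2.4.0) can sit alongside the one above:

```xml
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```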

Start the HDFS
>sbin/start-dfs.sh

Start the YARN
>sbin/start-yarn.sh

Web UI to track the jobs
http://localhost:8088/cluster

Put the configuration files onto HDFS as job input
>hdfs dfs -put /opt/hadoop/etc/hadoop /user/carl/input

Run the tasks
>hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'

But after I ran the job, I did not get good results, so I went to check the log file here:
$HADOOP_HOME/logs/yarn-carl-nodemanager-sparkworker1.local.log

Error Message
java.lang.RuntimeException: No class defined for mapreduce_shffle
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:109)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)

Solution:
I had a typo in the configuration file. The correct value is <value>mapreduce_shuffle</value>.

Error Message:
14/06/23 12:20:21 INFO mapreduce.Job: Job job_1403543828091_0003 failed with state FAILED due to: Application application_1403543828091_0003 failed 2 times due to AM Container for appattempt_1403543828091_0003_000002 exited with exitCode: 127 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)

Solution:
Find the container log from the NodeManager web UI:
http://localhost:8042/node/containerlogs/container_1403543828091_0003_01_000001/carl/stderr/?start=-4096

There I saw this error:
/bin/bash: /bin/java: No such file or directory

>sudo ln -s /usr/bin/java /bin/java
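The symlink works, but a less invasive fix (my preference, not from the original steps) is to set JAVA_HOME explicitly in etc/hadoop/hadoop-env.sh, so containers resolve java through JAVA_HOME instead of falling back to /bin/java:

```shell
# etc/hadoop/hadoop-env.sh -- point Hadoop at the JDK (macOS example)
export JAVA_HOME=$(/usr/libexec/java_home)
```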

Then run the command to execute the job
>hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'

It works.

References:
http://sillycat.iteye.com/blog/1556106
http://sillycat.iteye.com/blog/1556107

http://www.iteblog.com/archives/category/hadoop
http://my.oschina.net/leejun2005/blog/97802

Official Document
http://hadoop.apache.org/docs/r2.4.0/
Download URL: http://www.bizdirusa.com/mirrors/apache/hadoop/common/
