Hadoop 2.7.1 Pseudo-Distributed Environment Setup

System environment:
Ubuntu 15.10
Hadoop: 2.7.1
Java: 1.7.0_79
1. Install SSH and generate a public/private key pair
sudo apt-get install ssh
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
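To confirm that passwordless login works, try connecting to the local machine; the first attempt may ask you to accept the host key, but it should not prompt for a password:
ssh localhost
exit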

2. Install the rsync synchronization tool:
sudo apt-get install rsync

3. Download JDK 1.7.0_79 and extract it to /usr/lib/java/:
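A minimal sketch of this step, assuming the Oracle tarball jdk-7u79-linux-x64.tar.gz has already been downloaded to the current directory:
sudo mkdir -p /usr/lib/java
sudo tar -zxvf jdk-7u79-linux-x64.tar.gz -C /usr/lib/java/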
4. Download Hadoop 2.7.1 and extract it to /hadoop/:
donald_draper@rain:/hadoop$ tar -zxvf hadoop-2.7.1.tar.gz
5. Configure environment variables:
vim ~/.bashrc

Append the following at the end of the file:
export JAVA_HOME=/usr/lib/java/jdk1.7.0_79
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/hadoop/hadoop-2.7.1
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH} 

Save and quit with :wq.
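Reload the configuration so the new variables take effect in the current shell, then verify the setup; both version commands should print their version banners:
source ~/.bashrc
java -version
hadoop version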
6. Configure Hadoop
All of Hadoop 2.7.1's configuration files live in /hadoop/hadoop-2.7.1/etc/hadoop.
cd /hadoop/hadoop-2.7.1/etc/hadoop
1) Edit hadoop-env.sh and set the JDK home directory:
export JAVA_HOME=/usr/lib/java/jdk1.7.0_79
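If you prefer a non-interactive edit, the same change can be made with sed, assuming the file still carries its stock export JAVA_HOME=${JAVA_HOME} line:
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/java/jdk1.7.0_79|' hadoop-env.sh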

2) Edit core-site.xml:
donald_draper@rain:/hadoop/hadoop-2.7.1/etc/hadoop$ cat core-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://rain:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/tmp</value>
    </property>
</configuration>
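Because fs.defaultFS refers to this machine by the hostname rain, that name must resolve locally. If it does not, add an /etc/hosts entry like the one below (using the machine's IP as it appears in the job logs later in this post):
192.168.126.136    rain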

3) Edit hdfs-site.xml:
donald_draper@rain:/hadoop/hadoop-2.7.1/etc/hadoop$ cat hdfs-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

4) Edit mapred-site.xml:
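In a fresh Hadoop 2.7.1 unpack this file only ships as a template, so create it first if it does not already exist:
cp mapred-site.xml.template mapred-site.xml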
donald_draper@rain:/hadoop/hadoop-2.7.1/etc/hadoop$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Enable the job history server -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>rain:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>rain:19888</value>
    </property>
    <!-- These directories are paths inside HDFS; start dfs first, then the history server -->
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/history/indone</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/history/done</value>
    </property>
</configuration>
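The two history directories are paths inside HDFS, not the local filesystem, so they can only be created once HDFS is up (the history server will normally create them itself on startup). To pre-create them by hand after HDFS is running:
hdfs dfs -mkdir -p /history/indone /history/done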


5) Edit yarn-site.xml:
donald_draper@rain:/hadoop/hadoop-2.7.1/etc/hadoop$ cat yarn-site.xml 
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

6) Edit slaves
The slaves file lists the worker nodes. Because HDFS is started from the namenode host and YARN from the resourcemanager host, the slaves file on the namenode host specifies where the datanodes run, and the one on the resourcemanager host specifies where the nodemanagers run. In this pseudo-distributed setup they are all the same machine.
cd /hadoop/hadoop-2.7.1/etc/hadoop/
vim slaves
rain
7. Format HDFS by running the format command (hdfs is on the PATH set up earlier):
hdfs namenode -format

8. Start HDFS:
cd  /hadoop/hadoop-2.7.1/sbin/
donald_draper@rain:/hadoop/hadoop-2.7.1/sbin$ ./start-dfs.sh
Starting namenodes on [rain]
rain: starting namenode, logging to /hadoop/hadoop-2.7.1/logs/hadoop-donald_draper-namenode-rain.out
localhost: starting datanode, logging to /hadoop/hadoop-2.7.1/logs/hadoop-donald_draper-datanode-rain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /hadoop/hadoop-2.7.1/logs/hadoop-donald_draper-secondarynamenode-rain.out

9. Start the job history server:
donald_draper@rain:/hadoop/hadoop-2.7.1/sbin$ ./mr-jobhistory-daemon.sh  start historyserver
starting historyserver, logging to /hadoop/hadoop-2.7.1/logs/mapred-donald_draper-historyserver-rain.out

10. Start YARN:
cd  /hadoop/hadoop-2.7.1/sbin/
donald_draper@rain:/hadoop/hadoop-2.7.1/sbin$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /hadoop/hadoop-2.7.1/logs/yarn-donald_draper-resourcemanager-rain.out
localhost: starting nodemanager, logging to /hadoop/hadoop-2.7.1/logs/yarn-donald_draper-nodemanager-rain.out

11. Check that the HDFS and YARN daemons are up:
donald_draper@rain:/hadoop/hadoop-2.7.1/logs$ jps
7114 DataNode
7743 NodeManager
8921 Jps
7607 ResourceManager
7319 SecondaryNameNode
8779 JobHistoryServer
6984 NameNode
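As an additional check, you can query the cluster state from the command line; on a healthy single-node setup this should report one live datanode:
hdfs dfsadmin -report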

12. Run an example job
1) hdfs dfs -mkdir /test
2) hdfs dfs -mkdir /test/input
3) hdfs dfs -put etc/hadoop/*.xml /test/input
4) donald_draper@rain:/hadoop/hadoop-2.7.1$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep /test/input /test/output 'dfs[a-z.]+'
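Note that the example writes to /test/output, which must not already exist; to rerun the job, delete the output directory first:
hdfs dfs -rm -r /test/output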

Job run output:
16/08/15 11:37:50 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/08/15 11:37:52 INFO input.FileInputFormat: Total input paths to process : 9
16/08/15 11:37:52 INFO mapreduce.JobSubmitter: number of splits:9
16/08/15 11:37:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1471230621598_0001
16/08/15 11:37:53 INFO impl.YarnClientImpl: Submitted application application_1471230621598_0001
16/08/15 11:37:53 INFO mapreduce.Job: The url to track the job: http://rain:8088/proxy/application_1471230621598_0001/
16/08/15 11:37:53 INFO mapreduce.Job: Running job: job_1471230621598_0001
16/08/15 11:38:16 INFO mapreduce.Job: Job job_1471230621598_0001 running in uber mode : false
16/08/15 11:38:16 INFO mapreduce.Job:  map 0% reduce 0%
16/08/15 11:45:11 INFO mapreduce.Job:  map 67% reduce 0%
16/08/15 11:48:06 INFO mapreduce.Job:  map 74% reduce 22%
16/08/15 11:48:22 INFO mapreduce.Job:  map 89% reduce 22%
16/08/15 11:48:23 INFO mapreduce.Job:  map 100% reduce 22%
16/08/15 11:48:49 INFO mapreduce.Job:  map 100% reduce 30%
16/08/15 11:48:51 INFO mapreduce.Job:  map 100% reduce 33%
16/08/15 11:48:54 INFO mapreduce.Job:  map 100% reduce 67%
16/08/15 11:49:03 INFO mapreduce.Job:  map 100% reduce 100%
16/08/15 11:49:25 INFO mapreduce.Job: Job job_1471230621598_0001 completed successfully
16/08/15 11:49:45 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=51
		FILE: Number of bytes written=1156955
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=28205
		HDFS: Number of bytes written=143
		HDFS: Number of read operations=30
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Killed map tasks=2
		Launched map tasks=11
		Launched reduce tasks=1
		Data-local map tasks=11
		Total time spent by all maps in occupied slots (ms)=3308143
		Total time spent by all reduces in occupied slots (ms)=227199
		Total time spent by all map tasks (ms)=3308143
		Total time spent by all reduce tasks (ms)=227199
		Total vcore-seconds taken by all map tasks=3308143
		Total vcore-seconds taken by all reduce tasks=227199
		Total megabyte-seconds taken by all map tasks=3387538432
		Total megabyte-seconds taken by all reduce tasks=232651776
	Map-Reduce Framework
		Map input records=781
		Map output records=2
		Map output bytes=41
		Map output materialized bytes=99
		Input split bytes=969
		Combine input records=2
		Combine output records=2
		Reduce input groups=2
		Reduce shuffle bytes=99
		Reduce input records=2
		Reduce output records=2
		Spilled Records=4
		Shuffled Maps =9
		Failed Shuffles=0
		Merged Map outputs=9
		GC time elapsed (ms)=213752
		CPU time spent (ms)=39770
		Physical memory (bytes) snapshot=1636868096
		Virtual memory (bytes) snapshot=7041122304
		Total committed heap usage (bytes)=1388314624
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=27236
	File Output Format Counters 
		Bytes Written=143
16/08/15 11:49:47 INFO ipc.Client: Retrying connect to server: rain/192.168.126.136:45795. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/08/15 11:49:48 INFO ipc.Client: Retrying connect to server: rain/192.168.126.136:45795. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/08/15 11:49:49 INFO ipc.Client: Retrying connect to server: rain/192.168.126.136:45795. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
16/08/15 11:49:50 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
16/08/15 11:50:49 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/08/15 11:50:51 INFO input.FileInputFormat: Total input paths to process : 1
16/08/15 11:50:51 INFO mapreduce.JobSubmitter: number of splits:1
16/08/15 11:50:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1471230621598_0002
16/08/15 11:50:53 INFO impl.YarnClientImpl: Submitted application application_1471230621598_0002
16/08/15 11:50:53 INFO mapreduce.Job: The url to track the job: http://rain:8088/proxy/application_1471230621598_0002/
16/08/15 11:50:53 INFO mapreduce.Job: Running job: job_1471230621598_0002
16/08/15 11:51:29 INFO mapreduce.Job: Job job_1471230621598_0002 running in uber mode : false
16/08/15 11:51:29 INFO mapreduce.Job:  map 0% reduce 0%
16/08/15 11:51:39 INFO mapreduce.Job:  map 100% reduce 0%
16/08/15 11:51:48 INFO mapreduce.Job:  map 100% reduce 100%
16/08/15 11:51:51 INFO mapreduce.Job: Job job_1471230621598_0002 completed successfully
16/08/15 11:51:51 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=51
		FILE: Number of bytes written=230397
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=276
		HDFS: Number of bytes written=29
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=6533
		Total time spent by all reduces in occupied slots (ms)=8187
		Total time spent by all map tasks (ms)=6533
		Total time spent by all reduce tasks (ms)=8187
		Total vcore-seconds taken by all map tasks=6533
		Total vcore-seconds taken by all reduce tasks=8187
		Total megabyte-seconds taken by all map tasks=6689792
		Total megabyte-seconds taken by all reduce tasks=8383488
	Map-Reduce Framework
		Map input records=2
		Map output records=2
		Map output bytes=41
		Map output materialized bytes=51
		Input split bytes=133
		Combine input records=0
		Combine output records=0
		Reduce input groups=1
		Reduce shuffle bytes=51
		Reduce input records=2
		Reduce output records=2
		Spilled Records=4
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=59
		CPU time spent (ms)=1660
		Physical memory (bytes) snapshot=467501056
		Virtual memory (bytes) snapshot=1429606400
		Total committed heap usage (bytes)=276299776
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=143
	File Output Format Counters 
		Bytes Written=29

5) Fetch the results from HDFS:
donald_draper@rain:/hadoop/hadoop-2.7.1$  hdfs  dfs  -get /test/output   output
16/08/15 11:52:19 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/08/15 11:52:19 WARN hdfs.DFSClient: DFSInputStream has been closed already

6) View the fetched output:
donald_draper@rain:/hadoop/hadoop-2.7.1$ cat   output/* 
1	dfsadmin
1	dfs.replication


Note: the results can also be viewed directly in HDFS:
hdfs dfs -cat /test/output/*

13. Stop Hadoop
stop-yarn.sh
mr-jobhistory-daemon.sh stop historyserver
stop-dfs.sh 
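Afterwards, jps should list nothing but the Jps process itself, confirming that all daemons have stopped:
jps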

Web UIs:
NameNode: http://192.168.126.136:50070
ResourceManager: http://192.168.126.136:8088
JobHistoryServer: http://192.168.126.136:19888

Related error:
2016-08-15 11:28:50,625 FATAL org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: Error starting JobHistoryServer
org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:279)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.initializeWebApp(HistoryClientService.java:156)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.serviceStart(HistoryClientService.java:121)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:195)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:222)
at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:231)
Caused by: java.net.SocketException: Unresolved address
Resolution:
Check the server address and web app address configured in mapred-site.xml (the mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address properties); the Unresolved address exception means the hostname given there could not be resolved.
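A quick way to confirm that the hostname configured in those two properties actually resolves (an unresolved name is exactly what the exception above indicates):
getent hosts rain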


