
Hadoop in Practice 06: Installing Hadoop 2.7.1 on Ubuntu 14.04 (small 3-host cluster)

 

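I previously set up Hadoop 1.0.2. That version is quite old now, so I upgraded to 2.7.1.

Broadly speaking, the configuration of the two versions does not differ much.

Plan:

ubuntu1 172.19.43.178  master,namenode,jobtracker-master

ubuntu2 172.19.43.114  slave1,datanode,tasktracker-slave1

ubuntu3 172.19.43.98   slave2,datanode,tasktracker-slave2

(The jobtracker/tasktracker labels are carried over from 1.x; in 2.7.1 these roles are filled by YARN's ResourceManager and NodeManager.)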

 

1. Configure JDK 1.7

Download the 64-bit JDK 1.7.

Copy the JDK archive to /usr/java and extract it there:

root@vinking:/home/vinking/Downloads# cp jdk-7u71-linux-x64.tar.gz /usr/java

# cd /usr/java

# sudo tar -zxvf jdk-7u71-linux-x64.tar.gz

 

Edit the environment variables and add the following:

# vi /etc/profile

#set java environment

export JAVA_HOME=/usr/java/jdk1.7.0_71

export JRE_HOME=/usr/java/jdk1.7.0_71/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$JAVA_HOME:$PATH

 

# source /etc/profile

Check that the installation succeeded:

# java -version

java version "1.7.0_71"

Java(TM) SE Runtime Environment (build 1.7.0_71-b14)

Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

 

2. Configure passwordless SSH login

root@vinking:/home/vinking# sudo apt-get install openssh-server

root@vinking:/home/vinking# sudo apt-get install openssh-client

root@vinking:/home/vinking# sudo /etc/init.d/ssh start

start: Job is already running: ssh

root@vinking:/home/vinking# ps -e|grep ssh

  2298 ?        00:00:00 ssh-agent

 99652 ?        00:00:00 sshd

 

root@vinking:/home/vinking# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Generating public/private dsa key pair.

Your identification has been saved in /root/.ssh/id_dsa.

Your public key has been saved in /root/.ssh/id_dsa.pub.

The key fingerprint is:

e2:8d:00:e0:ba:8a:07:37:e9:d2:11:79:20:70:fa:1d root@vinking

The key's randomart image is:

+--[ DSA 1024]----+

|+ .              |

|o+.              |

|.o.oE            |

|..oo..           |

|. .+o . S        |

|..=  o +         |

|.= o  o .        |

|+ +              |

|oo               |

+-----------------+

# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

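# ssh -version

Bad escape character 'rsion'.

(ssh parses -version as -v followed by -e rsion, hence this error; ssh -V is the option that prints the version.)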

# ssh localhost

 

Login succeeded.
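
The step that appends the public key to authorized_keys can be made repeatable. The following is a minimal sketch, not part of the original setup: authorize_key is a hypothetical helper name, and it assumes sshd's usual requirement that ~/.ssh and authorized_keys are not writable by other users. It adds the key only if it is not already present, then tightens the permissions.

```shell
#!/bin/sh
# authorize_key PUBKEY_FILE SSH_DIR
# Hypothetical helper: append a public key to SSH_DIR/authorized_keys once,
# and set the permissions sshd normally expects.
authorize_key() {
    pub="$1"
    dir="$2"
    mkdir -p "$dir" && chmod 700 "$dir"
    touch "$dir/authorized_keys"
    # -F fixed string, -x whole line, -q quiet: append only when the key is absent.
    if ! grep -qxF -e "$(cat "$pub")" "$dir/authorized_keys"; then
        cat "$pub" >> "$dir/authorized_keys"
    fi
    chmod 600 "$dir/authorized_keys"
}

# Usage on the machine that should accept the key:
# authorize_key ~/.ssh/id_dsa.pub ~/.ssh
```

Running it twice leaves a single copy of the key, which a plain >> append does not guarantee.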

 

3. Install Hadoop 2.7.1

Official installation guide: http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/ClusterSetup.html

A reference article found online: http://wangzhijian.blog.51cto.com/6427016/1766619

 

a. Download and install

Download Hadoop:

# wget http://apache.fayea.com/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

Extract and install:

# tar -zxvf hadoop-2.7.1.tar.gz

# sudo mv hadoop-2.7.1 /usr/local/hadoop

 

b. Configure environment variables

root@vinking:/usr/local/hadoop# vi ~/.bashrc

Add the following:

# Hadoop Start

export JAVA_HOME=/usr/java/jdk1.7.0_71

export HADOOP_HOME=/usr/local/hadoop

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Hadoop End

root@vinking:/usr/local/hadoop# source  ~/.bashrc

 

c. Add hosts entries (required on every machine)

root@vinking:/usr/local/hadoop# vi /etc/hosts

root@vinking:/usr/local/hadoop# cat /etc/hosts

127.0.0.1          localhost

172.19.43.178   master

172.19.43.114   slave1

172.19.43.98    slave2
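
Because every node needs the same three entries, a quick check can catch a missing line before Hadoop is started. This is a sketch under assumptions: has_host is a hypothetical helper, and it only inspects the hosts file itself, not DNS.

```shell
#!/bin/sh
# has_host HOSTS_FILE NAME
# Succeeds if NAME appears as a hostname (any field after the address)
# on a non-comment line of HOSTS_FILE.
has_host() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Usage on each machine:
# for h in master slave1 slave2; do
#     has_host /etc/hosts "$h" || echo "missing: $h"
# done
```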

 

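d. Configure the cluster:
Clone the machine twice to get three hosts, and configure them as master, slave1 and slave2.

Configure the master (set the hostname on slave1 and slave2 the same way, using their own names):

# hostname master

# vi /etc/hostname

master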

 

#cd /usr/local/hadoop/etc/hadoop

 

Configure core-site.xml:

# vi core-site.xml

<configuration>  

    <property>  

        <name>fs.default.name</name>  

        <value>hdfs://master:9000</value>  

    </property>  

    <property>  

        <name>io.file.buffer.size</name>  

        <value>4096</value>  

    </property>  

</configuration>  
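
In Hadoop 2.x, fs.default.name is a deprecated alias and the current property name is fs.defaultFS; the old name still works, as this walkthrough shows. As a sketch only, the same file could be generated from the shell with the newer key. write_core_site is a hypothetical helper, and the arguments in the usage line are the values used in this post.

```shell
#!/bin/sh
# write_core_site CONF_DIR NAMENODE_HOST PORT
# Hypothetical helper: writes CONF_DIR/core-site.xml using fs.defaultFS.
write_core_site() {
    cat > "$1/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://$2:$3</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>
EOF
}

# Usage with the values from this post:
# write_core_site /usr/local/hadoop/etc/hadoop master 9000
```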

 

Configure hdfs-site.xml:

# vi hdfs-site.xml

<configuration>  

    <property>  

        <name>dfs.replication</name>  

        <value>2</value>  

    </property>  

    <property>  

        <name>dfs.namenode.name.dir</name>  

        <value>/usr/local/hadoop/dfs/name</value>  

    </property>  

    <property>  

        <name>dfs.datanode.data.dir</name>  

        <value>/usr/local/hadoop/dfs/data</value>  

    </property>  

    <property>  

        <name>dfs.namenode.secondary.http-address</name>  

        <value>master:50090</value>  

    </property>  

</configuration> 

 

Configure mapred-site.xml (created from the bundled template):

# sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

# vi mapred-site.xml

<configuration>  

    <property>  

        <name>mapreduce.framework.name</name>  

        <value>yarn</value>  

        <final>true</final>

    </property>  

    <property>  

        <name>mapreduce.jobhistory.address</name>  

        <value>master:10020</value>  

    </property>  

    <property>  

        <name>mapreduce.jobhistory.webapp.address</name>  

        <value>master:19888</value>  

    </property>  

</configuration>  

 

Configure yarn-site.xml:

# vi yarn-site.xml

<configuration>

    <property>  

        <name>yarn.acl.enable</name>  

        <value>false</value>  

        <final>true</final>

    </property>  

    <property>  

        <name>yarn.admin.acl</name>  

        <value>*</value>  

    </property>  

    <property>  

        <name>yarn.log-aggregation-enable</name>  

        <value>false</value>  

    </property>  

    <property>  

        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>  

        <value>org.apache.hadoop.mapred.ShuffleHandler</value>  

    </property>  

    <property>  

        <name>yarn.resourcemanager.address</name>  

        <value>master:8032</value>  

    </property>  

    <property>  

        <name>yarn.resourcemanager.resource-tracker.address</name>  

        <value>master:8035</value>  

    </property>  

    <property>  

        <name>yarn.resourcemanager.admin.address</name>  

        <value>master:8033</value>  

    </property>  

    <property>  

        <name>yarn.resourcemanager.webapp.address</name>  

        <value>master:8088</value>  

    </property>  

    <property>  

        <name>yarn.resourcemanager.hostname</name>  

        <value>master</value>  

    </property>  

    <property>  

        <name>yarn.nodemanager.aux-services</name>  

        <value>mapreduce_shuffle</value>  

    </property>  

</configuration>

 

e. Set the JAVA_HOME installation path

# vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Add the following:

export JAVA_HOME=/usr/java/jdk1.7.0_71

 

f. List the slave nodes that belong to the cluster's master node (NameNode, ResourceManager)

# vi /usr/local/hadoop/etc/hadoop/slaves

slave1

slave2

 

Copy the configured Hadoop directory from master to the slaves:

#scp -r /usr/local/hadoop slave1:/usr/local/

#scp -r /usr/local/hadoop slave2:/usr/local/
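
With more slaves, the copy can be driven from the slaves file instead of typing one command per host. The sketch below only prints the commands so they can be reviewed first; scp_commands is a hypothetical helper, not a Hadoop tool.

```shell
#!/bin/sh
# scp_commands SLAVES_FILE SOURCE_DIR DEST_PARENT
# Prints one "scp -r" command per host listed in SLAVES_FILE.
scp_commands() {
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines and comments.
        case "$host" in
            ''|'#'*) continue ;;
        esac
        printf 'scp -r %s %s:%s\n' "$2" "$host" "$3"
    done < "$1"
}

# Review the commands, then pipe them to sh to run them:
# scp_commands /usr/local/hadoop/etc/hadoop/slaves /usr/local/hadoop /usr/local/
# scp_commands /usr/local/hadoop/etc/hadoop/slaves /usr/local/hadoop /usr/local/ | sh
```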

 

g. Format the NameNode on master

root@master:/usr/local/hadoop/etc/hadoop# hdfs namenode -format

 

Start the services on master:

root@master:/usr/local/hadoop/etc/hadoop# start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
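
As the warning says, start-all.sh is deprecated in favor of the two separate scripts. A minimal sketch of the replacement, assuming $HADOOP_HOME/sbin is on PATH as configured in step b; start_cluster is a hypothetical wrapper name.

```shell
#!/bin/sh
# start_cluster: start HDFS first, then YARN; stop at the first failure.
start_cluster() {
    start-dfs.sh && start-yarn.sh
}

# Usage on master:
# start_cluster
#
# Optional, for the job history addresses set in mapred-site.xml:
# mr-jobhistory-daemon.sh start historyserver
```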

 

h. Check that the cluster started successfully

Check DFS usage:

root@master:/usr/local/hadoop/bin# hadoop dfsadmin -report

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

 

Configured Capacity: 39891361792 (37.15 GB)

Present Capacity: 24475799552 (22.79 GB)

DFS Remaining: 24475750400 (22.79 GB)

DFS Used: 49152 (48 KB)

DFS Used%: 0.00%

Under replicated blocks: 0

Blocks with corrupt replicas: 0

Missing blocks: 0

Missing blocks (with replication factor 1): 0

 

-------------------------------------------------

Live datanodes (2):

 

Name: 172.19.43.114:50010 (slave1)

Hostname: slave1

Decommission Status : Normal

Configured Capacity: 19945680896 (18.58 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 7552659456 (7.03 GB)

DFS Remaining: 12392996864 (11.54 GB)

DFS Used%: 0.00%

DFS Remaining%: 62.13%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Thu Jun 23 18:42:28 HKT 2016

 

 

Name: 172.19.43.98:50010 (slave2)

Hostname: slave2

Decommission Status : Normal

Configured Capacity: 19945680896 (18.58 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 7862902784 (7.32 GB)

DFS Remaining: 12082753536 (11.25 GB)

DFS Used%: 0.00%

DFS Remaining%: 60.58%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Thu Jun 23 18:42:27 HKT 2016

 

Check the Java daemons running on each node (process ID first, then process name):

root@slave1:~# jps

3204 DataNode

3461 Jps

3344 NodeManager

root@slave2:~# jps

32196 DataNode

32425 Jps

32324 NodeManager

root@master:/usr/local/hadoop# jps

4613 ResourceManager

4436 SecondaryNameNode

4250 NameNode

7436 Jps
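
The jps listings can also be checked mechanically. A sketch, with missing_daemons as a hypothetical helper: it reads jps-style output on standard input and prints every expected daemon name that does not appear.

```shell
#!/bin/sh
# missing_daemons NAME...
# Reads "PID NAME" lines (jps output) on stdin and prints each expected
# NAME that is not running. Prints nothing when all are present.
missing_daemons() {
    running=$(awk '{ print $2 }')
    for want in "$@"; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$running" | grep -qxF -e "$want" || printf '%s\n' "$want"
    done
}

# Usage on master:
# jps | missing_daemons NameNode SecondaryNameNode ResourceManager
# Usage on a slave:
# jps | missing_daemons DataNode NodeManager
```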

 

Open http://localhost:50070 in a browser on master to view the cluster status.

4. Verification test

root@master:/usr/local/hadoop# hadoop dfs -mkdir /input

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

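As the message says, the hadoop command should be replaced with hdfs here.

Written as follows, the deprecation warning no longer appears: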

root@master:/usr/local/hadoop# hdfs dfs -mkdir /user

root@master:/usr/local/hadoop# hdfs dfs -mkdir /user/hadoop

root@master:/usr/local/hadoop# hdfs dfs -mkdir /user/hadoop/input

Create a local input folder and a test file:

root@master:/usr/local/hadoop# mkdir input

root@master:/usr/local/hadoop# cd input

root@master:/usr/local/hadoop/input# vi test.txt

root@master:/usr/local/hadoop/input# cat test.txt

Hello World

Hello Hadoop

 

Upload test.txt to HDFS:

root@master:/usr/local/hadoop/input# hdfs dfs -put test.txt /user/hadoop/input

Run the WordCount program:

root@master:/usr/local/hadoop# hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.1-sources.jar org.apache.hadoop.examples.WordCount /user/hadoop/input /user/hadoop/output

16/06/24 11:27:30 INFO client.RMProxy: Connecting to ResourceManager at master/172.19.43.178:8032

16/06/24 11:27:31 INFO input.FileInputFormat: Total input paths to process : 1

16/06/24 11:27:31 INFO mapreduce.JobSubmitter: number of splits:1

16/06/24 11:27:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1466678312847_0002

16/06/24 11:27:32 INFO impl.YarnClientImpl: Submitted application application_1466678312847_0002

16/06/24 11:27:32 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1466678312847_0002/

16/06/24 11:27:32 INFO mapreduce.Job: Running job: job_1466678312847_0002

16/06/24 11:28:40 INFO mapreduce.Job: Job job_1466678312847_0002 running in uber mode : false

16/06/24 11:28:40 INFO mapreduce.Job:  map 0% reduce 0%

16/06/24 11:30:08 INFO mapreduce.Job:  map 100% reduce 0%

16/06/24 11:31:22 INFO mapreduce.Job:  map 100% reduce 100%

16/06/24 11:31:23 INFO mapreduce.Job: Job job_1466678312847_0002 completed successfully

16/06/24 11:31:23 INFO mapreduce.Job: Counters: 49

File System Counters

FILE: Number of bytes read=43

FILE: Number of bytes written=230815

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=135

HDFS: Number of bytes written=25

HDFS: Number of read operations=6

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=1

Launched reduce tasks=1

Data-local map tasks=1

Total time spent by all maps in occupied slots (ms)=85589

Total time spent by all reduces in occupied slots (ms)=70001

Total time spent by all map tasks (ms)=85589

Total time spent by all reduce tasks (ms)=70001

Total vcore-seconds taken by all map tasks=85589

Total vcore-seconds taken by all reduce tasks=70001

Total megabyte-seconds taken by all map tasks=87643136

Total megabyte-seconds taken by all reduce tasks=71681024

Map-Reduce Framework

Map input records=2

Map output records=4

Map output bytes=41

Map output materialized bytes=43

Input split bytes=110

Combine input records=4

Combine output records=3

Reduce input groups=3

Reduce shuffle bytes=43

Reduce input records=3

Reduce output records=3

Spilled Records=6

Shuffled Maps =1

Failed Shuffles=0

Merged Map outputs=1

GC time elapsed (ms)=602

CPU time spent (ms)=9800

Physical memory (bytes) snapshot=301338624

Virtual memory (bytes) snapshot=1334976512

Total committed heap usage (bytes)=136122368

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=25

File Output Format Counters

Bytes Written=25

 

View the generated word count data:

root@master:/usr/local/hadoop# hdfs dfs -ls /user/hadoop/input

Found 1 items

-rw-r--r--   2 root supergroup         25 2016-06-24 11:03 /user/hadoop/input/test.txt

root@master:/usr/local/hadoop# hdfs dfs -ls /user/hadoop/output

Found 2 items

-rw-r--r--   2 root supergroup          0 2016-06-24 11:31 /user/hadoop/output/_SUCCESS

-rw-r--r--   2 root supergroup         25 2016-06-24 11:31 /user/hadoop/output/part-r-00000

root@master:/usr/local/hadoop# hdfs dfs -cat /user/hadoop/output/part-r-00000

Hadoop 1

Hello 2

World 1
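
For an input this small, the result can be cross-checked locally without Hadoop. A sketch using standard Unix tools; count_words is a hypothetical helper, and LC_ALL=C is set so that the sort order is plain byte order.

```shell
#!/bin/sh
# count_words FILE
# Prints "word<TAB>count" for every whitespace-separated word, sorted by word.
count_words() {
    tr -s '[:space:]' '\n' < "$1" |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Usage with the local copy of the test file:
# count_words /usr/local/hadoop/input/test.txt
```

For the two-line test.txt above, this yields the same three counts as part-r-00000.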

 
