The earlier setup used Hadoop 1.0.2, which is quite old by now, so this upgrades to 2.7.1. By and large the configuration does not differ much between the two versions.
Plan:
ubuntu1 172.19.43.178 master, namenode, jobtracker-master
ubuntu2 172.19.43.114 slave1, datanode, tasktracker-slave1
ubuntu3 172.19.43.98 slave2, datanode, tasktracker-slave2
(JobTracker/TaskTracker are 1.x terms; in 2.7.1 those roles are played by the YARN ResourceManager and NodeManagers.)
1. Install JDK 1.7
Download the 64-bit JDK 1.7.
Copy the JDK archive to /usr/java and extract it:
root@vinking:/home/vinking/Downloads# cp jdk-7u71-linux-x64.tar.gz /usr/java
#sudo tar -zxvf jdk-7u71-linux-x64.tar.gz
Edit the environment variables and add the following:
#vi /etc/profile
#set java environment
export JAVA_HOME=/usr/java/jdk1.7.0_71
export JRE_HOME=/usr/java/jdk1.7.0_71/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
#source /etc/profile
Check whether the installation succeeded:
#java -version
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
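If java -version still reports an older system JDK, confirm the new variables are in effect (a quick check, assuming /etc/profile has been sourced as above):
#echo $JAVA_HOME
#which java
Both should point into /usr/java/jdk1.7.0_71.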
2. Configure passwordless SSH login
root@vinking:/home/vinking# sudo apt-get install openssh-server
root@vinking:/home/vinking# sudo apt-get install openssh-client
root@vinking:/home/vinking# sudo /etc/init.d/ssh start
start: Job is already running: ssh
root@vinking:/home/vinking# ps -e|grep ssh
2298 ? 00:00:00 ssh-agent
99652 ? 00:00:00 sshd
root@vinking:/home/vinking# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
e2:8d:00:e0:ba:8a:07:37:e9:d2:11:79:20:70:fa:1d root@vinking
The key's randomart image is:
+--[ DSA 1024]----+
|+ . |
|o+. |
|.o.oE |
|..oo.. |
|. .+o . S |
|..= o + |
|.= o o . |
|+ + |
|oo |
+-----------------+
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# ssh -version
Bad escape character 'rsion'.
(ssh has no -version option; it parses this as -v plus -e with argument 'rsion'. Use ssh -V to print the version.)
# ssh localhost
Login succeeds without being prompted for a password.
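Note that the steps above only enable passwordless login to localhost. For master to reach slave1 and slave2 without a password (needed for the scp and start scripts later), master's public key must also be appended to each slave's authorized_keys; a sketch, assuming root SSH login is enabled on the slaves (use their IPs until the hosts entries in step 3c are in place):
#ssh-copy-id -i ~/.ssh/id_dsa.pub root@slave1
#ssh-copy-id -i ~/.ssh/id_dsa.pub root@slave2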
3. Install Hadoop 2.7.1
Official installation guide: http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
A reference article: http://wangzhijian.blog.51cto.com/6427016/1766619
a. Download and install
Download Hadoop:
#wget http://apache.fayea.com/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Extract it and move it into place:
#tar -zxvf hadoop-2.7.1.tar.gz
#sudo mv hadoop-2.7.1 /usr/local/hadoop
b. Configure environment variables
root@vinking:/usr/local/hadoop# vi ~/.bashrc
Add the following:
# Hadoop Start
export JAVA_HOME=/usr/java/jdk1.7.0_71
export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:
# Hadoop End
root@vinking:/usr/local/hadoop# source ~/.bashrc
c. Add hosts entries (required on every machine)
root@vinking:/usr/local/hadoop# vi /etc/hosts
root@vinking:/usr/local/hadoop# cat /etc/hosts
127.0.0.1 localhost
172.19.43.178 master
172.19.43.114 slave1
172.19.43.98 slave2
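A quick sanity check that the names resolve (run from each machine, hostnames per the table above):
#ping -c 1 master
#ping -c 1 slave1
#ping -c 1 slave2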
d. Configure the cluster
Clone the machine twice to get three machines, set up as master, slave1, and slave2 (set the hostname and /etc/hostname on each accordingly).
Configure master:
#hostname master
#vi /etc/hostname
master
#cd /usr/local/hadoop/etc/hadoop
Configure core-site.xml (fs.default.name is the deprecated 1.x name; 2.x uses fs.defaultFS):
#vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
</configuration>
Configure hdfs-site.xml:
#vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
</configuration>
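The name and data directories are created automatically when the NameNode is formatted and the DataNodes first start, but pre-creating them is a harmless way to surface permission problems early:
#mkdir -p /usr/local/hadoop/dfs/name /usr/local/hadoop/dfs/data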
Configure mapred-site.xml (Hadoop 2.7.1 ships only a template, so copy it first):
#sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
#vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
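Note that the JobHistory server addressed above is not launched by the start scripts used later; to serve the history UI on port 19888 it must be started separately, e.g. (assuming $HADOOP_HOME/sbin is on PATH, as set in step b):
#mr-jobhistory-daemon.sh start historyserver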
Configure yarn-site.xml:
#vi yarn-site.xml
<configuration>
<property>
<name>yarn.acl.enable</name>
<value>false</value>
<final>true</final>
</property>
<property>
<name>yarn.admin.acl</name>
<value>*</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
e. Set the JAVA_HOME installation path
#vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Add the following:
export JAVA_HOME=/usr/java/jdk1.7.0_71
f. Specify the slave nodes owned by the cluster's master (NameNode, ResourceManager):
#vi /usr/local/hadoop/etc/hadoop/slaves
slave1
slave2
Copy the configured Hadoop from master to the slaves:
#scp -r /usr/local/hadoop slave1:/usr/local/
#scp -r /usr/local/hadoop slave2:/usr/local/
g. Format the NameNode on master
root@master:/usr/local/hadoop/etc/hadoop# hdfs namenode -format
Start the services from master:
root@master:/usr/local/hadoop/etc/hadoop# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
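As the notice says, the non-deprecated equivalent is to start HDFS and YARN separately:
#start-dfs.sh
#start-yarn.sh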
h. Check whether the cluster started successfully
Check DFS usage:
root@master:/usr/local/hadoop/bin# hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Configured Capacity: 39891361792 (37.15 GB)
Present Capacity: 24475799552 (22.79 GB)
DFS Remaining: 24475750400 (22.79 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (2):
Name: 172.19.43.114:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 19945680896 (18.58 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7552659456 (7.03 GB)
DFS Remaining: 12392996864 (11.54 GB)
DFS Used%: 0.00%
DFS Remaining%: 62.13%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jun 23 18:42:28 HKT 2016
Name: 172.19.43.98:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 19945680896 (18.58 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7862902784 (7.32 GB)
DFS Remaining: 12082753536 (11.25 GB)
DFS Used%: 0.00%
DFS Remaining%: 60.58%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jun 23 18:42:27 HKT 2016
Check the background Java processes on each node (the number is the PID, followed by the process name):
root@slave1:~# jps
3204 DataNode
3461 Jps
3344 NodeManager
root@slave2:~# jps
32196 DataNode
32425 Jps
32324 NodeManager
root@master:/usr/local/hadoop# jps
4613 ResourceManager
4436 SecondaryNameNode
4250 NameNode
7436 Jps
View the cluster status in a browser at http://localhost:50070 (the NameNode web UI); the YARN ResourceManager UI is at http://master:8088.
4. Verification test
root@master:/usr/local/hadoop# hadoop dfs -mkdir /input
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
The hadoop command here should be replaced with hdfs; the commands below then run without the warning:
root@master:/usr/local/hadoop# hdfs dfs -mkdir /user
root@master:/usr/local/hadoop# hdfs dfs -mkdir /user/hadoop
root@master:/usr/local/hadoop# hdfs dfs -mkdir /user/hadoop/input
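The three mkdir calls above can also be collapsed into one with -p, which creates parent directories as needed:
#hdfs dfs -mkdir -p /user/hadoop/input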
Create a local input directory and a test file:
root@master:/usr/local/hadoop# mkdir input
root@master:/usr/local/hadoop# cd input
root@master:/usr/local/hadoop/input# vi test.txt
root@master:/usr/local/hadoop/input# cat test.txt
Hello World
Hello Hadoop
Upload test.txt to HDFS:
root@master:/usr/local/hadoop/input# hdfs dfs -put test.txt /user/hadoop/input
Run the WordCount example program (the command below points at the sources jar with a fully qualified class name; the compiled examples jar, share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar, can also be run with the short name wordcount in place of the class):
root@master:/usr/local/hadoop# hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.1-sources.jar org.apache.hadoop.examples.WordCount /user/hadoop/input /user/hadoop/output
16/06/24 11:27:30 INFO client.RMProxy: Connecting to ResourceManager at master/172.19.43.178:8032
16/06/24 11:27:31 INFO input.FileInputFormat: Total input paths to process : 1
16/06/24 11:27:31 INFO mapreduce.JobSubmitter: number of splits:1
16/06/24 11:27:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1466678312847_0002
16/06/24 11:27:32 INFO impl.YarnClientImpl: Submitted application application_1466678312847_0002
16/06/24 11:27:32 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1466678312847_0002/
16/06/24 11:27:32 INFO mapreduce.Job: Running job: job_1466678312847_0002
16/06/24 11:28:40 INFO mapreduce.Job: Job job_1466678312847_0002 running in uber mode : false
16/06/24 11:28:40 INFO mapreduce.Job: map 0% reduce 0%
16/06/24 11:30:08 INFO mapreduce.Job: map 100% reduce 0%
16/06/24 11:31:22 INFO mapreduce.Job: map 100% reduce 100%
16/06/24 11:31:23 INFO mapreduce.Job: Job job_1466678312847_0002 completed successfully
16/06/24 11:31:23 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=43
FILE: Number of bytes written=230815
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=135
HDFS: Number of bytes written=25
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=85589
Total time spent by all reduces in occupied slots (ms)=70001
Total time spent by all map tasks (ms)=85589
Total time spent by all reduce tasks (ms)=70001
Total vcore-seconds taken by all map tasks=85589
Total vcore-seconds taken by all reduce tasks=70001
Total megabyte-seconds taken by all map tasks=87643136
Total megabyte-seconds taken by all reduce tasks=71681024
Map-Reduce Framework
Map input records=2
Map output records=4
Map output bytes=41
Map output materialized bytes=43
Input split bytes=110
Combine input records=4
Combine output records=3
Reduce input groups=3
Reduce shuffle bytes=43
Reduce input records=3
Reduce output records=3
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=602
CPU time spent (ms)=9800
Physical memory (bytes) snapshot=301338624
Virtual memory (bytes) snapshot=1334976512
Total committed heap usage (bytes)=136122368
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=25
File Output Format Counters
Bytes Written=25
View the generated word-count data:
root@master:/usr/local/hadoop# hdfs dfs -ls /user/hadoop/input
Found 1 items
-rw-r--r-- 2 root supergroup 25 2016-06-24 11:03 /user/hadoop/input/test.txt
root@master:/usr/local/hadoop# hdfs dfs -ls /user/hadoop/output
Found 2 items
-rw-r--r-- 2 root supergroup 0 2016-06-24 11:31 /user/hadoop/output/_SUCCESS
-rw-r--r-- 2 root supergroup 25 2016-06-24 11:31 /user/hadoop/output/part-r-00000
root@master:/usr/local/hadoop# hdfs dfs -cat /user/hadoop/output/part-r-00000
Hadoop 1
Hello 2
World 1
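One caveat when re-running the job: WordCount (like any FileOutputFormat job) fails if the output directory already exists, so remove it first:
#hdfs dfs -rm -r /user/hadoop/output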