1. Upload the Hadoop package to /usr on the master machine
Version: hadoop-1.2.1.tar.gz
Extract it:
tar -zxvf hadoop-1.2.1.tar.gz
This produces a hadoop-1.2.1 directory; enter it and create a tmp directory for later use:
[root@master hadoop-1.2.1]# mkdir tmp
Back in /usr, give the hadoop user ownership of (and thus read/write access to) hadoop-1.2.1:
[root@master usr]# chown -R hadoop:hadoop hadoop-1.2.1/
Side note: in the actual run, the tmp directory was created only *after* the chown on the hadoop directory, so it was owned by root, and formatting the namenode failed with this error:
[hadoop@master conf]$ hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.
13/09/08 00:33:06 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master.hadoop/192.168.70.101
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.6.0_45
************************************************************/
13/09/08 00:33:06 INFO util.GSet: Computing capacity for map BlocksMap
13/09/08 00:33:06 INFO util.GSet: VM type       = 32-bit
13/09/08 00:33:06 INFO util.GSet: 2.0% max memory = 1013645312
13/09/08 00:33:06 INFO util.GSet: capacity      = 2^22 = 4194304 entries
13/09/08 00:33:06 INFO util.GSet: recommended=4194304, actual=4194304
13/09/08 00:33:06 INFO namenode.FSNamesystem: fsOwner=hadoop
13/09/08 00:33:06 INFO namenode.FSNamesystem: supergroup=supergroup
13/09/08 00:33:06 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/09/08 00:33:06 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/09/08 00:33:06 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/09/08 00:33:06 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
13/09/08 00:33:06 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/09/08 00:33:07 ERROR namenode.NameNode: java.io.IOException: Cannot create directory /usr/hadoop-1.2.1/tmp/dfs/name/current
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:294)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1337)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1356)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1261)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1467)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1488)
13/09/08 00:33:07 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master.hadoop/192.168.70.101
************************************************************/
[hadoop@master conf]$
Resolved after correcting the permissions.
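A minimal sketch of the safe ordering — create tmp first, then chown the whole tree, so tmp ends up owned by hadoop too. The function wrapper and demo path are for illustration only; the chown line is commented out because it needs root:

```shell
# Sketch: create the tmp directory *before* handing the tree to the hadoop user,
# so tmp inherits the hadoop:hadoop ownership from the recursive chown.
prepare_hadoop_home() {
  local dir="$1"
  mkdir -p "$dir/tmp"
  # chown -R hadoop:hadoop "$dir"   # run as root on the real cluster
}

prepare_hadoop_home /tmp/hadoop-1.2.1-demo   # demo path, not the real /usr install
```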
2. Configure the Hadoop environment variables (as root) on the master and every slave:
[root@master conf]# vi /etc/profile
HADOOP_HOME=/usr/hadoop-1.2.1
export HADOOP_HOME
PATH=$PATH:$HADOOP_HOME/bin
export PATH
Load the environment variables:
[root@master conf]# source /etc/profile
Test the environment variables:
[root@master conf]$ hadoop
Warning: $HADOOP_HOME is deprecated.
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
....
done
3. Set the JAVA_HOME path for Hadoop:
[root@slave01 conf]# vi hadoop-env.sh
# The java implementation to use.  Required.
export JAVA_HOME=/usr/jdk1.6.0_45
4. Edit core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop-1.2.1/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master.hadoop:9000</value>
  </property>
</configuration>
5. Edit hdfs-site.xml
[hadoop@master conf]$ vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/hadoop-1.2.1/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
6. Edit mapred-site.xml
[hadoop@master conf]$ vi mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master.hadoop:9001</value>
  </property>
</configuration>
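All three *-site.xml files above follow the same name/value layout, so a quick sanity check can pull a property back out with grep and sed. A rough sketch — it assumes the simple one-property-per-block formatting shown above, not a general XML parser:

```shell
# get_prop FILE NAME: print the <value> that follows a given <name> in a Hadoop
# *-site.xml file. Naive line-based extraction, good enough for configs like the above.
get_prop() {
  grep -A1 "<name>$2</name>" "$1" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Demo against a minimal core-site.xml written to a temp location:
cat > /tmp/core-site-demo.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master.hadoop:9000</value>
  </property>
</configuration>
EOF
get_prop /tmp/core-site-demo.xml fs.default.name   # prints hdfs://master.hadoop:9000
```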
7. Edit masters and slaves
[hadoop@master conf]$ vi masters
Add the hostname (or IP):
master.hadoop
[hadoop@master conf]$ vi slaves
slave01.hadoop
slave02.hadoop
8. Distribute the configured Hadoop tree to the slaves. The hadoop user on the slave nodes does not yet have write access under /usr, so connect to the destination hosts as root (the source-side user does not matter):
[root@master usr]# scp -r hadoop-1.2.1/ root@slave01.hadoop:/usr
...
[root@master usr]# scp -r hadoop-1.2.1/ root@slave02.hadoop:/usr
Then, on each slave, fix the ownership of the hadoop-1.2.1 directory (chown -R hadoop:hadoop, as in step 1).
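With more slaves this copy-then-chown routine gets repetitive. A sketch that just *prints* the per-host commands for review before running them (the hostnames are this cluster's — substitute your own):

```shell
# Print (not run) the scp/ssh commands needed to push the tree to each slave and
# fix ownership there. Echoing first makes the commands easy to review.
distribute_cmds() {
  local src="$1"; shift
  local host
  for host in "$@"; do
    echo "scp -r $src root@$host:/usr"
    echo "ssh root@$host chown -R hadoop:hadoop /usr/${src%/}"
  done
}

distribute_cmds hadoop-1.2.1/ slave01.hadoop slave02.hadoop
```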
9. Format the HDFS filesystem
[hadoop@master usr]$ hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.
14/10/22 07:26:09 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master.hadoop/192.168.1.100
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.6.0_45
************************************************************/
14/10/22 07:26:09 INFO util.GSet: Computing capacity for map BlocksMap
14/10/22 07:26:09 INFO util.GSet: VM type       = 32-bit
14/10/22 07:26:09 INFO util.GSet: 2.0% max memory = 1013645312
14/10/22 07:26:09 INFO util.GSet: capacity      = 2^22 = 4194304 entries
14/10/22 07:26:09 INFO util.GSet: recommended=4194304, actual=4194304
14/10/22 07:26:09 INFO namenode.FSNamesystem: fsOwner=hadoop
14/10/22 07:26:09 INFO namenode.FSNamesystem: supergroup=supergroup
14/10/22 07:26:09 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/10/22 07:26:09 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/10/22 07:26:09 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/10/22 07:26:09 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/10/22 07:26:09 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/10/22 07:26:09 INFO common.Storage: Image file /usr/hadoop/tmp/dfs/name/current/fsimage of size 112 bytes saved in 0 seconds.
14/10/22 07:26:09 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/usr/hadoop/tmp/dfs/name/current/edits
14/10/22 07:26:09 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/usr/hadoop/tmp/dfs/name/current/edits
14/10/22 07:26:09 INFO common.Storage: Storage directory /usr/hadoop/tmp/dfs/name has been successfully formatted.
14/10/22 07:26:09 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master.hadoop/192.168.1.100
************************************************************/
[hadoop@master usr]$
The "...successfully formatted" line indicates success; compare the failure in the side note under step 1.
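When scripting this step, the success marker can be grepped out of the command's output. A sketch (the demo line is copied from the log above):

```shell
# check_format: read `hadoop namenode -format` output on stdin and report
# success/failure based on the "successfully formatted" marker shown above.
check_format() {
  if grep -q "successfully formatted"; then
    echo "format OK"
  else
    echo "format FAILED"
  fi
}

# Demo with a line lifted from the real log:
echo "14/10/22 07:26:09 INFO common.Storage: Storage directory /usr/hadoop/tmp/dfs/name has been successfully formatted." | check_format   # prints format OK
```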
10. Start Hadoop
Before starting, stop iptables on every node (master and slaves alike); otherwise running jobs may fail:
[root@master usr]# service iptables stop
iptables: Flushing firewall rules: [  OK  ]
iptables: Setting chains to policy ACCEPT: filter [  OK  ]
iptables: Unloading modules: [  OK  ]
[root@master usr]#
Side note (the slave firewalls had been left running):
[hadoop@master hadoop-1.2.1]$ hadoop jar hadoop-examples-1.2.1.jar pi 10 100
Warning: $HADOOP_HOME is deprecated.
Number of Maps  = 10
Samples per Map = 100
13/09/08 02:17:05 INFO hdfs.DFSClient: Exception in createBlockOutputStream 192.168.70.102:50010 java.net.NoRouteToHostException: No route to host
13/09/08 02:17:05 INFO hdfs.DFSClient: Abandoning blk_9160013073143341141_4460
13/09/08 02:17:05 INFO hdfs.DFSClient: Excluding datanode 192.168.70.102:50010
13/09/08 02:17:05 INFO hdfs.DFSClient: Exception in createBlockOutputStream 192.168.70.103:50010 java.net.NoRouteToHostException: No route to host
13/09/08 02:17:05 INFO hdfs.DFSClient: Abandoning blk_-1734085534405596274_4461
13/09/08 02:17:05 INFO hdfs.DFSClient: Excluding datanode 192.168.70.103:50010
13/09/08 02:17:05 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/PiEstimator_TMP_3_141592654/in/part0 could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
Resolved after stopping the firewalls. Also, prefer IP addresses over hostnames in the configuration where possible.
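Since the firewall must be off on every node, a small loop that prints the per-host commands is handy. The hostnames are this cluster's, and `chkconfig iptables off` additionally keeps the firewall from coming back after a reboot:

```shell
# Print the commands to stop iptables on each node (review them, then run as root).
# chkconfig makes the change persistent across reboots on these RHEL/CentOS-era systems.
for host in master.hadoop slave01.hadoop slave02.hadoop; do
  echo "ssh root@$host 'service iptables stop && chkconfig iptables off'"
done
```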
Start the cluster:
[root@master usr]# su hadoop
[hadoop@master usr]$ start-all.sh
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-master.hadoop.out
slave01.hadoop: starting datanode, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-slave01.hadoop.out
slave02.hadoop: starting datanode, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-slave02.hadoop.out
The authenticity of host 'master.hadoop (192.168.70.101)' can't be established.
RSA key fingerprint is 6c:e0:d7:22:92:80:85:fb:a6:d6:a4:8f:75:b0:96:7e.
Are you sure you want to continue connecting (yes/no)? yes
master.hadoop: Warning: Permanently added 'master.hadoop,192.168.70.101' (RSA) to the list of known hosts.
master.hadoop: starting secondarynamenode, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-master.hadoop.out
starting jobtracker, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-master.hadoop.out
slave02.hadoop: starting tasktracker, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-slave02.hadoop.out
slave01.hadoop: starting tasktracker, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-slave01.hadoop.out
[hadoop@master usr]$
The log shows the startup order: namenode (master) -> datanodes (slave01, slave02) -> secondarynamenode (master) -> jobtracker (master) -> finally the tasktrackers (slave01, slave02).
11. Verification
Check the Hadoop processes with jps on the master and on each slave.
master:
[hadoop@master tmp]$ jps
6009 Jps
5560 SecondaryNameNode
5393 NameNode
5627 JobTracker
[hadoop@master tmp]$
slave01:
[hadoop@slave01 tmp]$ jps
3855 Jps
3698 TaskTracker
3636 DataNode
slave02:
[root@slave02 tmp]# jps
3628 TaskTracker
3748 Jps
3567 DataNode
[root@slave02 tmp]#
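These jps listings can be checked mechanically. A sketch that compares a listing against the daemons expected for each role (the daemon names are exactly those shown above; the function form and sample input are illustrative):

```shell
# verify_role ROLE LISTING: check that every daemon expected for the role appears
# in the jps listing; prints each missing daemon, or "<role> OK" when all are present.
verify_role() {
  local role="$1" listing="$2" expected d missing=0
  case "$role" in
    master) expected="NameNode SecondaryNameNode JobTracker" ;;
    slave)  expected="DataNode TaskTracker" ;;
  esac
  for d in $expected; do
    # -w keeps "NameNode" from matching inside "SecondaryNameNode"
    echo "$listing" | grep -qw "$d" || { echo "missing: $d"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "$role OK"
}

verify_role master "$(printf '5560 SecondaryNameNode\n5393 NameNode\n5627 JobTracker')"   # prints master OK
```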
Check the cluster status: hadoop dfsadmin -report
[hadoop@master tmp]$ hadoop dfsadmin -report
Warning: $HADOOP_HOME is deprecated.
Configured Capacity: 14174945280 (13.2 GB)
Present Capacity: 7577288704 (7.06 GB)
DFS Remaining: 7577231360 (7.06 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Name: 192.168.70.103:50010
Decommission Status : Normal
Configured Capacity: 7087472640 (6.6 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 3298820096 (3.07 GB)
DFS Remaining: 3788623872(3.53 GB)
DFS Used%: 0%
DFS Remaining%: 53.46%
Last contact: Sun Sep 08 01:19:18 PDT 2013

Name: 192.168.70.102:50010
Decommission Status : Normal
Configured Capacity: 7087472640 (6.6 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 3298836480 (3.07 GB)
DFS Remaining: 3788607488(3.53 GB)
DFS Used%: 0%
DFS Remaining%: 53.45%
Last contact: Sun Sep 08 01:19:17 PDT 2013

[hadoop@master tmp]$
Cluster management pages (at the master's IP):
http://192.168.70.101:50030
http://192.168.70.101:50070/
12. Run a job: estimate pi
[hadoop@master hadoop-1.2.1]$ hadoop jar hadoop-examples-1.2.1.jar pi 10 100
First argument (10): the number of map tasks to run
Second argument (100): the number of samples per map
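What each map task does is essentially Monte Carlo sampling: throw random points into the unit square, count those that land inside the quarter circle, and take pi ≈ 4 × (inside / total). A local awk sketch of the idea (illustrative only — the real PiEstimator uses a quasi-random Halton sequence rather than rand(), which is why its answer is more stable):

```shell
# Monte Carlo estimate of pi, mimicking what one map task samples.
# Plain rand() here; Hadoop's PiEstimator uses a Halton sequence instead.
awk 'BEGIN {
  srand(42); n = 200000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1) inside++
  }
  printf "%.3f\n", 4 * inside / n
}'
```

With 200,000 samples the estimate typically lands within a few hundredths of 3.14159, which matches the job's printed "Estimated value of Pi is 3.148" for 10 × 100 samples only loosely — more samples, tighter estimate.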
Normal output:
[hadoop@master hadoop-1.2.1]$ hadoop jar hadoop-examples-1.2.1.jar pi 10 100
Warning: $HADOOP_HOME is deprecated.
Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
13/09/08 02:21:50 INFO mapred.FileInputFormat: Total input paths to process : 10
13/09/08 02:21:52 INFO mapred.JobClient: Running job: job_201309080221_0001
13/09/08 02:21:53 INFO mapred.JobClient:  map 0% reduce 0%
13/09/08 02:24:06 INFO mapred.JobClient:  map 10% reduce 0%
13/09/08 02:24:07 INFO mapred.JobClient:  map 20% reduce 0%
13/09/08 02:24:21 INFO mapred.JobClient:  map 30% reduce 0%
13/09/08 02:24:28 INFO mapred.JobClient:  map 40% reduce 0%
13/09/08 02:24:31 INFO mapred.JobClient:  map 50% reduce 0%
13/09/08 02:24:32 INFO mapred.JobClient:  map 60% reduce 0%
13/09/08 02:24:38 INFO mapred.JobClient:  map 70% reduce 0%
13/09/08 02:24:41 INFO mapred.JobClient:  map 80% reduce 13%
13/09/08 02:24:44 INFO mapred.JobClient:  map 80% reduce 23%
13/09/08 02:24:45 INFO mapred.JobClient:  map 100% reduce 23%
13/09/08 02:24:47 INFO mapred.JobClient:  map 100% reduce 26%
13/09/08 02:24:53 INFO mapred.JobClient:  map 100% reduce 100%
13/09/08 02:24:54 INFO mapred.JobClient: Job complete: job_201309080221_0001
13/09/08 02:24:54 INFO mapred.JobClient: Counters: 30
13/09/08 02:24:54 INFO mapred.JobClient:   Job Counters
13/09/08 02:24:54 INFO mapred.JobClient:     Launched reduce tasks=1
13/09/08 02:24:54 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=638017
13/09/08 02:24:54 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/08 02:24:54 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/09/08 02:24:54 INFO mapred.JobClient:     Launched map tasks=10
13/09/08 02:24:54 INFO mapred.JobClient:     Data-local map tasks=10
13/09/08 02:24:54 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=44458
13/09/08 02:24:54 INFO mapred.JobClient:   File Input Format Counters
13/09/08 02:24:54 INFO mapred.JobClient:     Bytes Read=1180
13/09/08 02:24:54 INFO mapred.JobClient:   File Output Format Counters
13/09/08 02:24:54 INFO mapred.JobClient:     Bytes Written=97
13/09/08 02:24:54 INFO mapred.JobClient:   FileSystemCounters
13/09/08 02:24:54 INFO mapred.JobClient:     FILE_BYTES_READ=226
13/09/08 02:24:54 INFO mapred.JobClient:     HDFS_BYTES_READ=2460
13/09/08 02:24:54 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=623419
13/09/08 02:24:54 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
13/09/08 02:24:54 INFO mapred.JobClient:   Map-Reduce Framework
13/09/08 02:24:54 INFO mapred.JobClient:     Map output materialized bytes=280
13/09/08 02:24:54 INFO mapred.JobClient:     Map input records=10
13/09/08 02:24:54 INFO mapred.JobClient:     Reduce shuffle bytes=280
13/09/08 02:24:54 INFO mapred.JobClient:     Spilled Records=40
13/09/08 02:24:54 INFO mapred.JobClient:     Map output bytes=180
13/09/08 02:24:54 INFO mapred.JobClient:     Total committed heap usage (bytes)=1414819840
13/09/08 02:24:54 INFO mapred.JobClient:     CPU time spent (ms)=377130
13/09/08 02:24:54 INFO mapred.JobClient:     Map input bytes=240
13/09/08 02:24:54 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1280
13/09/08 02:24:54 INFO mapred.JobClient:     Combine input records=0
13/09/08 02:24:54 INFO mapred.JobClient:     Reduce input records=20
13/09/08 02:24:54 INFO mapred.JobClient:     Reduce input groups=20
13/09/08 02:24:54 INFO mapred.JobClient:     Combine output records=0
13/09/08 02:24:54 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1473769472
13/09/08 02:24:54 INFO mapred.JobClient:     Reduce output records=0
13/09/08 02:24:54 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4130349056
13/09/08 02:24:54 INFO mapred.JobClient:     Map output records=20
Job Finished in 184.973 seconds
Estimated value of Pi is 3.14800000000000000000
[hadoop@master hadoop-1.2.1]$
For the failure that occurs when the slave firewalls are left running, see the side note under step 10.