This blog has moved to: http://www.micmiu.com
When reposting, please credit the source: Michael's blog @ http://sjsky.iteye.com
I am still only a beginner with Hadoop and do not yet understand it in depth.
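Hadoop is a distributed-system infrastructure and a project of the Apache Software Foundation. It lets users develop distributed programs without understanding the low-level details of distribution, and put the power of a cluster to work for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware. It also provides high-throughput access to application data, which suits applications with large data sets. HDFS relaxes some POSIX requirements so that file system data can be read as a stream (streaming access).
Hadoop website: http://hadoop.apache.org/
This article describes in detail how to set up a Hadoop test environment. It covers standalone mode and pseudo-distributed mode in turn, along with the problems you may run into on different operating systems (CentOS, Ubuntu) and how to solve them. I ran every step successfully on both CentOS and Ubuntu. Contents:
- Environment
- Preparation
- Standalone demo
- Pseudo-distributed demo
I. Environment
- Windows Vista
- VirtualBox + Ubuntu 10.10 (OpenSSH installed and running)
- JDK version: 1.6.0_20, installed at /opt/jdk1.6 (Hadoop requires JDK 1.6.x)
- hadoop-0.20.203.0rc1.tar.gz (the latest stable release at the time of writing)
The steps below use the Ubuntu user michael as an example.
II. Preparation
First extract hadoop-0.20.203.0rc1.tar.gz into /home/michael/: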
$ tar -zxvf hadoop-0.20.203.0rc1.tar.gz -C /home/michael/
$ mv hadoop-0.20.203.0 hadoop
Next, set JAVA_HOME in hadoop-env.sh (under conf/). Find the following lines:
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
Change them to:
# The java implementation to use. Required.
# Path to the JDK on the current system
export JAVA_HOME=/opt/jdk1.6
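The same edit can be scripted instead of done by hand. The following is only a minimal sketch: `set_java_home` is a hypothetical helper name, not something shipped with Hadoop, and it assumes the JDK path contains no `|` or `&` characters (they would confuse the sed expression).

```shell
#!/bin/sh
# set_java_home FILE JDK_DIR   (hypothetical helper, not part of Hadoop)
# Rewrites any commented-out or active "export JAVA_HOME=" line in FILE so
# that it points at JDK_DIR; appends such a line if FILE has none.
set_java_home() {
    env_file=$1
    jdk_dir=$2
    if grep -q '^[# ]*export JAVA_HOME=' "$env_file"; then
        # Write to a temporary file and move it back, which avoids relying
        # on the non-portable "sed -i" option.
        sed "s|^[# ]*export JAVA_HOME=.*|export JAVA_HOME=$jdk_dir|" "$env_file" > "$env_file.tmp" &&
            mv "$env_file.tmp" "$env_file"
    else
        printf 'export JAVA_HOME=%s\n' "$jdk_dir" >> "$env_file"
    fi
}

# Example call for the layout used in this article:
#   set_java_home /home/michael/hadoop/conf/hadoop-env.sh /opt/jdk1.6
```

Other lines in the file, including the comment above the export, are left untouched.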
III. Standalone demo (Standalone Operation)
The commands involved are:
$ cd /home/michael/hadoop
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
The full session is shown below:
michael@michael-VirtualBox:~/hadoop$ mkdir input
michael@michael-VirtualBox:~/hadoop$ cp conf/*.xml input
michael@michael-VirtualBox:~/hadoop$ bin/hadoop jar hadoop-examples-0.20.203.0.jar grep input output 'dfs[a-z.]+'
11/07/16 10:06:48 INFO mapred.FileInputFormat: Total input paths to process : 6
11/07/16 10:06:48 INFO mapred.JobClient: Running job: job_local_0001
11/07/16 10:06:48 INFO mapred.MapTask: numReduceTasks: 1
11/07/16 10:06:48 INFO mapred.MapTask: io.sort.mb = 100
11/07/16 10:06:49 INFO mapred.MapTask: data buffer = 79691776/99614720
11/07/16 10:06:49 INFO mapred.MapTask: record buffer = 262144/327680
11/07/16 10:06:49 INFO mapred.MapTask: Starting flush of map output
11/07/16 10:06:49 INFO mapred.JobClient: map 0% reduce 0%
11/07/16 10:06:49 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/07/16 10:06:51 INFO mapred.LocalJobRunner: file:/home/michael/hadoop/input/capacity-scheduler.xml:0+7457
11/07/16 10:06:51 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
11/07/16 10:06:51 INFO mapred.MapTask: numReduceTasks: 1
11/07/16 10:06:51 INFO mapred.MapTask: io.sort.mb = 100
11/07/16 10:06:51 INFO mapred.MapTask: data buffer = 79691776/99614720
11/07/16 10:06:51 INFO mapred.MapTask: record buffer = 262144/327680
11/07/16 10:06:51 INFO mapred.MapTask: Starting flush of map output
11/07/16 10:06:52 INFO mapred.MapTask: Finished spill 0
11/07/16 10:06:52 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
11/07/16 10:06:52 INFO mapred.JobClient: map 100% reduce 0%
11/07/16 10:06:54 INFO mapred.LocalJobRunner: file:/home/michael/hadoop/input/hadoop-policy.xml:0+4644
11/07/16 10:06:54 INFO mapred.LocalJobRunner: file:/home/michael/hadoop/input/hadoop-policy.xml:0+4644
11/07/16 10:06:54 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
11/07/16 10:06:54 INFO mapred.MapTask: numReduceTasks: 1
11/07/16 10:06:54 INFO mapred.MapTask: io.sort.mb = 100
11/07/16 10:06:55 INFO mapred.MapTask: data buffer = 79691776/99614720
11/07/16 10:06:55 INFO mapred.MapTask: record buffer = 262144/327680
11/07/16 10:06:55 INFO mapred.MapTask: Starting flush of map output
11/07/16 10:06:55 INFO mapred.Task: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
11/07/16 10:06:57 INFO mapred.LocalJobRunner: file:/home/michael/hadoop/input/mapred-queue-acls.xml:0+2033
11/07/16 10:06:57 INFO mapred.LocalJobRunner: file:/home/michael/hadoop/input/mapred-queue-acls.xml:0+2033
11/07/16 10:06:57 INFO mapred.Task: Task 'attempt_local_0001_m_000002_0' done.
11/07/16 10:06:57 INFO mapred.MapTask: numReduceTasks: 1
11/07/16 10:06:57 INFO mapred.MapTask: io.sort.mb = 100
11/07/16 10:06:58 INFO mapred.MapTask: data buffer = 79691776/99614720
11/07/16 10:06:58 INFO mapred.MapTask: record buffer = 262144/327680
11/07/16 10:06:58 INFO mapred.MapTask: Starting flush of map output
11/07/16 10:06:58 INFO mapred.Task: Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
11/07/16 10:07:00 INFO mapred.LocalJobRunner: file:/home/michael/hadoop/input/mapred-site.xml:0+178
11/07/16 10:07:00 INFO mapred.LocalJobRunner: file:/home/michael/hadoop/input/mapred-site.xml:0+178
11/07/16 10:07:00 INFO mapred.Task: Task 'attempt_local_0001_m_000003_0' done.
11/07/16 10:07:00 INFO mapred.MapTask: numReduceTasks: 1
11/07/16 10:07:00 INFO mapred.MapTask: io.sort.mb = 100
11/07/16 10:07:01 INFO mapred.MapTask: data buffer = 79691776/99614720
11/07/16 10:07:01 INFO mapred.MapTask: record buffer = 262144/327680
11/07/16 10:07:01 INFO mapred.MapTask: Starting flush of map output
11/07/16 10:07:01 INFO mapred.Task: Task:attempt_local_0001_m_000004_0 is done. And is in the process of commiting
11/07/16 10:07:04 INFO mapred.LocalJobRunner: file:/home/michael/hadoop/input/core-site.xml:0+178
11/07/16 10:07:04 INFO mapred.LocalJobRunner: file:/home/michael/hadoop/input/core-site.xml:0+178
11/07/16 10:07:04 INFO mapred.Task: Task 'attempt_local_0001_m_000004_0' done.
11/07/16 10:07:04 INFO mapred.MapTask: numReduceTasks: 1
11/07/16 10:07:04 INFO mapred.MapTask: io.sort.mb = 100
11/07/16 10:07:04 INFO mapred.MapTask: data buffer = 79691776/99614720
11/07/16 10:07:04 INFO mapred.MapTask: record buffer = 262144/327680
11/07/16 10:07:04 INFO mapred.MapTask: Starting flush of map output
11/07/16 10:07:04 INFO mapred.Task: Task:attempt_local_0001_m_000005_0 is done. And is in the process of commiting
11/07/16 10:07:07 INFO mapred.LocalJobRunner: file:/home/michael/hadoop/input/hdfs-site.xml:0+178
11/07/16 10:07:07 INFO mapred.Task: Task 'attempt_local_0001_m_000005_0' done.
11/07/16 10:07:07 INFO mapred.LocalJobRunner:
11/07/16 10:07:07 INFO mapred.Merger: Merging 6 sorted segments
11/07/16 10:07:07 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 21 bytes
11/07/16 10:07:07 INFO mapred.LocalJobRunner:
11/07/16 10:07:07 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
11/07/16 10:07:07 INFO mapred.LocalJobRunner:
11/07/16 10:07:07 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
11/07/16 10:07:07 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/home/michael/hadoop/grep-temp-1267281521
11/07/16 10:07:10 INFO mapred.LocalJobRunner: reduce > reduce
11/07/16 10:07:10 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
11/07/16 10:07:10 INFO mapred.JobClient: map 100% reduce 100%
11/07/16 10:07:10 INFO mapred.JobClient: Job complete: job_local_0001
11/07/16 10:07:10 INFO mapred.JobClient: Counters: 17
11/07/16 10:07:10 INFO mapred.JobClient: File Input Format Counters
11/07/16 10:07:10 INFO mapred.JobClient: Bytes Read=14668
11/07/16 10:07:10 INFO mapred.JobClient: File Output Format Counters
11/07/16 10:07:10 INFO mapred.JobClient: Bytes Written=123
11/07/16 10:07:10 INFO mapred.JobClient: FileSystemCounters
11/07/16 10:07:10 INFO mapred.JobClient: FILE_BYTES_READ=1106074
11/07/16 10:07:10 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1231779
11/07/16 10:07:10 INFO mapred.JobClient: Map-Reduce Framework
11/07/16 10:07:10 INFO mapred.JobClient: Map output materialized bytes=55
11/07/16 10:07:10 INFO mapred.JobClient: Map input records=357
11/07/16 10:07:10 INFO mapred.JobClient: Reduce shuffle bytes=0
11/07/16 10:07:10 INFO mapred.JobClient: Spilled Records=2
11/07/16 10:07:10 INFO mapred.JobClient: Map output bytes=17
11/07/16 10:07:10 INFO mapred.JobClient: Map input bytes=14668
11/07/16 10:07:10 INFO mapred.JobClient: SPLIT_RAW_BYTES=611
11/07/16 10:07:10 INFO mapred.JobClient: Combine input records=1
11/07/16 10:07:10 INFO mapred.JobClient: Reduce input records=1
11/07/16 10:07:10 INFO mapred.JobClient: Reduce input groups=1
11/07/16 10:07:10 INFO mapred.JobClient: Combine output records=1
11/07/16 10:07:10 INFO mapred.JobClient: Reduce output records=1
11/07/16 10:07:10 INFO mapred.JobClient: Map output records=1
11/07/16 10:07:10 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/07/16 10:07:10 INFO mapred.FileInputFormat: Total input paths to process : 1
11/07/16 10:07:10 INFO mapred.JobClient: Running job: job_local_0002
11/07/16 10:07:10 INFO mapred.MapTask: numReduceTasks: 1
11/07/16 10:07:10 INFO mapred.MapTask: io.sort.mb = 100
11/07/16 10:07:11 INFO mapred.MapTask: data buffer = 79691776/99614720
11/07/16 10:07:11 INFO mapred.MapTask: record buffer = 262144/327680
11/07/16 10:07:11 INFO mapred.MapTask: Starting flush of map output
11/07/16 10:07:11 INFO mapred.MapTask: Finished spill 0
11/07/16 10:07:11 INFO mapred.Task: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
11/07/16 10:07:11 INFO mapred.JobClient: map 0% reduce 0%
11/07/16 10:07:13 INFO mapred.LocalJobRunner: file:/home/michael/hadoop/grep-temp-1267281521/part-00000:0+111
11/07/16 10:07:13 INFO mapred.LocalJobRunner: file:/home/michael/hadoop/grep-temp-1267281521/part-00000:0+111
11/07/16 10:07:13 INFO mapred.Task: Task 'attempt_local_0002_m_000000_0' done.
11/07/16 10:07:13 INFO mapred.LocalJobRunner:
11/07/16 10:07:13 INFO mapred.Merger: Merging 1 sorted segments
11/07/16 10:07:13 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 21 bytes
11/07/16 10:07:13 INFO mapred.LocalJobRunner:
11/07/16 10:07:13 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
11/07/16 10:07:13 INFO mapred.LocalJobRunner:
11/07/16 10:07:13 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now
11/07/16 10:07:13 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to file:/home/michael/hadoop/output
11/07/16 10:07:14 INFO mapred.JobClient: map 100% reduce 0%
11/07/16 10:07:16 INFO mapred.LocalJobRunner: reduce > reduce
11/07/16 10:07:16 INFO mapred.Task: Task 'attempt_local_0002_r_000000_0' done.
11/07/16 10:07:17 INFO mapred.JobClient: map 100% reduce 100%
11/07/16 10:07:17 INFO mapred.JobClient: Job complete: job_local_0002
11/07/16 10:07:17 INFO mapred.JobClient: Counters: 17
11/07/16 10:07:17 INFO mapred.JobClient: File Input Format Counters
11/07/16 10:07:17 INFO mapred.JobClient: Bytes Read=123
11/07/16 10:07:17 INFO mapred.JobClient: File Output Format Counters
11/07/16 10:07:17 INFO mapred.JobClient: Bytes Written=23
11/07/16 10:07:17 INFO mapred.JobClient: FileSystemCounters
11/07/16 10:07:17 INFO mapred.JobClient: FILE_BYTES_READ=606737
11/07/16 10:07:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=700981
11/07/16 10:07:17 INFO mapred.JobClient: Map-Reduce Framework
11/07/16 10:07:17 INFO mapred.JobClient: Map output materialized bytes=25
11/07/16 10:07:17 INFO mapred.JobClient: Map input records=1
11/07/16 10:07:17 INFO mapred.JobClient: Reduce shuffle bytes=0
11/07/16 10:07:17 INFO mapred.JobClient: Spilled Records=2
11/07/16 10:07:17 INFO mapred.JobClient: Map output bytes=17
11/07/16 10:07:17 INFO mapred.JobClient: Map input bytes=25
11/07/16 10:07:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=110
11/07/16 10:07:17 INFO mapred.JobClient: Combine input records=0
11/07/16 10:07:17 INFO mapred.JobClient: Reduce input records=1
11/07/16 10:07:17 INFO mapred.JobClient: Reduce input groups=1
11/07/16 10:07:17 INFO mapred.JobClient: Combine output records=0
11/07/16 10:07:17 INFO mapred.JobClient: Reduce output records=1
11/07/16 10:07:17 INFO mapred.JobClient: Map output records=1
michael@michael-VirtualBox:~/hadoop$
michael@michael-VirtualBox:~/hadoop$ cat output/*
1 dfsadmin
michael@michael-VirtualBox:~/hadoop$
This completes the standalone demo.
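As a sanity check, the result of the grep example can be cross-checked with ordinary shell tools, since in standalone mode the input is just a local directory. This is a rough sketch, not how Hadoop computes the result: `count_matches` is a hypothetical helper, and it assumes a grep that supports the `-o` and `-h` options (GNU and BSD grep both do).

```shell
#!/bin/sh
# count_matches DIR PATTERN   (hypothetical helper)
# Prints "<count> <match>" for each distinct match of the extended regex
# PATTERN in DIR/*.xml, most frequent first.
count_matches() {
    grep -ohE "$2" "$1"/*.xml |   # -o: print only the matched text, -h: no file names
        sort |
        uniq -c |                 # count identical matches
        sort -rn |                # highest count first
        awk '{print $1, $2}'      # normalise the padding added by uniq -c
}

# Example call for the demo above, run from /home/michael/hadoop:
#   count_matches input 'dfs[a-z.]+'
```

If the two approaches agree, the output should match what `cat output/*` printed for the Hadoop job.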
IV. Pseudo-distributed demo (Pseudo-Distributed Operation)
1. Edit the configuration files:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
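If you rebuild the test environment often, the three files can be generated by a small script. The sketch below is one possible way to do it; `write_property_file` and `write_pseudo_conf` are hypothetical helper names, and the script overwrites any existing files of the same name, so back them up first if they contain other settings.

```shell
#!/bin/sh
# write_property_file FILE NAME VALUE   (hypothetical helper)
# Writes a Hadoop configuration file containing a single property.
write_property_file() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# write_pseudo_conf CONF_DIR   (hypothetical helper)
# Writes the three pseudo-distributed settings used in this article.
write_pseudo_conf() {
    write_property_file "$1/core-site.xml"   fs.default.name    hdfs://localhost:9000
    write_property_file "$1/hdfs-site.xml"   dfs.replication    1
    write_property_file "$1/mapred-site.xml" mapred.job.tracker localhost:9001
}

# Example call for the layout used in this article:
#   write_pseudo_conf /home/michael/hadoop/conf
```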
2. Set up passwordless SSH login
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
P.S. On CentOS, see the separate post "Configuring passwordless OpenSSH login on Linux (CentOS)" for the detailed SSH setup.
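If `ssh localhost` still asks for a password after these two commands, one common cause is file permissions: with sshd's default StrictModes setting, keys in a group- or world-writable ~/.ssh or authorized_keys are ignored. The sketch below tightens the permissions and then tests the login non-interactively. Both function names are hypothetical, and whether this is the issue on your system is an assumption you should verify against the sshd log.

```shell
#!/bin/sh
# fix_ssh_perms SSH_DIR   (hypothetical helper)
# Restricts SSH_DIR and its authorized_keys file to the owning user.
fix_ssh_perms() {
    chmod 700 "$1" &&
        chmod 600 "$1/authorized_keys"
}

# can_ssh_without_password HOST   (hypothetical helper)
# BatchMode=yes makes ssh fail instead of prompting for a password, so the
# exit status tells you whether key-based login works. It can also fail
# when the host key has not been accepted yet; run "ssh HOST" once by hand
# in that case.
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example calls:
#   fix_ssh_perms ~/.ssh
#   can_ssh_without_password localhost && echo "passwordless login works"
```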
3. Test:
The basic commands are:
# Format a new distributed filesystem:
$ bin/hadoop namenode -format
# Start the Hadoop daemons:
$ bin/start-all.sh
# Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
# Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
# Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/output/*
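Before copying files into HDFS it can help to confirm that all five daemons launched by bin/start-all.sh are actually running. The JDK's `jps` tool lists running Java processes, and the sketch below filters its output. `missing_daemons` is a hypothetical helper; the daemon names are the ones this Hadoop 0.20 setup starts.

```shell
#!/bin/sh
# missing_daemons   (hypothetical helper)
# Reads `jps` output on standard input and prints the name of every
# expected Hadoop daemon that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # jps prints "<pid> <main class>"; compare the class name as a whole
        # field so that "NameNode" does not match "SecondaryNameNode".
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            printf '%s\n' "$daemon"
    done
}

# Example call:
#   jps | missing_daemons
```

No output means all five daemons were found. If a name is printed, the matching file under logs/ is the first place to look.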
The test commands in step 3 ran without problems on CentOS 5 but failed on Ubuntu 10.10. The command bin/hadoop fs -put conf input failed with an error similar to "could only be replicated to 0 nodes, instead of 1". This error has several possible causes (see http://sjsky.iteye.com/blog/1124545). In this case the cause was that hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}, and on Ubuntu the file system type under /tmp is often one that Hadoop does not support.
The fix is to point hadoop.tmp.dir at a different location by changing conf/core-site.xml as follows:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/michael/hadooptmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
After this change the test runs successfully.
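Because the NameNode keeps its storage under hadoop.tmp.dir by default, the new location starts out empty and has to be formatted again before the daemons are restarted. The sketch below prepares the base directory first. `prepare_tmp_base` is a hypothetical helper, and the command sequence in the trailing comment is an assumption about a sensible order rather than a record of what the author typed.

```shell
#!/bin/sh
# prepare_tmp_base DIR   (hypothetical helper)
# Creates DIR if necessary and fails unless the current user can write to it.
prepare_tmp_base() {
    mkdir -p "$1" || return 1
    probe="$1/.write-test.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
    else
        echo "cannot write to $1" >&2
        return 1
    fi
}

# Possible sequence after editing conf/core-site.xml, run from the Hadoop
# directory:
#   bin/stop-all.sh
#   prepare_tmp_base /home/michael/hadooptmp
#   bin/hadoop namenode -format
#   bin/start-all.sh
```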
The full details of the test session are shown below:
michael@michael-VirtualBox:~/hadoop$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /home/michael/.ssh/id_dsa.
Your public key has been saved in /home/michael/.ssh/id_dsa.pub.
The key fingerprint is:
2a:47:e3:3a:c8:80:ab:97:d1:c6:68:54:9a:45:9f:59 michael@michael-VirtualBox
The key's randomart image is:
+--[ DSA 1024]----+
| .. E |
| o. + |
| = + |
| + |
|.. + o S |
|o + +o o |
| = =. + |
|. = .+ |
|o. .. |
+-----------------+
michael@michael-VirtualBox:~/hadoop$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
michael@michael-VirtualBox:~/hadoop$ ssh localhost
Linux michael-VirtualBox 2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:34:50 UTC 2010 i686 GNU/Linux
Ubuntu 10.10
Welcome to Ubuntu!
* Documentation: https://help.ubuntu.com/
71 packages can be updated.
71 updates are security updates.
New release 'natty' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Wed Jul 15 15:56:17 2011 from shnap.local
michael@michael-VirtualBox:~$ exit
logout
Connection to localhost closed.
michael@michael-VirtualBox:~/hadoop$ bin/hadoop namenode -format
11/07/16 12:43:45 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = michael-VirtualBox/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.203.0
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
************************************************************/
11/07/16 12:43:46 INFO util.GSet: VM type = 32-bit
11/07/16 12:43:46 INFO util.GSet: 2% max memory = 19.33375 MB
11/07/16 12:43:46 INFO util.GSet: capacity = 2^22 = 4194304 entries
11/07/16 12:43:46 INFO util.GSet: recommended=4194304, actual=4194304
11/07/16 12:43:46 INFO namenode.FSNamesystem: fsOwner=michael
11/07/16 12:43:46 INFO namenode.FSNamesystem: supergroup=supergroup
11/07/16 12:43:46 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/07/16 12:43:46 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
11/07/16 12:43:46 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
11/07/16 12:43:46 INFO namenode.NameNode: Caching file names occuring more than 10 times
11/07/16 12:43:47 INFO common.Storage: Image file of size 113 saved in 0 seconds.
11/07/16 12:43:47 INFO common.Storage: Storage directory /home/michael/hadooptmp/hadoop-michael/dfs/name has been successfully formatted.
11/07/16 12:43:47 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at michael-VirtualBox/127.0.1.1
************************************************************/
michael@michael-VirtualBox:~/hadoop$ bin/start-all.sh
starting namenode, logging to /home/michael/hadoop/bin/../logs/hadoop-michael-namenode-michael-VirtualBox.out
localhost: starting datanode, logging to /home/michael/hadoop/bin/../logs/hadoop-michael-datanode-michael-VirtualBox.out
localhost: starting secondarynamenode, logging to /home/michael/hadoop/bin/../logs/hadoop-michael-secondarynamenode-michael-VirtualBox.out
starting jobtracker, logging to /home/michael/hadoop/bin/../logs/hadoop-michael-jobtracker-michael-VirtualBox.out
localhost: starting tasktracker, logging to /home/michael/hadoop/bin/../logs/hadoop-michael-tasktracker-michael-VirtualBox.out
michael@michael-VirtualBox:~/hadoop$ jps
7948 SecondaryNameNode
8033 JobTracker
8887 Jps
7627 NameNode
7781 DataNode
8190 TaskTracker
michael@michael-VirtualBox:~/hadoop$ bin/hadoop fs -put conf input
michael@michael-VirtualBox:~/hadoop$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
11/07/16 12:46:21 INFO mapred.FileInputFormat: Total input paths to process : 15
11/07/16 12:46:21 INFO mapred.JobClient: Running job: job_201107161244_0001
11/07/16 12:46:22 INFO mapred.JobClient: map 0% reduce 0%
11/07/16 12:47:09 INFO mapred.JobClient: map 13% reduce 0%
11/07/16 12:47:33 INFO mapred.JobClient: map 26% reduce 0%
11/07/16 12:47:45 INFO mapred.JobClient: map 26% reduce 8%
11/07/16 12:47:54 INFO mapred.JobClient: map 40% reduce 8%
11/07/16 12:48:07 INFO mapred.JobClient: map 53% reduce 13%
11/07/16 12:48:16 INFO mapred.JobClient: map 53% reduce 17%
11/07/16 12:48:24 INFO mapred.JobClient: map 66% reduce 17%
11/07/16 12:48:36 INFO mapred.JobClient: map 80% reduce 22%
11/07/16 12:48:42 INFO mapred.JobClient: map 80% reduce 26%
11/07/16 12:48:45 INFO mapred.JobClient: map 93% reduce 26%
11/07/16 12:48:53 INFO mapred.JobClient: map 100% reduce 26%
11/07/16 12:48:58 INFO mapred.JobClient: map 100% reduce 33%
11/07/16 12:49:07 INFO mapred.JobClient: map 100% reduce 100%
11/07/16 12:49:14 INFO mapred.JobClient: Job complete: job_201107161244_0001
11/07/16 12:49:15 INFO mapred.JobClient: Counters: 26
11/07/16 12:49:15 INFO mapred.JobClient: Job Counters
11/07/16 12:49:15 INFO mapred.JobClient: Launched reduce tasks=1
11/07/16 12:49:15 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=255488
11/07/16 12:49:15 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
11/07/16 12:49:15 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
11/07/16 12:49:15 INFO mapred.JobClient: Launched map tasks=15
11/07/16 12:49:15 INFO mapred.JobClient: Data-local map tasks=15
11/07/16 12:49:15 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=115656
11/07/16 12:49:15 INFO mapred.JobClient: File Input Format Counters
11/07/16 12:49:15 INFO mapred.JobClient: Bytes Read=25623
11/07/16 12:49:15 INFO mapred.JobClient: File Output Format Counters
11/07/16 12:49:15 INFO mapred.JobClient: Bytes Written=180
11/07/16 12:49:15 INFO mapred.JobClient: FileSystemCounters
11/07/16 12:49:15 INFO mapred.JobClient: FILE_BYTES_READ=82
11/07/16 12:49:15 INFO mapred.JobClient: HDFS_BYTES_READ=27281
11/07/16 12:49:15 INFO mapred.JobClient: FILE_BYTES_WRITTEN=342206
11/07/16 12:49:15 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=180
11/07/16 12:49:15 INFO mapred.JobClient: Map-Reduce Framework
11/07/16 12:49:15 INFO mapred.JobClient: Map output materialized bytes=166
11/07/16 12:49:15 INFO mapred.JobClient: Map input records=716
11/07/16 12:49:15 INFO mapred.JobClient: Reduce shuffle bytes=166
11/07/16 12:49:15 INFO mapred.JobClient: Spilled Records=6
11/07/16 12:49:15 INFO mapred.JobClient: Map output bytes=70
11/07/16 12:49:15 INFO mapred.JobClient: Map input bytes=25623
11/07/16 12:49:15 INFO mapred.JobClient: Combine input records=3
11/07/16 12:49:15 INFO mapred.JobClient: SPLIT_RAW_BYTES=1658
11/07/16 12:49:15 INFO mapred.JobClient: Reduce input records=3
11/07/16 12:49:15 INFO mapred.JobClient: Reduce input groups=3
11/07/16 12:49:15 INFO mapred.JobClient: Combine output records=3
11/07/16 12:49:15 INFO mapred.JobClient: Reduce output records=3
11/07/16 12:49:15 INFO mapred.JobClient: Map output records=3
11/07/16 12:49:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/07/16 12:49:17 INFO mapred.FileInputFormat: Total input paths to process : 1
11/07/16 12:49:18 INFO mapred.JobClient: Running job: job_201107161244_0002
11/07/16 12:49:19 INFO mapred.JobClient: map 0% reduce 0%
11/07/16 12:49:40 INFO mapred.JobClient: map 100% reduce 0%
11/07/16 12:49:55 INFO mapred.JobClient: map 100% reduce 100%
11/07/16 12:50:00 INFO mapred.JobClient: Job complete: job_201107161244_0002
11/07/16 12:50:00 INFO mapred.JobClient: Counters: 26
11/07/16 12:50:00 INFO mapred.JobClient: Job Counters
11/07/16 12:50:00 INFO mapred.JobClient: Launched reduce tasks=1
11/07/16 12:50:00 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16946
11/07/16 12:50:00 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
11/07/16 12:50:00 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
11/07/16 12:50:00 INFO mapred.JobClient: Launched map tasks=1
11/07/16 12:50:00 INFO mapred.JobClient: Data-local map tasks=1
11/07/16 12:50:00 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=14357
11/07/16 12:50:00 INFO mapred.JobClient: File Input Format Counters
11/07/16 12:50:00 INFO mapred.JobClient: Bytes Read=180
11/07/16 12:50:00 INFO mapred.JobClient: File Output Format Counters
11/07/16 12:50:00 INFO mapred.JobClient: Bytes Written=52
11/07/16 12:50:00 INFO mapred.JobClient: FileSystemCounters
11/07/16 12:50:00 INFO mapred.JobClient: FILE_BYTES_READ=82
11/07/16 12:50:00 INFO mapred.JobClient: HDFS_BYTES_READ=298
11/07/16 12:50:00 INFO mapred.JobClient: FILE_BYTES_WRITTEN=41947
11/07/16 12:50:00 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=52
11/07/16 12:50:00 INFO mapred.JobClient: Map-Reduce Framework
11/07/16 12:50:00 INFO mapred.JobClient: Map output materialized bytes=82
11/07/16 12:50:00 INFO mapred.JobClient: Map input records=3
11/07/16 12:50:00 INFO mapred.JobClient: Reduce shuffle bytes=82
11/07/16 12:50:00 INFO mapred.JobClient: Spilled Records=6
11/07/16 12:50:00 INFO mapred.JobClient: Map output bytes=70
11/07/16 12:50:00 INFO mapred.JobClient: Map input bytes=94
11/07/16 12:50:00 INFO mapred.JobClient: Combine input records=0
11/07/16 12:50:00 INFO mapred.JobClient: SPLIT_RAW_BYTES=118
11/07/16 12:50:00 INFO mapred.JobClient: Reduce input records=3
11/07/16 12:50:00 INFO mapred.JobClient: Reduce input groups=1
11/07/16 12:50:00 INFO mapred.JobClient: Combine output records=0
11/07/16 12:50:00 INFO mapred.JobClient: Reduce output records=3
11/07/16 12:50:00 INFO mapred.JobClient: Map output records=3
michael@michael-VirtualBox:~/hadoop$
michael@michael-VirtualBox:~/hadoop$ cat output/output/*
cat: output/output/_logs: Is a directory
1 dfs.replication
1 dfs.server.namenode.
1 dfsadmin
michael@michael-VirtualBox:~/hadoop$
This completes the pseudo-distributed demonstration.
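The `jps` listing in the session above shows the five daemons that `bin/start-all.sh` is expected to bring up in pseudo-distributed mode: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. If `bin/hadoop fs -put` fails, a quick first check is whether any of them is missing. The following is only a minimal sketch of such a check; the function name `missing_daemons` is made up for illustration and is not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): reads `jps` output on stdin and
# prints the expected Hadoop 0.20.x daemons that are NOT listed.
# Prints an empty line when all five daemons are present.
missing_daemons() {
    jps_output=$(cat)
    missing=""
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare against the whole second field, so that "NameNode"
        # is not satisfied by a "SecondaryNameNode" line.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            missing="$missing $daemon"
        fi
    done
    # Drop the leading space before printing.
    printf '%s\n' "${missing# }"
}
```

Typical use would be `jps | missing_daemons` right after `bin/start-all.sh`. If a name is printed, the matching file under the `logs/` directory (the paths echoed by `start-all.sh`) is the place to look next.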
When reposting, please credit: Michael's blog @ http://sjsky.iteye.com