Hadoop single-node cluster installation
Install Ubuntu 11.10 (root: hadoop/hadoop)
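Install yum: sudo apt-get install yum (optional; Ubuntu's own package manager is apt-get, and no later step uses yum)
Set the root password: sudo passwd root, then enter the new password twice
Install ssh: apt-get install ssh
Configure passwordless ssh login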
root@ubuntu:~# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
be:03:05:4b:ea:dd:5e:c0:43:e7:a0:ce:13:26:5b:32 root@ubuntu
The key's randomart image is:
+--[ DSA 1024]----+
| |
| o o . |
| o * + |
| E * = . |
| . @ +So |
| o *.. . |
| +.. |
| o. |
| .. |
+-----------------+
root@ubuntu:~# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
root@ubuntu:~# cd .ssh/
root@ubuntu:~/.ssh# ls
authorized_keys id_dsa id_dsa.pub
root@ubuntu:~/.ssh# cat authorized_keys
ssh-dss AAAAB3NzaC1kc3MAAACBANeaZTGYfT+t4C2EERqIOx6E7KOvrz/+kDSgBSMLz0rCKdXVMdVsfAmqXumfnq7StnFM2WYnz11tmGvHymmJQNrVvd4pAUFKd/oX6uCNpjXA45CMCivXNq67dBM1SDhhBzpntxLNDpU1GVniegiD2a6gFREzhdKe9a2OOA6xoECLAAAAFQDhsAFK1H0VezETQoXe3aWYnm9QRwAAAIBmE8x14azfEhaO59rF7uVlQbjKIOMcBtN9Q0tpuLNkX9bCaqVNMsTanaDXJKoW/cwboRY/32HET/Noc7uqfJZk4YY87ihaPsEeoOZ8uZpknWKDul6j3sbk1yroor16wqCA8CK2Z1Ro22lRY+WHtdeEFfFZd6vvlNxBwQTQCbKuggAAAIAB0gp4uSAY6DEiVnxLfjDx2fEqggGjIFnjfu8PoKV8tbtWpdoayYaw758gN3r3UptOWD/p9//MqVP4CYWwADi1jyTFduRJMjQX91e3VcxH2T69t0c5CJQIZEdleeW2pI1FYN3GNUoj2PD193zWINJTBdQhvW0WWs/2n2XPmqcNEw== root@ubuntu
root@ubuntu:~/.ssh# ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 0e:1d:c9:40:67:53:db:fb:51:1b:a8:ce:35:32:29:3d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 11.10 (GNU/Linux 3.0.0-12-generic i686)
* Documentation: https://help.ubuntu.com/
New release '12.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
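If the key login above still asks for a password, one common cause is file permissions: sshd's StrictModes check (enabled by default) can reject a key when ~/.ssh or authorized_keys is writable by other users. The following is a minimal sketch of a fix, not part of the original walkthrough; the function name secure_ssh_dir is made up for illustration.

```shell
# Sketch: tighten permissions on an SSH directory so sshd will accept the key.
# secure_ssh_dir is an illustrative name; the directory defaults to ~/.ssh.
secure_ssh_dir() {
    local ssh_dir="${1:-$HOME/.ssh}"
    # Only the owner may read, write, or enter the directory.
    chmod 700 "$ssh_dir" || return 1
    # Only the owner may read or write the list of authorized keys.
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys" || return 1
    fi
}
```

After calling secure_ssh_dir, "ssh -o BatchMode=yes localhost true" is a quick non-interactive check: BatchMode disables password prompts, so the command only succeeds if key login works (and the host key was already accepted in the interactive login above).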
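Install rsync: apt-get install rsync
Install vsftpd: apt-get install vsftpd
Start or stop vsftpd: service vsftpd start (or: service vsftpd stop)
Decompress the gz: gzip -d hadoop-1.1.0.tar.gz
Unpack the tar: tar -xf hadoop-1.1.0.tar (tar -xzf hadoop-1.1.0.tar.gz does both steps at once)
Set up the JDK:
vi ~/.profile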
export JAVA_HOME=/jdk1.6
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
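A wrong JAVA_HOME is easy to miss until Hadoop refuses to start, so it can help to check it right after editing the profile. The sketch below is an assumption-laden helper, not something from the original notes: check_java_home is a hypothetical name, and it only tests that an executable bin/java exists under the given directory.

```shell
# Sketch: verify that a JDK directory contains an executable bin/java.
# check_java_home is an illustrative name; it defaults to $JAVA_HOME.
check_java_home() {
    local home="${1:-${JAVA_HOME:-}}"
    if [ -z "$home" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$home/bin/java" ]; then
        echo "no executable found at $home/bin/java" >&2
        return 1
    fi
    echo "JAVA_HOME looks usable: $home"
}
```

In a new login shell (so that the profile has been re-read), running check_java_home with no argument checks the exported value, which is /jdk1.6 in this walkthrough.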
Set up Hadoop:
Edit conf/hadoop-env.sh: export JAVA_HOME=/jdk1.6
Edit Hadoop's core configuration file core-site.xml, which sets the HDFS address and port:
vi conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
Edit the HDFS configuration file hdfs-site.xml. The replication factor defaults to 3; because this is a single-machine install, change it to 1:
vi conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Edit the MapReduce configuration file mapred-site.xml, which sets the JobTracker address and port:
vi conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
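Editing three XML files by hand invites typos, so the same files can also be generated from a script. The following is only a sketch: write_site_xml is a hypothetical helper, it performs no XML escaping, and it assumes property names and values contain no characters such as & or <.

```shell
# Sketch: write a Hadoop-style *-site.xml from name/value pairs.
# write_site_xml is an illustrative helper; values are NOT XML-escaped.
# Usage: write_site_xml FILE NAME1 VALUE1 [NAME2 VALUE2 ...]
write_site_xml() {
    local file="$1"
    shift
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        # Consume the remaining arguments two at a time.
        while [ "$#" -ge 2 ]; do
            printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
            shift 2
        done
        echo '</configuration>'
    } > "$file"
}

# Example calls for the settings used in this walkthrough (run from the
# Hadoop directory; left commented out so nothing is overwritten by accident):
# write_site_xml conf/core-site.xml fs.default.name hdfs://localhost:9000 hadoop.tmp.dir /home/hadoop/tmp
# write_site_xml conf/hdfs-site.xml dfs.replication 1
# write_site_xml conf/mapred-site.xml mapred.job.tracker localhost:9001
```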
Start Hadoop. Before the first start, HDFS (Hadoop's file system) must be formatted. Change into the Hadoop directory and run:
bin/hadoop namenode -format
root@ubuntu:/hadoop-1.1.0/bin# ./hadoop namenode -format
12/10/29 18:54:58 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.1.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1394289; compiled by 'hortonfo' on Thu Oct 4 22:06:49 UTC 2012
************************************************************/
12/10/29 18:54:58 INFO util.GSet: VM type = 32-bit
12/10/29 18:54:58 INFO util.GSet: 2% max memory = 19.33375 MB
12/10/29 18:54:58 INFO util.GSet: capacity = 2^22 = 4194304 entries
12/10/29 18:54:58 INFO util.GSet: recommended=4194304, actual=4194304
12/10/29 18:54:59 INFO namenode.FSNamesystem: fsOwner=root
12/10/29 18:54:59 INFO namenode.FSNamesystem: supergroup=supergroup
12/10/29 18:54:59 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/10/29 18:54:59 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/10/29 18:54:59 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/10/29 18:54:59 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/10/29 18:54:59 INFO common.Storage: Image file of size 110 saved in 0 seconds.
12/10/29 18:55:00 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-root/dfs/name/current/edits
12/10/29 18:55:00 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-root/dfs/name/current/edits
12/10/29 18:55:00 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
12/10/29 18:55:00 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
root@ubuntu:/hadoop-1.1.0/bin#
Start Hadoop with bin/start-all.sh, which starts all of the services.
root@ubuntu:/hadoop-1.1.0/bin# ./start-all.sh
starting namenode, logging to /hadoop-1.1.0/libexec/../logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /hadoop-1.1.0/libexec/../logs/hadoop-root-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /hadoop-1.1.0/libexec/../logs/hadoop-root-secondarynamenode-ubuntu.out
starting jobtracker, logging to /hadoop-1.1.0/libexec/../logs/hadoop-root-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /hadoop-1.1.0/libexec/../logs/hadoop-root-tasktracker-ubuntu.out
The command lists what it starts: NameNode, JobTracker, SecondaryNameNode... If the NameNode fails to start, first run "bin/stop-all.sh" to stop everything, then re-format the NameNode and start again.
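The recovery sequence just described (stop, re-format, start) can be wrapped in one function. This is a sketch under stated assumptions: reformat_and_restart is a made-up name, the Hadoop directory is passed as an argument (/hadoop-1.1.0 in this walkthrough), and re-formatting discards whatever is already stored in HDFS, so it only suits a fresh test install. The format step may also ask for confirmation when the storage directory already exists.

```shell
# Sketch: stop all daemons, re-format the NameNode, then start again.
# reformat_and_restart is an illustrative name.
# WARNING: re-formatting throws away existing HDFS metadata.
reformat_and_restart() {
    local hadoop_home="$1"
    # Stop whatever is running; a failure here is not treated as fatal.
    "$hadoop_home/bin/stop-all.sh" || true
    # Give up if the format step fails, rather than starting on a bad state.
    "$hadoop_home/bin/hadoop" namenode -format || return 1
    "$hadoop_home/bin/start-all.sh"
}
```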
To verify that Hadoop was installed successfully, open:
http://localhost:50030 (MapReduce web UI)
http://localhost:50070 (HDFS web UI)
root@ubuntu:/jdk1.6/bin# ./jps
11325 JobTracker
13345 Jps
10826 NameNode
11036 DataNode
11541 TaskTracker
11252 SecondaryNameNode
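Rather than reading the jps listing by eye, the check for the five daemons can be scripted. The sketch below is an illustration, not part of the original notes; missing_daemons is a hypothetical name, and it expects jps-style lines of the form "PID Name" on standard input.

```shell
# Sketch: report which of the five Hadoop 1.x daemons are absent from a
# jps listing read on stdin. missing_daemons is an illustrative name.
missing_daemons() {
    local listing daemon status=0
    listing="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare the whole second column, so that "NameNode" does not
        # accidentally match the "SecondaryNameNode" line.
        if ! printf '%s\n' "$listing" | awk '{print $2}' | grep -qx "$daemon"; then
            echo "$daemon"
            status=1
        fi
    done
    return "$status"
}
```

Typical use would be "jps | missing_daemons": it prints nothing and returns 0 when all five daemons are up, and otherwise prints the missing names and returns 1.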
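Hadoop assigns each host one of two roles, seen from three angles:
1. Master and slave
2. From the HDFS angle, hosts are either namenode or datanode (in a distributed file system, directory management is the key job; whoever manages the directory acts as the supervisor, and the namenode is that directory manager)
3. From the MapReduce angle, hosts are either JobTracker or TaskTracker (a job is usually split into several tasks, which makes the relationship between the two easy to see)
[Running wordcount]
1. Prepare a file to run wordcount on
vi /tmp/test.txt
(Type in some text, for example "mu ha ha ni da ye da ye da", then save and quit)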
2. Upload the test file to firstTest in HDFS
hadoop dfs -copyFromLocal /tmp/test.txt firstTest
(Note: if firstTest does not yet exist in HDFS, the copy creates it. To list what is already in HDFS, use "hadoop dfs -ls")
3. Run wordcount
hadoop jar hadoop-examples-1.1.0.jar wordcount firstTest result
(Note: this means "run wordcount over everything under firstTest and write the counts to the result directory". The job creates result itself and fails if that directory already exists)
hadoop-examples-1.1.0.jar is in the Hadoop root directory
root@ubuntu:/hadoop-1.1.0/bin# ./hadoop jar ../hadoop-examples-1.1.0.jar wordcount firstTest result
12/10/29 19:24:32 INFO input.FileInputFormat: Total input paths to process : 1
12/10/29 19:24:32 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/29 19:24:32 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/29 19:24:33 INFO mapred.JobClient: Running job: job_201210291856_0001
12/10/29 19:24:34 INFO mapred.JobClient: map 0% reduce 0%
12/10/29 19:24:52 INFO mapred.JobClient: map 100% reduce 0%
12/10/29 19:25:03 INFO mapred.JobClient: map 100% reduce 100%
12/10/29 19:25:04 INFO mapred.JobClient: Job complete: job_201210291856_0001
12/10/29 19:25:04 INFO mapred.JobClient: Counters: 29
12/10/29 19:25:04 INFO mapred.JobClient: Job Counters
12/10/29 19:25:04 INFO mapred.JobClient: Launched reduce tasks=1
12/10/29 19:25:04 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16020
12/10/29 19:25:04 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/10/29 19:25:04 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/10/29 19:25:04 INFO mapred.JobClient: Launched map tasks=1
12/10/29 19:25:04 INFO mapred.JobClient: Data-local map tasks=1
12/10/29 19:25:04 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=11306
12/10/29 19:25:04 INFO mapred.JobClient: File Output Format Counters
12/10/29 19:25:04 INFO mapred.JobClient: Bytes Written=26
12/10/29 19:25:04 INFO mapred.JobClient: FileSystemCounters
12/10/29 19:25:04 INFO mapred.JobClient: FILE_BYTES_READ=52
12/10/29 19:25:04 INFO mapred.JobClient: HDFS_BYTES_READ=134
12/10/29 19:25:04 INFO mapred.JobClient: FILE_BYTES_WRITTEN=47699
12/10/29 19:25:04 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=26
12/10/29 19:25:04 INFO mapred.JobClient: File Input Format Counters
12/10/29 19:25:04 INFO mapred.JobClient: Bytes Read=28
12/10/29 19:25:04 INFO mapred.JobClient: Map-Reduce Framework
12/10/29 19:25:04 INFO mapred.JobClient: Map output materialized bytes=52
12/10/29 19:25:04 INFO mapred.JobClient: Map input records=1
12/10/29 19:25:04 INFO mapred.JobClient: Reduce shuffle bytes=52
12/10/29 19:25:04 INFO mapred.JobClient: Spilled Records=10
12/10/29 19:25:04 INFO mapred.JobClient: Map output bytes=64
12/10/29 19:25:04 INFO mapred.JobClient: CPU time spent (ms)=6830
12/10/29 19:25:04 INFO mapred.JobClient: Total committed heap usage (bytes)=210698240
12/10/29 19:25:04 INFO mapred.JobClient: Combine input records=9
12/10/29 19:25:04 INFO mapred.JobClient: SPLIT_RAW_BYTES=106
12/10/29 19:25:04 INFO mapred.JobClient: Reduce input records=5
12/10/29 19:25:04 INFO mapred.JobClient: Reduce input groups=5
12/10/29 19:25:04 INFO mapred.JobClient: Combine output records=5
12/10/29 19:25:04 INFO mapred.JobClient: Physical memory (bytes) snapshot=181235712
12/10/29 19:25:04 INFO mapred.JobClient: Reduce output records=5
12/10/29 19:25:04 INFO mapred.JobClient: Virtual memory (bytes) snapshot=751153152
12/10/29 19:25:04 INFO mapred.JobClient: Map output records=9
4. View the result
hadoop dfs -cat result/part-r-00000
(Note: by default the result is written to a file named "part-r-*****"; use "hadoop dfs -ls result" to see which files the result directory contains)
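To sanity-check the job output, the same count can be computed locally with standard tools. This is a sketch rather than part of the original walkthrough: local_wordcount is a made-up name, and it assumes plain whitespace-separated text, for which the output should be comparable to the word, tab, count lines in part-r-00000.

```shell
# Sketch: count words in a local file and print "word<TAB>count" lines
# sorted by word. local_wordcount is an illustrative name.
local_wordcount() {
    # Put each whitespace-separated word on its own line, drop empty lines,
    # sort bytewise, count duplicates, then reorder the columns.
    tr -s '[:space:]' '\n' < "$1" \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{print $2 "\t" $1}'
}
```

For the sample text "mu ha ha ni da ye da ye da", local_wordcount /tmp/test.txt prints five lines: da 3, ha 2, mu 1, ni 1, ye 2.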