Please credit the source when reposting:
http://kevin12.iteye.com/blog/2028776
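Once a Hadoop 2.2 environment is set up, you can run the bundled wordcount example to count how many times each word occurs in a file. The steps are as follows:
First, create a directory named myfile under /usr/local/hadoop/ to hold the test file. Then, inside myfile, create a file named wordcount.txt with the following contents: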
hello hadoop
hello java
hello world
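Run hadoop fs -mkdir /input to create an input directory in HDFS;
Run hadoop fs -put /usr/local/hadoop/myfile/wordcount.txt /input/ to upload the local wordcount.txt file to the input directory in HDFS;
Make sure the input directory in HDFS does not already contain an out directory, otherwise the job fails with an error. Change into the /usr/local/hadoop/share/hadoop/mapreduce/ directory, then run the following command to count the words: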
hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount /input/wordcount.txt /input/out
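The job refuses to start if its output directory already exists, so reruns need a cleanup step first. The following is a minimal Python sketch of one way to automate that, assuming the hadoop command is on the PATH and the paths are the ones used in this walkthrough. The helper names (exists_command, cleanup_command, wordcount_command, run_wordcount) are hypothetical and not part of Hadoop; only the hadoop fs -test -e, hadoop fs -rm -r and hadoop jar invocations are real CLI commands.

```python
import subprocess

# Paths taken from this walkthrough; adjust them for your own cluster.
EXAMPLES_JAR = ("/usr/local/hadoop/share/hadoop/mapreduce/"
                "hadoop-mapreduce-examples-2.2.0.jar")
HDFS_INPUT = "/input/wordcount.txt"
HDFS_OUTPUT = "/input/out"


def exists_command(path):
    """Command whose exit status is 0 when the HDFS path exists."""
    return ["hadoop", "fs", "-test", "-e", path]


def cleanup_command(output_dir):
    """Command that recursively deletes an HDFS directory."""
    return ["hadoop", "fs", "-rm", "-r", output_dir]


def wordcount_command(jar, input_path, output_dir):
    """Command that submits the bundled wordcount example."""
    return ["hadoop", "jar", jar, "wordcount", input_path, output_dir]


def run_wordcount(jar=EXAMPLES_JAR, input_path=HDFS_INPUT,
                  output_dir=HDFS_OUTPUT):
    """Delete a stale output directory if present, then submit the job."""
    if subprocess.call(exists_command(output_dir)) == 0:
        subprocess.check_call(cleanup_command(output_dir))
    subprocess.check_call(wordcount_command(jar, input_path, output_dir))
```

Calling run_wordcount() on the master node would then perform the same steps as the manual commands. Note that the cleanup deletes the previous results, so copy them elsewhere first if they are still needed.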
The output printed by the run is shown below:
[root@master mapreduce]# hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount /input/wordcount.txt /input/out
14/03/09 19:32:19 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/03/09 19:32:22 INFO input.FileInputFormat: Total input paths to process : 1
14/03/09 19:32:22 INFO mapreduce.JobSubmitter: number of splits:1
14/03/09 19:32:22 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/03/09 19:32:22 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/03/09 19:32:22 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/03/09 19:32:22 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/03/09 19:32:22 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/03/09 19:32:22 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/03/09 19:32:22 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/03/09 19:32:22 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/03/09 19:32:22 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/03/09 19:32:22 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/03/09 19:32:22 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/03/09 19:32:22 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/03/09 19:32:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1394289329220_0002
14/03/09 19:32:25 INFO impl.YarnClientImpl: Submitted application application_1394289329220_0002 to ResourceManager at /0.0.0.0:8032
14/03/09 19:32:25 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1394289329220_0002/
14/03/09 19:32:25 INFO mapreduce.Job: Running job: job_1394289329220_0002
14/03/09 19:32:55 INFO mapreduce.Job: Job job_1394289329220_0002 running in uber mode : false
14/03/09 19:32:55 INFO mapreduce.Job: map 0% reduce 0%
14/03/09 19:33:33 INFO mapreduce.Job: map 100% reduce 0%
14/03/09 19:33:45 INFO mapreduce.Job: map 100% reduce 100%
14/03/09 19:33:46 INFO mapreduce.Job: Job job_1394289329220_0002 completed successfully
14/03/09 19:33:47 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=54
FILE: Number of bytes written=158345
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=139
HDFS: Number of bytes written=32
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=36121
Total time spent by all reduces in occupied slots (ms)=7030
Map-Reduce Framework
Map input records=3
Map output records=6
Map output bytes=60
Map output materialized bytes=54
Input split bytes=103
Combine input records=6
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=54
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=588
CPU time spent (ms)=14810
Physical memory (bytes) snapshot=213233664
Virtual memory (bytes) snapshot=720707584
Total committed heap usage (bytes)=136908800
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=36
File Output Format Counters
Bytes Written=32
[root@master mapreduce]#
View the results:
[root@master mapreduce]# hadoop fs -lsr /input
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - root supergroup 0 2014-03-09 19:33 /input/out
-rw-r--r-- 3 root supergroup 0 2014-03-09 19:33 /input/out/_SUCCESS
-rw-r--r-- 3 root supergroup 32 2014-03-09 19:33 /input/out/part-r-00000
-rw-r--r-- 3 root supergroup 36 2014-03-09 19:26 /input/wordcount.txt
[root@master mapreduce]# hadoop fs -cat /input/out/part-r-00000
hadoop 1
hello 3
java 1
world 1
[root@master mapreduce]#
Success!
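To relate the counters in the log to the final result, here is a small local simulation in Python. It is only a sketch of the map, combine and reduce steps, not the Java code shipped in the examples jar. It assumes the mapper splits each line on whitespace and emits (word, 1), that the same summing function serves as both combiner and reducer, and that output lines are written as word, tab, count. The function names are illustrative.

```python
from collections import Counter

# The three lines of wordcount.txt used in this walkthrough.
LINES = ["hello hadoop", "hello java", "hello world"]


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for line in lines for word in line.split()]


def sum_by_key(pairs):
    """Sum the counts per word; stands in for both combiner and reducer."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


map_output = map_phase(LINES)      # one record per word occurrence
combined = sum_by_key(map_output)  # local pre-aggregation on the map side
reduced = sum_by_key(combined)     # final totals, sorted by key

# Each input line ends with a newline, hence the +1 per line.
input_bytes = sum(len(line) + 1 for line in LINES)
result_text = "".join(f"{word}\t{count}\n" for word, count in reduced)

print("Map input records:", len(LINES))          # 3
print("Map output records:", len(map_output))    # 6
print("Combine output records:", len(combined))  # 4
print("Reduce output records:", len(reduced))    # 4
print("Bytes Read:", input_bytes)                # 36
print("Bytes Written:", len(result_text))        # 32
print(result_text, end="")
```

The numbers match the log: 3 input records, 6 map output records, 4 records after the combiner, and a 32-byte part-r-00000 file produced from a 36-byte input. With a single map task the combiner already yields the final totals, so the reducer has nothing left to merge.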