Persevere, asking only for some real insight into MapReduce
Theory:
http://hadooptutorial.wikispaces.com
http://developer.yahoo.com/hadoop/tutorial/module4.html
Practice:
Running the inverted-index program:
The code below is the final example from module 4 (MapReduce) of the Yahoo! Hadoop tutorial.
1. Export the jar file LineIndexer.jar from Eclipse.
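If Eclipse isn't handy, the jar can also be built from the shell. A minimal sketch, assuming Hadoop 0.20.2 lives under /usr/hadoop-0.20.2 and LineIndexer.java sits in the current directory (adjust both paths to your install):

mkdir -p classes
# compile against the Hadoop core jar shipped with the 0.20.2 release
javac -classpath /usr/hadoop-0.20.2/hadoop-0.20.2-core.jar -d classes LineIndexer.java
# package the compiled classes into a runnable jar
jar cf LineIndexer.jar -C classes .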
2. Upload all the input files to HDFS.
root@ubuntu:/# hadoop dfs -put *.txt /user/root/input
root@ubuntu:/# hadoop dfs -ls /user/root/input
Found 3 items
-rw-r--r-- 1 root supergroup 569218 2012-01-15 19:46 /user/root/input/All's Well That Ends Well.txt
-rw-r--r-- 1 root supergroup 569218 2012-01-15 19:46 /user/root/input/As You Like It.txt
-rw-r--r-- 1 root supergroup 569218 2012-01-15 19:46 /user/root/input/The Comedy of Errors.txt
3. Run the jar.
root@ubuntu:/usr/hadoop-0.20.2/chenwq# hadoop jar LineIndexer.jar /user/root/input /user/root/output
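Note that FileOutputFormat refuses to start a job whose output directory already exists, so before re-running, clear it first (using the 0.20-era shell syntax):

root@ubuntu:/usr/hadoop-0.20.2/chenwq# hadoop dfs -rmr /user/root/output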
4. Check job and cluster status through the web UIs.
http://localhost:50030/ - JobTracker (MapReduce administration)
http://localhost:50060/ - TaskTracker status
http://localhost:50070/ - NameNode (HDFS status)
5. Job output:
12/01/16 04:53:14 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/01/16 04:53:14 INFO mapred.FileInputFormat: Total input paths to process : 14
12/01/16 04:53:15 INFO mapred.JobClient: Running job: job_201201150129_0001
12/01/16 04:53:16 INFO mapred.JobClient: map 0% reduce 0%
12/01/16 04:53:38 INFO mapred.JobClient: map 1% reduce 0%
12/01/16 04:53:44 INFO mapred.JobClient: map 2% reduce 0%
12/01/16 04:53:50 INFO mapred.JobClient: map 3% reduce 0%
12/01/16 04:53:56 INFO mapred.JobClient: map 4% reduce 0%
12/01/16 04:54:08 INFO mapred.JobClient: map 5% reduce 0%
12/01/16 04:54:11 INFO mapred.JobClient: map 6% reduce 0%
12/01/16 04:54:58 INFO mapred.JobClient: map 7% reduce 0%
12/01/16 04:55:06 INFO mapred.JobClient: map 7% reduce 1%
12/01/16 04:55:12 INFO mapred.JobClient: map 7% reduce 2%
12/01/16 04:55:15 INFO mapred.JobClient: map 8% reduce 2%
12/01/16 04:55:21 INFO mapred.JobClient: map 9% reduce 2%
12/01/16 04:55:27 INFO mapred.JobClient: map 10% reduce 2%
12/01/16 04:55:33 INFO mapred.JobClient: map 11% reduce 2%
12/01/16 04:55:42 INFO mapred.JobClient: map 12% reduce 2%
12/01/16 04:55:48 INFO mapred.JobClient: map 13% reduce 2%
12/01/16 04:56:23 INFO mapred.JobClient: map 13% reduce 3%
12/01/16 04:56:26 INFO mapred.JobClient: map 13% reduce 4%
12/01/16 04:56:38 INFO mapred.JobClient: map 14% reduce 4%
12/01/16 04:56:41 INFO mapred.JobClient: map 15% reduce 4%
12/01/16 04:56:47 INFO mapred.JobClient: map 16% reduce 4%
12/01/16 04:56:53 INFO mapred.JobClient: map 17% reduce 4%
12/01/16 04:56:59 INFO mapred.JobClient: map 18% reduce 4%
12/01/16 04:57:05 INFO mapred.JobClient: map 19% reduce 4%
12/01/16 04:57:11 INFO mapred.JobClient: map 20% reduce 4%
12/01/16 04:57:48 INFO mapred.JobClient: map 20% reduce 5%
12/01/16 04:57:51 INFO mapred.JobClient: map 20% reduce 6%
12/01/16 04:57:54 INFO mapred.JobClient: map 21% reduce 6%
12/01/16 04:58:00 INFO mapred.JobClient: map 22% reduce 6%
12/01/16 04:58:06 INFO mapred.JobClient: map 23% reduce 6%
12/01/16 04:58:12 INFO mapred.JobClient: map 24% reduce 6%
12/01/16 04:58:18 INFO mapred.JobClient: map 25% reduce 6%
12/01/16 04:58:24 INFO mapred.JobClient: map 26% reduce 6%
12/01/16 04:59:05 INFO mapred.JobClient: map 26% reduce 7%
12/01/16 04:59:12 INFO mapred.JobClient: map 26% reduce 8%
12/01/16 04:59:15 INFO mapred.JobClient: map 27% reduce 8%
12/01/16 04:59:21 INFO mapred.JobClient: map 28% reduce 8%
12/01/16 04:59:27 INFO mapred.JobClient: map 29% reduce 8%
12/01/16 04:59:33 INFO mapred.JobClient: map 30% reduce 8%
12/01/16 04:59:36 INFO mapred.JobClient: map 31% reduce 8%
12/01/16 04:59:42 INFO mapred.JobClient: map 32% reduce 8%
12/01/16 04:59:48 INFO mapred.JobClient: map 33% reduce 8%
12/01/16 05:00:30 INFO mapred.JobClient: map 33% reduce 10%
12/01/16 05:00:33 INFO mapred.JobClient: map 34% reduce 10%
12/01/16 05:00:36 INFO mapred.JobClient: map 34% reduce 11%
12/01/16 05:00:42 INFO mapred.JobClient: map 35% reduce 11%
12/01/16 05:00:48 INFO mapred.JobClient: map 36% reduce 11%
12/01/16 05:00:54 INFO mapred.JobClient: map 37% reduce 11%
12/01/16 05:01:00 INFO mapred.JobClient: map 38% reduce 11%
12/01/16 05:01:06 INFO mapred.JobClient: map 39% reduce 11%
12/01/16 05:01:12 INFO mapred.JobClient: map 40% reduce 11%
12/01/16 05:01:55 INFO mapred.JobClient: map 40% reduce 12%
12/01/16 05:01:58 INFO mapred.JobClient: map 41% reduce 13%
12/01/16 05:02:04 INFO mapred.JobClient: map 42% reduce 13%
12/01/16 05:02:10 INFO mapred.JobClient: map 43% reduce 13%
12/01/16 05:02:16 INFO mapred.JobClient: map 44% reduce 13%
12/01/16 05:02:22 INFO mapred.JobClient: map 45% reduce 13%
12/01/16 05:02:28 INFO mapred.JobClient: map 46% reduce 13%
12/01/16 05:03:15 INFO mapred.JobClient: map 46% reduce 14%
12/01/16 05:03:19 INFO mapred.JobClient: map 46% reduce 15%
12/01/16 05:03:22 INFO mapred.JobClient: map 47% reduce 15%
12/01/16 05:03:30 INFO mapred.JobClient: map 48% reduce 15%
12/01/16 05:03:36 INFO mapred.JobClient: map 49% reduce 15%
12/01/16 05:03:42 INFO mapred.JobClient: map 50% reduce 15%
12/01/16 05:03:48 INFO mapred.JobClient: map 51% reduce 15%
12/01/16 05:03:54 INFO mapred.JobClient: map 52% reduce 15%
12/01/16 05:04:00 INFO mapred.JobClient: map 53% reduce 15%
12/01/16 05:04:37 INFO mapred.JobClient: map 56% reduce 15%
12/01/16 05:04:40 INFO mapred.JobClient: map 56% reduce 16%
12/01/16 05:04:43 INFO mapred.JobClient: map 57% reduce 17%
12/01/16 05:04:46 INFO mapred.JobClient: map 60% reduce 17%
12/01/16 05:04:49 INFO mapred.JobClient: map 61% reduce 17%
12/01/16 05:04:52 INFO mapred.JobClient: map 64% reduce 20%
12/01/16 05:04:55 INFO mapred.JobClient: map 65% reduce 20%
12/01/16 05:04:58 INFO mapred.JobClient: map 68% reduce 20%
12/01/16 05:05:01 INFO mapred.JobClient: map 69% reduce 21%
12/01/16 05:05:04 INFO mapred.JobClient: map 73% reduce 21%
12/01/16 05:05:07 INFO mapred.JobClient: map 73% reduce 22%
12/01/16 05:05:10 INFO mapred.JobClient: map 76% reduce 22%
12/01/16 05:05:16 INFO mapred.JobClient: map 80% reduce 23%
12/01/16 05:05:22 INFO mapred.JobClient: map 83% reduce 25%
12/01/16 05:05:25 INFO mapred.JobClient: map 90% reduce 25%
12/01/16 05:05:31 INFO mapred.JobClient: map 96% reduce 27%
12/01/16 05:05:34 INFO mapred.JobClient: map 100% reduce 30%
12/01/16 05:05:44 INFO mapred.JobClient: map 100% reduce 33%
12/01/16 05:06:45 INFO mapred.JobClient: map 100% reduce 66%
12/01/16 05:06:48 INFO mapred.JobClient: map 100% reduce 67%
12/01/16 05:06:51 INFO mapred.JobClient: map 100% reduce 68%
12/01/16 05:06:57 INFO mapred.JobClient: map 100% reduce 69%
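When the job finishes, the inverted index lands in /user/root/output, one part file per reducer. A quick way to inspect it (the part file name below is the usual default, not copied from this run):

root@ubuntu:/# hadoop dfs -ls /user/root/output
root@ubuntu:/# hadoop dfs -cat /user/root/output/part-00000

Each output line is a lowercased word followed by the comma-separated list of files it appears in. For reference, the complete LineIndexer source: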
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class LineIndexer {

    public static class LineIndexMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        private final static Text word = new Text();
        private final static Text location = new Text();

        public void map(LongWritable key, Text val,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // Record which file this split came from, so each word
            // can be emitted together with its source document.
            FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
            String fileName = fileSplit.getPath().getName();
            location.set(fileName);

            String line = val.toString();
            StringTokenizer itr = new StringTokenizer(line.toLowerCase());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, location);
            }
        }
    }

    public static class LineIndexReducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // Concatenate every file name seen for this word into
            // a single comma-separated list.
            boolean first = true;
            StringBuilder toReturn = new StringBuilder();
            while (values.hasNext()) {
                if (!first) {
                    toReturn.append(", ");
                }
                first = false;
                toReturn.append(values.next().toString());
            }
            output.collect(key, new Text(toReturn.toString()));
        }
    }

    /**
     * The actual main() method for our program; this is the
     * "driver" for the MapReduce job.
     */
    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(LineIndexer.class);
        conf.setJobName("LineIndexer");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(LineIndexMapper.class);
        conf.setReducerClass(LineIndexReducer.class);
        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
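Two things worth noting about this code: it uses the old org.apache.hadoop.mapred API, which matches the Hadoop 0.20.2 install used here (newer releases favor org.apache.hadoop.mapreduce), and the mapper deliberately reuses the static word and location Text objects rather than allocating new ones per record, a standard MapReduce idiom for cutting garbage-collection overhead.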
Comments
#2 chenwq, 2012-03-20:
hdfs://localhost:9000/user/cmzx3444/input hdfs://localhost:9000/user/cmzx3444/output012
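(These are fully qualified HDFS URIs. Passing them as the job's two arguments works the same as the relative /user/... paths used above, provided the scheme, host, and port match the cluster's fs.default.name.)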
#1 chenwq, 2012-02-16:
The program above is only an introductory demo. I plan to keep pushing ahead with Hadoop, starting with using MapReduce for log analysis. I found the relevant IBM developerWorks article, which can be followed step by step:
http://www.ibm.com/developerworks/cn/java/java-lo-mapreduce/#icomments
There is also the official Baidu R&D blog: http://stblog.baidu-tech.com/?p=310