Hadoop读书笔记(一)Hadoop介绍:http://blog.csdn.net/caicongyang/article/details/39898629
Hadoop读书笔记(二)HDFS的shell操作:http://blog.csdn.net/caicongyang/article/details/41253927
Hadoop读书笔记(三)Java API操作HDFS:http://blog.csdn.net/caicongyang/article/details/41290955
Hadoop读书笔记(四)HDFS体系结构:http://blog.csdn.net/caicongyang/article/details/41322649
Hadoop读书笔记(五)MapReduce统计单词demo:http://blog.csdn.net/caicongyang/article/details/41453579
Hadoop读书笔记(六)MapReduce自定义数据类型demo:http://blog.csdn.net/caicongyang/article/details/41490379
1.说明
功能和上篇一样实现手机流量的统计(ps:可以与前面文章代码做对比)
数据格式也相同...
2.代码:
KpiApp.java
package old; import java.io.DataInput; import java.io.DataOutput; import java.io.IOException; import java.net.URI; import java.util.Iterator; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.io.Writable; import org.apache.hadoop.mapred.FileInputFormat; import org.apache.hadoop.mapred.FileOutputFormat; import org.apache.hadoop.mapred.JobClient; import org.apache.hadoop.mapred.JobConf; import org.apache.hadoop.mapred.MapReduceBase; import org.apache.hadoop.mapred.Mapper; import org.apache.hadoop.mapred.OutputCollector; import org.apache.hadoop.mapred.Reducer; import org.apache.hadoop.mapred.Reporter; import org.apache.hadoop.mapred.TextInputFormat; import org.apache.hadoop.mapred.TextOutputFormat; import org.apache.hadoop.mapred.lib.HashPartitioner; /** * * <p> * Title: KpiApp.java * Package old * </p> * <p> * Description: hadoop版本1.x的包一般是mapreduce * hadoop版本0.x的包一般是mapred * <p> * @author Tom.Cai * @created 2014-11-25 下午10:23:47 * @version V1.0 * */ public class KpiApp { private static final String INPUT_PATH = "hdfs://192.168.80.100:9000/wlan"; private static final String OUT_PATH = "hdfs://192.168.80.100:9000/wlan_out"; /** * 改动: * 1.不再使用Job,而是使用JobConf * 2.类的包名不再使用mapreduce,而是使用mapred * 3.不再使用job.waitForCompletion(true)提交作业,而是使用JobClient.runJob(job); * */ public static void main(String[] args) throws Exception { FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), new Configuration()); Path outPath = new Path(OUT_PATH); if (fileSystem.exists(outPath)) { fileSystem.delete(outPath, true); } JobConf job = new JobConf(new Configuration(), KpiApp.class); FileInputFormat.setInputPaths(job, INPUT_PATH); job.setInputFormat(TextInputFormat.class); job.setMapperClass(KpiMapper.class); job.setMapOutputKeyClass(Text.class); job.setMapOutputValueClass(KpiWite.class); job.setPartitionerClass(HashPartitioner.class); job.setNumReduceTasks(1); job.setReducerClass(KpiReducer.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(KpiWite.class); FileOutputFormat.setOutputPath(job, new Path(OUT_PATH)); job.setOutputFormat(TextOutputFormat.class); JobClient.runJob(job); } /** * 新api:extends Mapper * 老api:extends MapRedcueBase implements Mapper */ static class KpiMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, KpiWite> { @Override public void map(LongWritable key, Text value, OutputCollector<Text, KpiWite> out, Reporter arg3) throws IOException { String[] splited = value.toString().split("\t"); String num = splited[1]; KpiWite kpi = new KpiWite(splited[6], splited[7], splited[8], splited[9]); out.collect(new Text(num), kpi); } } static class KpiReducer extends MapReduceBase implements Reducer<Text, KpiWite, Text, KpiWite> { @Override public void reduce(Text key, Iterator<KpiWite> value, OutputCollector<Text, KpiWite> out, Reporter arg3) throws IOException { long upPackNum = 0L; long downPackNum = 0L; long upPayLoad = 0L; long downPayLoad = 0L; while (value.hasNext()) { upPackNum += value.next().upPackNum; downPackNum += value.next().downPackNum; upPayLoad += value.next().upPayLoad; downPayLoad += value.next().downPayLoad; } out.collect(key, new KpiWite(String.valueOf(upPackNum), String.valueOf(downPackNum), String.valueOf(upPayLoad), String.valueOf(downPayLoad))); } } } class KpiWite implements Writable { long upPackNum; long downPackNum; long upPayLoad; long downPayLoad; public KpiWite() { } public KpiWite(String upPackNum, String downPackNum, String upPayLoad, String downPayLoad) { this.upPackNum = Long.parseLong(upPackNum); this.downPackNum = Long.parseLong(downPackNum); this.upPayLoad = Long.parseLong(upPayLoad); this.downPayLoad = Long.parseLong(downPayLoad); } @Override public void readFields(DataInput in) throws IOException { this.upPackNum = in.readLong(); this.downPackNum = in.readLong(); this.upPayLoad = in.readLong(); this.downPayLoad = in.readLong(); } @Override public void write(DataOutput out) throws IOException { out.writeLong(upPackNum); out.writeLong(downPackNum); out.writeLong(upPayLoad); out.writeLong(downPayLoad); } }
欢迎大家一起讨论学习!
有用的自己收!
记录与分享,让你我共成长!欢迎查看我的其他博客;我的博客地址:http://blog.csdn.net/caicongyang
相关推荐
org.apache.hadoop.mapreduce.server.jobtracker org.apache.hadoop.mapreduce.server.tasktracker org.apache.hadoop.mapreduce.tools org.apache.hadoop.mapreduce.v2 org.apache.hadoop.mapreduce.v2.app....
包org.apache.hadoop.mapreduce的Hadoop源代码分析
hadoop-mapreduce-examples-2.7.1.jar
hadoop-annotations-3.1.1.jar hadoop-common-3.1.1.jar hadoop-mapreduce-client-core-3.1.1.jar hadoop-yarn-api-3.1.1.jar hadoop-auth-3.1.1.jar hadoop-hdfs-3.1.1.jar hadoop-mapreduce-client-hs-3.1.1.jar ...
在源码层面,org.apache.hadoop.mapreduce包包含了关键的接口和类。Writeable、Counter和ID相关类处理计数和标识,Context类提供Mapper和Reducer所需的上下文信息,Mapper、Reducer和Job类定义了MapReduce的基本操作...
《Hadoop.MapReduce.v2.Cookbook》是一本专注于Hadoop MapReduce v2(也称为YARN)的实用指南,适合那些希望深入了解和利用Hadoop处理大数据的IT专业人士。这本书籍详细介绍了如何在Hadoop 2.x环境中有效地设计、...
Hadoop HDFS和MapReduce架构浅析.pdf 更多资源请点击:https://blog.csdn.net/weixin_44155966
标题 "hadoop-3.3.1-aarch64.tar.gz" 暗示这是一个针对aarch64架构(ARM64)的Hadoop 3.3.1版本的压缩包,适合在运行M1芯片(苹果公司的ARM架构处理器)的Mac系统上使用。Hadoop是一个开源的分布式计算框架,它允许...
1. 确保下载的`hadoop.dll`和`winutils.exe`与你的Hadoop版本兼容。 2. 配置环境变量,包括`HADOOP_HOME`和`PATH`,以便系统能找到这些文件。 3. 对于`winutils.exe`,确保设置了正确的HDFS根目录 (`hdfs dfs -...
本文将详细讲解如何在Windows操作系统上搭建和使用Hadoop 2.6.x及2.7.x版本的可执行环境,主要基于提供的压缩包文件:`hadoop2.7.1X64.zip`和`hadoop2.6(x64)V0.2.zip`。 一、Hadoop简介 Hadoop的核心组件包括HDFS...
Hadoop 采用 MapReduce 分布式计算框架,根据 GFS 原理开发了 HDFS(分布式文件系统),并根据 BigTable 原理开发了 HBase 数据存储系统。 Hadoop 和 Google 内部使用的分布式计算系统原理相同,其开源特性使其成为...
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:133) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:437) at org.apache....
标题中的"hadoop-2.6.0-cdh5.14.2.tar.gz"是一个针对Apache Hadoop的软件包,具体来说是CDH(Cloudera Distribution Including Apache Hadoop)5.14.2版本,它基于Hadoop 2.6.0。CDH是由Cloudera公司提供的一个开源...
包mapreduce.lib.map的Hadoop源代码分析
对于Hadoop来说,hadoop.dll是Windows版本Hadoop实现的一部分,它提供了与Hadoop生态系统交互所需的功能。这个库文件可能包含了Hadoop的JNI(Java Native Interface)实现,使得Java代码能够调用本地系统的服务,如...
hadoop-mapreduce-examples-2.6.5.jar 官方案例源码
《Packtpub.Hadoop.MapReduce.Cookbook.Jan.2013》是2013年出版的一本专门探讨Hadoop MapReduce技术的实战指南。这本书深入浅出地介绍了如何利用Hadoop MapReduce框架来处理大数据问题,是IT行业中针对大数据处理的...
在这个场景中,我们有两个安装包:`hadoop-2.6.0-cdh5.7.0.tar.gz` 和 `jdk-7u80-linux-x64.tar.gz`,分别代表了CDH5.7.0版本的Hadoop和Java 7的64位Linux版本。 首先,让我们深入了解一下Hadoop。Hadoop的核心由两...
5. **工具和API更新**:Hadoop的命令行工具和编程接口(如Hadoop Common、Hadoop MapReduce客户端库)可能会有新的功能和改进,以便开发者更好地利用Hadoop的功能。 6. **社区支持**:作为开源项目,Hadoop拥有活跃...