---------------------------- Description ------------------------------
Compared with HelloHadoopV1, this program adds the following:
It checks whether output files already exist in the output folder and deletes them.
If the input folder contains more than two files, the files are not overwritten.
Map and reduce are split into separate classes so the functions can be reused.
---------------------------------------------------------------
HelloMapperV2.java
package HelloHadoopV2;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class HelloMapperV2 extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit the byte offset of the line (converted to Text) as the key
        // and the line itself as the value.
        context.write(new Text(key.toString()), value);
    }
}
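The mapper turns each input record, a (byte offset, line of text) pair, into an (offset-as-Text, line) pair. Since the key is simply the byte offset within a file, the same offset in two different input files yields the same key, which is what lets the reduce phase merge records from different files that sit at the same offset.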
HelloReducerV2.java
package HelloHadoopV2;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class HelloReducerV2 extends Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Join all values that share the same key, separated by " &&".
        StringBuilder str = new StringBuilder();
        for (Text tmp : values) {
            str.append(tmp.toString()).append(" &&");
        }
        context.write(key, new Text(str.toString()));
    }
}
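A worked example: if two input files each have a record at byte offset 0, say "hello" in one and "world" in the other, the reducer receives key 0 with the values "hello" and "world" and writes key 0 with value "hello &&world &&" (the loop appends " &&" after every value, including the last). Note that this class is also registered as the combiner in the driver below; when the combiner runs, values reaching the reducer already carry the " &&" suffix, so separators can appear doubled in the final output.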
HelloHadoopV2.java
package HelloHadoopV2;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import CheckAndDelete.CheckAndDelete;
public class HelloHadoopV2 {

    /**
     * @param args
     * @throws IOException
     * @throws ClassNotFoundException
     * @throws InterruptedException
     */
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Hadoop Hello World 2");
        job.setJarByClass(HelloHadoopV2.class);
        // Set the mapper, combiner, and reducer classes.
        job.setMapperClass(HelloMapperV2.class);
        job.setCombinerClass(HelloReducerV2.class);
        job.setReducerClass(HelloReducerV2.class);
        // Set the map output key/value types.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // Set the reduce (final) output key/value types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Set the input path.
        FileInputFormat.addInputPath(job, new Path("/user/hadoop/input"));
        // Set the output path.
        FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/output-hh2"));
        // Check whether the output folder already exists; if so, delete it,
        // otherwise the job would abort at startup.
        CheckAndDelete.checkAndDelete("/user/hadoop/output-hh2", conf);
        boolean status = job.waitForCompletion(true);
        if (status) {
            System.err.println("Integrate Alert Job Finished!");
        } else {
            System.err.println("Integrate Alert Job Failed");
            System.exit(1);
        }
    }
}
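The CheckAndDelete class imported above is not listed in this post; it comes from an earlier example in this series. Below is a minimal sketch of such a helper, assuming only the static checkAndDelete(String, Configuration) signature that the driver actually calls; the original implementation may differ.
CheckAndDelete.java (sketch)
package CheckAndDelete;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class CheckAndDelete {
    // If the given HDFS path already exists, delete it recursively so that
    // FileOutputFormat does not abort the job with "output directory already exists".
    public static boolean checkAndDelete(String path, Configuration conf) {
        Path dstPath = new Path(path);
        try {
            FileSystem fs = FileSystem.get(conf);
            if (fs.exists(dstPath)) {
                fs.delete(dstPath, true); // true = recursive delete
            }
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
        return true;
    }
}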
How to test:
1. Start Hadoop.
2. Package this code as HelloHadoop.jar and copy it into the Hadoop directory.
3. Run the job following the steps in http://freewxy.iteye.com/blog/1102011 (see the command sketch below).
4. Check the results:


[Screenshot of the job output, 165.7 KB, omitted]
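For reference, a sketch of the commands involved, assuming the jar name and HDFS paths used above (the exact walkthrough is in the linked post; local_input is a hypothetical local folder):
# Upload the input files to HDFS
hadoop fs -put local_input /user/hadoop/input
# Run the driver class packaged in the jar
hadoop jar HelloHadoop.jar HelloHadoopV2.HelloHadoopV2
# Inspect the merged result
hadoop fs -cat /user/hadoop/output-hh2/part-r-00000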