Hadoop(5)Eclipse and Example
Find the official sample project hadoop-mapreduce-examples for reference.
Download the STS tool (Spring Tool Suite, an Eclipse-based IDE) to work on the Java project.
The sample project is easyhadoop; it is built with Maven.
Here are the pom.xml dependencies:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.sillycat</groupId>
    <artifactId>easyhadoop</artifactId>
    <version>1.0</version>
    <description>Hadoop MapReduce Example</description>
    <name>Hadoop MapReduce Examples</name>
    <packaging>jar</packaging>
    <properties>
        <hadoop.version>2.4.1</hadoop.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
            <version>1.1.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>${hadoop.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.sillycat.easyhadoop.ExecutorDriver</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
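The manifest's mainClass points to com.sillycat.easyhadoop.ExecutorDriver, which is not listed in this post; it is the dispatcher that maps the sub-command name "wordcount" used later on the command line to the driver class. A minimal sketch of such a class, assuming it follows the same ProgramDriver pattern as Hadoop's own ExampleDriver:
package com.sillycat.easyhadoop;

import org.apache.hadoop.util.ProgramDriver;

import com.sillycat.easyhadoop.wordcount.WordCount;

public class ExecutorDriver {

    public static void main(String[] args) {
        int exitCode = -1;
        ProgramDriver driver = new ProgramDriver();
        try {
            // "wordcount" is the sub-command name passed after the jar on the command line
            driver.addClass("wordcount", WordCount.class,
                    "A map/reduce program that counts the words in the input files.");
            exitCode = driver.run(args);
        } catch (Throwable e) {
            e.printStackTrace();
        }
        System.exit(exitCode);
    }
}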
Here is the mapper class, which reads the input lines, splits them into words, and emits each word with a count of 1.
package com.sillycat.easyhadoop.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // split the line into tokens and emit (word, 1) for each one
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
Here is the reducer class; for each word it sums the counts emitted by the mappers and writes out the total.
package com.sillycat.easyhadoop.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // sum all the counts for this word and emit (word, total)
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
Here is the main class, which configures and runs the word count job.
package com.sillycat.easyhadoop.wordcount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws IOException,
            ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // let GenericOptionsParser consume the standard Hadoop options (-D, -fs, -libjars, ...)
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        // the reducer can also serve as the combiner because its input and output types match
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Once we create the input/wordcount directory and put some input files there (the job creates the output directory itself and fails if it already exists), we can run this directly in Eclipse.
Just create a Run Configuration for a Java Application and add the program arguments:
input/wordcount output/wordcount
Add an environment variable
HADOOP_HOME=/opt/hadoop
or the system property
hadoop.home.dir=/opt/hadoop
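Note that hadoop.home.dir is read as a Java system property rather than an environment variable, so in the Eclipse run configuration it goes into the VM arguments box, for example (a minimal illustration with the same path as above):
-Dhadoop.home.dir=/opt/hadoop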
If I want to run it on a multi-machine cluster, I need to build the jar with Maven:
>mvn clean install
First put the jar under the directory /opt/hadoop/share/custom. Here is how it runs on the local machine:
>hadoop jar /opt/hadoop/share/custom/easyhadoop-1.0.jar wordcount input output
On the ubuntu-master, place the jar under the /opt/hadoop/share/custom directory.
Start all the servers.
>sbin/start-dfs.sh
>sbin/start-yarn.sh
>sbin/mr-jobhistory-daemon.sh start historyserver
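Optionally, verify that the daemons are up before submitting the job; on the master, jps should list processes such as NameNode, ResourceManager, and JobHistoryServer (the exact set depends on how the cluster roles are split across machines):
>jps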
I have already put my input files into HDFS:
>hadoop fs -mkdir -p /data/worldcount
>hadoop fs -put /opt/hadoop/etc/hadoop/*.xml /data/worldcount/
So I can run my jar directly:
>hadoop jar /opt/hadoop/share/custom/easyhadoop-1.0.jar wordcount /data/worldcount /output/worldcount2
And this will show me the result (each output line is a word and its count, separated by a tab):
>hadoop fs -cat /output/worldcount2/*
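One thing to keep in mind: FileOutputFormat refuses to start the job if the output directory already exists, so remove it before re-running with the same path, for example:
>hadoop fs -rm -r /output/worldcount2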
Actually, I just wanted to learn about Hadoop and the MapReduce framework; in the end I plan to use HBase and Spark, so I did not try mapping and reducing against a database.
References:
http://hadoop.apache.org/docs/r2.4.1/api/
http://java.dzone.com/articles/running-hadoop-mapreduce
http://www.cnblogs.com/shitouer/archive/2012/05/29/2522860.html
http://blog.csdn.net/zythy/article/details/17397153
https://github.com/romainr/hadoop-tutorials-examples
http://www.javaworld.com/article/2077907/open-source-tools/mapreduce-programming-with-apache-hadoop.html
http://wiki.apache.org/hadoop/Grep
http://www.osedu.net/article/nosql/2012-05-02/435.html
http://www.ibm.com/developerworks/cn/java/j-javadev2-15/
Hadoop classpath:
http://grepalex.com/2013/02/25/hadoop-libjars/
http://stackoverflow.com/questions/12940239/hadoop-hadoop-classpath-issues
Mapper from and reducer to a database:
http://archanaschangale.wordpress.com/2013/09/26/database-access-with-apache-hadoop/
http://shazsterblog.blogspot.com/2012/11/storing-hadoop-wordcount-example-with.html