bochuxt
job on hadoop

//http://distributed-agility.blogspot.com/2010/01/hadoop-0201-example-inverted-line-index.html

//https://portal.futuregrid.org/manual/hadoop-wordcount


package index;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * LineIndexer creates an inverted index over all the words in a document corpus,
 * mapping each observed word to a list of filename@offset locations where it occurs.
 */
public class LineIndexer extends Configured implements Tool {

    // Where to put the data in HDFS when we're done.
    private static final String OUTPUT_PATH = "output";

    // Where to read the data from.
    private static final String INPUT_PATH = "input";

    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic Hadoop options before calling run().
        int res = ToolRunner.run(new Configuration(), new LineIndexer(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(conf, "Line Indexer 1");

        job.setJarByClass(LineIndexer.class);
        job.setMapperClass(LineIndexMapper.class);
        job.setReducerClass(LineIndexReducer.class);

        // Both the word (key) and its list of locations (value) are emitted as Text.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(INPUT_PATH));
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));

        // Block until the job finishes; exit code 0 on success, 1 on failure.
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
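
The driver above references LineIndexMapper and LineIndexReducer, whose sources are not shown here. As a rough illustration of what the Javadoc describes (each word mapped to a list of filename@offset locations), here is a minimal plain-Java sketch with no Hadoop dependency. The class and method names (InvertedIndexSketch, map, reduce, buildIndex), the lowercasing, and the whitespace tokenization are my assumptions for illustration; this is not the actual mapper and reducer from the exercise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/**
 * Plain-Java sketch of the inverted-index logic (no Hadoop dependency).
 * All names are hypothetical; this is not the LineIndexMapper /
 * LineIndexReducer source from the exercise.
 */
public class InvertedIndexSketch {

    /** "Map" step: emit one (word, filename@offset) pair per word on a line. */
    public static List<String[]> map(String fileName, long offset, String line) {
        List<String[]> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line.toLowerCase());
        String location = fileName + "@" + offset;
        while (tokens.hasMoreTokens()) {
            pairs.add(new String[] { tokens.nextToken(), location });
        }
        return pairs;
    }

    /** "Reduce" step: join all locations seen for one word with commas. */
    public static String reduce(List<String> locations) {
        return String.join(",", locations);
    }

    /** Runs map over every line, groups by word (the shuffle), then reduces. */
    public static Map<String, String> buildIndex(String fileName, String[] lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            for (String[] pair : map(fileName, offset, line)) {
                grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
            }
            // +1 for the newline; assumes single-byte characters.
            offset += line.length() + 1;
        }
        Map<String, String> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            index.put(e.getKey(), reduce(e.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        String[] lines = { "to be or not", "to be" };
        buildIndex("hamlet.txt", lines)
                .forEach((word, locs) -> System.out.println(word + "\t" + locs));
    }
}
```

In the real job the offset would typically come from the input key (with the default text input format, the byte offset of the line in the file), and the grouping by word is done by the framework between the map and reduce phases rather than by an in-memory map as here.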

After updating, make sure to generate a new jar, remove anything under the "output" directory (the program does not clean it up), and run the new version.
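
The cleanup step can also be scripted. The sketch below assumes the job runs in Hadoop's local (standalone) mode, where "output" is an ordinary directory on the local disk; the class name OutputCleaner and its method are hypothetical. When the output lives in HDFS, as in the listings in this post, the directory would instead have to be removed through Hadoop itself, for example with FileSystem.delete(path, true) from the Hadoop API.

```java
import java.io.File;

/**
 * Hypothetical helper: removes a local output directory before a re-run.
 * Only meaningful when the job writes to the local file system.
 */
public class OutputCleaner {

    /** Recursively deletes the given file or directory; returns true if it was removed. */
    public static boolean deleteRecursively(File target) {
        if (!target.exists()) {
            return false;
        }
        File[] children = target.listFiles(); // null when target is a plain file
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        return target.delete();
    }

    public static void main(String[] args) {
        // Defaults to the same relative path the driver uses as OUTPUT_PATH.
        File out = new File(args.length > 0 ? args[0] : "output");
        System.out.println(deleteRecursively(out)
                ? "removed " + out
                : "nothing to remove at " + out);
    }
}
```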

training@training-vm:~/git/exercises/shakespeare$ ant jar
Buildfile: build.xml

compile:
[javac] Compiling 4 source files to /home/training/git/exercises/shakespeare/bin

jar:
[jar] Building jar: /home/training/git/exercises/shakespeare/indexer.jar

BUILD SUCCESSFUL
Total time: 1 second

I have added two ASCII books to the input directory: the works of Leonardo da Vinci and the first volume of "The Outline of Science".

training@training-vm:~/git/exercises/shakespeare$ hadoop fs -ls input
Found 3 items
-rw-r--r-- 1 training supergroup 5342761 2009-12-30 11:57 /user/training/input/all-shakespeare
-rw-r--r-- 1 training supergroup 1427769 2010-01-04 17:42 /user/training/input/leornardo-davinci-all.txt
-rw-r--r-- 1 training supergroup 674762 2010-01-04 17:42 /user/training/input/the-outline-of-science-vol1.txt

Running the example produces the following output.

training@training-vm:~/git/exercises/shakespeare$ hadoop jar indexer.jar index.LineIndexer
10/01/04 21:11:55 INFO input.FileInputFormat: Total input paths to process : 3
10/01/04 21:11:56 INFO mapred.JobClient: Running job: job_200912301017_0017
10/01/04 21:11:57 INFO mapred.JobClient: map 0% reduce 0%