At work I ran into a situation where I want mapper A to read a file with two fields (questionId, questionTags) and emit key: questionId, value: questionTags, while mapper B reads a directory containing many files, each named by its questionId with the questionContent as its file content, and emits key: questionId (the file name), value: questionContent. A reducer then does some string operations.
The framework described above is:
A mapper \
          > reducer
B mapper /
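ChainMapper cannot solve this problem: it runs mappers one after another on the same input, not side by side on different inputs.
I noticed that the two mappers' output formats are the same. So another way is to use a single mapper that reads both the questions directory and the tags file.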
I ran into two problems.
a. Keeping references to the values while iterating over them in the reducer:
QuestionTagsWritable e1 = null, e2 = null;
for (QuestionTagsWritable e : values) {
    System.out.println("xx = " + e.toString());
    if (e.isTags) {
        e1 = e;
    } else {
        e2 = e;
    }
}
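Solution: Hadoop reuses the same value object on every iteration, so e1 and e2 end up referring to the same instance. Copy the value into a new object instead of storing the reference:
e1 = new QuestionTagsWritable(true, e.content); // copy the value, do not keep the reference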
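The reuse behaviour behind problem a can be reproduced without a cluster. The sketch below is plain Java, not Hadoop code: `Record` is a hypothetical stand-in for `QuestionTagsWritable` with only the two fields used above, and `reusingIterable` is an assumed imitation of how the reducer hands out one refilled instance per iteration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ValueReuseDemo {
    // Hypothetical stand-in for QuestionTagsWritable (only the fields used in this post).
    static class Record {
        boolean isTags;
        String content;
        Record() {}
        Record(boolean isTags, String content) {
            this.isTags = isTags;
            this.content = content;
        }
    }

    // Imitates the reducer's value iterator: a single instance is refilled on each next().
    static Iterable<Record> reusingIterable(List<Record> source) {
        return () -> new Iterator<Record>() {
            private final Record shared = new Record();
            private int i = 0;
            public boolean hasNext() { return i < source.size(); }
            public Record next() {
                Record src = source.get(i++);
                shared.isTags = src.isTags;
                shared.content = src.content;
                return shared;
            }
        };
    }

    // Problematic version: stores references to the reused instance.
    static Record[] keepReferences(Iterable<Record> values) {
        Record e1 = null, e2 = null;
        for (Record e : values) {
            if (e.isTags) { e1 = e; } else { e2 = e; }
        }
        return new Record[] { e1, e2 };
    }

    // Fixed version: copies the fields into a fresh object for each value kept.
    static Record[] keepCopies(Iterable<Record> values) {
        Record e1 = null, e2 = null;
        for (Record e : values) {
            if (e.isTags) {
                e1 = new Record(true, e.content);
            } else {
                e2 = new Record(false, e.content);
            }
        }
        return new Record[] { e1, e2 };
    }

    public static void main(String[] args) {
        List<Record> input = new ArrayList<>();
        input.add(new Record(true, "java,hadoop"));
        input.add(new Record(false, "How do I join two inputs?"));

        Record[] bad = keepReferences(reusingIterable(input));
        Record[] good = keepCopies(reusingIterable(input));

        System.out.println("same instance: " + (bad[0] == bad[1]));
        System.out.println("copies: " + good[0].content + " | " + good[1].content);
    }
}
```

With references, both variables point at the shared instance and show whatever was read last; with copies, the tags and the content survive the loop separately.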
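b. With MultipleInputs, the following cast in the mapper fails with a ClassCastException, because the split is wrapped in a TaggedInputSplit:
FileSplit fileSplit = (FileSplit) context.getInputSplit();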
Solution:
InputSplit split = context.getInputSplit();
Class<? extends InputSplit> splitClass = split.getClass();
FileSplit fileSplit = null;
if (splitClass.equals(FileSplit.class)) {
    fileSplit = (FileSplit) split;
} else if (splitClass.getName().equals(
        "org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
    // begin reflection hackery...
    try {
        Method getInputSplitMethod = splitClass.getDeclaredMethod("getInputSplit");
        getInputSplitMethod.setAccessible(true);
        fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
    } catch (Exception e) {
        // wrap and re-throw error
        throw new IOException(e);
    }
    // end reflection hackery
}
See: http://stackoverflow.com/questions/11130145/hadoop-multipleinputs-fails-with-classcastexception
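The unwrapping step in solution b can also be tried in isolation. The following is a minimal sketch in plain Java under stated assumptions: `PlainSplit` and `TaggedSplit` are hypothetical stand-ins for `FileSplit` and `TaggedInputSplit` (they are not Hadoop classes), and `unwrap` is an illustrative helper name. It only shows the pattern of calling an inaccessible `getInputSplit` method through reflection.

```java
import java.lang.reflect.Method;

public class SplitUnwrapDemo {
    // Hypothetical stand-in for FileSplit.
    static class PlainSplit {
        final String path;
        PlainSplit(String path) { this.path = path; }
    }

    // Hypothetical stand-in for TaggedInputSplit: callers cannot cast to it or call it directly.
    private static class TaggedSplit {
        private final PlainSplit inner;
        TaggedSplit(PlainSplit inner) { this.inner = inner; }
        private PlainSplit getInputSplit() { return inner; }
    }

    // Hands out the wrapper typed as Object, as a caller outside the class would see it.
    static Object wrap(PlainSplit split) {
        return new TaggedSplit(split);
    }

    // Returns the split itself when it is already plain, otherwise unwraps it by reflection.
    static PlainSplit unwrap(Object split) {
        if (split instanceof PlainSplit) {
            return (PlainSplit) split;
        }
        try {
            Method getInputSplit = split.getClass().getDeclaredMethod("getInputSplit");
            getInputSplit.setAccessible(true);
            return (PlainSplit) getInputSplit.invoke(split);
        } catch (ReflectiveOperationException e) {
            // Wrap and re-throw, as the mapper code does with IOException.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PlainSplit plain = new PlainSplit("questions/12345");
        System.out.println(unwrap(plain).path);
        System.out.println(unwrap(wrap(plain)).path);
    }
}
```

Because the lookup goes by method name, this approach depends on the wrapper keeping a method called `getInputSplit`; that is worth re-checking when upgrading Hadoop.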