@Public
@Stable
InputSplit represents the data to be processed by an individual Mapper.
Typically, it presents a byte-oriented view of the input, and it is the responsibility of the job's RecordReader to process this and present a record-oriented view.
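As a quick illustration of that division of labor, the sketch below shows the call order a map task follows to turn a split's bytes into records. The helper class SplitConsumer is hypothetical; the InputFormat and RecordReader calls are the real org.apache.hadoop.mapreduce API, and both contexts are normally supplied by the framework.

import java.io.IOException;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class SplitConsumer {
  // Sketch only: illustrates the call order, not the framework's actual loop.
  static <K, V> void consume(InputFormat<K, V> format, InputSplit split,
      TaskAttemptContext ctx) throws IOException, InterruptedException {
    RecordReader<K, V> reader = format.createRecordReader(split, ctx);
    reader.initialize(split, ctx);     // byte-oriented view goes in
    while (reader.nextKeyValue()) {    // record-oriented view comes out
      K key = reader.getCurrentKey();
      V value = reader.getCurrentValue();
      // ... each (key, value) pair would be handed to the Mapper here ...
    }
    reader.close();
  }
}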
---------------------
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class InputSplit {
  /**
   * Get the size of the split, so that the input splits can be sorted by size.
   * @return the number of bytes in the split
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract long getLength() throws IOException, InterruptedException;

  /**
   * Get the list of nodes by name where the data for the split would be local.
   * The locations do not need to be serialized.
   *
   * @return a new array of the nodes.
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract String[] getLocations() throws IOException, InterruptedException;

  /**
   * Gets info about which nodes the input split is stored on and how it is
   * stored at each location.
   *
   * @return list of <code>SplitLocationInfo</code>s describing how the split
   *    data is stored at each location. A null value indicates that all the
   *    locations have the data stored on disk.
   * @throws IOException
   */
  @Evolving
  public SplitLocationInfo[] getLocationInfo() throws IOException {
    return null;
  }
}
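For orientation, here is a minimal concrete subclass. The class ByteRangeSplit and its fields are illustrative only, not part of Hadoop; note that real framework splits such as FileSplit also implement Writable so they can be shipped to the tasks that process them, as the DBInputSplit below does.

import java.io.IOException;
import org.apache.hadoop.mapreduce.InputSplit;

// Hypothetical split covering a fixed number of bytes on known hosts.
public class ByteRangeSplit extends InputSplit {
  private final long length;    // bytes in this split
  private final String[] hosts; // nodes where the data is local

  public ByteRangeSplit(long length, String[] hosts) {
    this.length = length;
    this.hosts = hosts;
  }

  /** Reported size; the framework can sort splits by it. */
  @Override
  public long getLength() throws IOException, InterruptedException {
    return length;
  }

  /** Locality hints for scheduling the map task near the data. */
  @Override
  public String[] getLocations() throws IOException, InterruptedException {
    return hosts;
  }
}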
---------------------------
/**
 * An InputSplit that spans a set of rows.
 */
@InterfaceStability.Evolving
public static class DBInputSplit extends InputSplit implements Writable {

  private long end = 0;
  private long start = 0;

  /**
   * Default Constructor.
   */
  public DBInputSplit() {
  }

  /**
   * Convenience Constructor.
   * @param start the index of the first row to select
   * @param end the index of the last row to select
   */
  public DBInputSplit(long start, long end) {
    this.start = start;
    this.end = end;
  }

  /** {@inheritDoc} */
  public String[] getLocations() throws IOException {
    // TODO Add a layer to enable SQL "sharding" and support locality
    return new String[] {};
  }

  /**
   * @return The index of the first row to select
   */
  public long getStart() {
    return start;
  }

  /**
   * @return The index of the last row to select
   */
  public long getEnd() {
    return end;
  }

  /**
   * @return The total row count in this split
   */
  public long getLength() throws IOException {
    return end - start;
  }

  /** {@inheritDoc} */
  public void readFields(DataInput input) throws IOException {
    start = input.readLong();
    end = input.readLong();
  }

  /** {@inheritDoc} */
  public void write(DataOutput output) throws IOException {
    output.writeLong(start);
    output.writeLong(end);
  }
}
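To see the Writable half of the contract in action, this sketch round-trips a DBInputSplit through a byte array, the way the framework does when shipping a split to a map task. It assumes hadoop-mapreduce-client-core is on the classpath; only start and end are written, which is consistent with the InputSplit javadoc above saying that locations need not be serialized.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat.DBInputSplit;

public class DBInputSplitRoundTrip {
  public static void main(String[] args) throws IOException {
    // A split selecting rows [100, 250): getLength() == 150
    DBInputSplit split = new DBInputSplit(100L, 250L);

    // write() encodes the split as two longs: start, then end
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    split.write(new DataOutputStream(bytes));

    // readFields() rehydrates an empty split from the same bytes
    DBInputSplit copy = new DBInputSplit();
    copy.readFields(new DataInputStream(
        new ByteArrayInputStream(bytes.toByteArray())));

    System.out.println(copy.getStart() + ".." + copy.getEnd()
        + " rows=" + copy.getLength());   // prints 100..250 rows=150
  }
}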