How a Hadoop Mapper reads TextInputFormat data from HDFS

coderplay

Blog category:

mapreduce&parallel

LineRecordReader.next(LongWritable key, Text value)
  LineReader.readLine(Text str, int maxLineLength, int maxBytesToConsume)
    DataInputStream.read(byte b[])  /* DFSDataInputStream inherits this method */
      DFSInputStream.read(long position, byte[] buffer, int offset, int length)
        DFSInputStream.fetchBlockByteRange(LocatedBlock block, long start, long end, byte[] buf, int offset)
          BlockReader.readAll(byte[] buf, int offset, int len)
            FSInputChecker.readFully(InputStream stm, byte[] buf, int offset, int len)
              BlockReader.read(byte[] buf, int off, int len)
                FSInputChecker.read(byte[] b, int off, int len)
                  FSInputChecker.read1(byte b[], int off, int len)
                    FSInputChecker.readChecksumChunk(byte b[], final int off, final int len)
                      BlockReader.readChunk(long pos, byte[] buf, int offset, int len, byte[] checksumBuf)
                        IOUtils.readFully(InputStream in, byte buf[], int off, int len)
                          DataInputStream.read(byte b[], int off, int len)
                            BufferedInputStream.read(byte b[], int off, int len)
                              BufferedInputStream.read1(byte[] b, int off, int len)
                                org.apache.hadoop.net.SocketInputStream.read(byte[] b, int off, int len)
                                  org.apache.hadoop.net.SocketInputStream.read(ByteBuffer dst)
                                    org.apache.hadoop.net.SocketIOWithTimeout.doIO(ByteBuffer buf, int ops)
                                      org.apache.hadoop.net.SocketInputStream.Reader.performIO(ByteBuffer buf)
                                        sun.nio.ch.SocketChannelImpl.read(ByteBuffer buf)
                                          sun.nio.ch.IOUtil.read(FileDescriptor fd, ByteBuffer dst, long position, NativeDispatcher nd, Object lock)
                                            sun.nio.ch.IOUtil.readIntoNativeBuffer(FileDescriptor fd, ByteBuffer bb, long position, NativeDispatcher nd, Object lock)
                                              sun.nio.ch.SocketDispatcher.read(FileDescriptor fd, long address, int len)
                                                sun.nio.ch.SocketDispatcher.read0(FileDescriptor fd, long address, int len) /* Native method; the implementation differs between JDKs */
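
The top two frames of the trace (a record reader asking a line reader for the next line) can be illustrated with a minimal sketch. This is not Hadoop's code: `SimpleLineReader` and `readRecords` are hypothetical names, only `\n` is treated as a line terminator, and a `ByteArrayInputStream` stands in for the HDFS stream. It only shows the general idea of refilling a byte buffer from an underlying stream and reporting how many bytes each line consumed, so that the caller can use the byte offset as the key.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a buffered line reader; not Hadoop's LineReader. */
class SimpleLineReader {
    private final InputStream in;
    private final byte[] buffer;
    private int bufferLength = 0; // number of valid bytes currently in the buffer
    private int bufferPos = 0;    // index of the next unread byte

    SimpleLineReader(InputStream in, int bufferSize) {
        this.in = in;
        this.buffer = new byte[bufferSize];
    }

    /**
     * Reads one '\n'-terminated line into 'line' (terminator excluded).
     * Returns the number of bytes consumed, terminator included; 0 means end of stream.
     */
    int readLine(ByteArrayOutputStream line) throws IOException {
        line.reset();
        int consumed = 0;
        while (true) {
            if (bufferPos >= bufferLength) {
                // Buffer exhausted: refill it from the underlying stream.
                bufferLength = in.read(buffer);
                bufferPos = 0;
                if (bufferLength <= 0) {
                    bufferLength = 0;
                    return consumed; // end of stream; may hold a final unterminated line
                }
            }
            int start = bufferPos;
            while (bufferPos < bufferLength && buffer[bufferPos] != '\n') {
                bufferPos++;
            }
            line.write(buffer, start, bufferPos - start);
            consumed += bufferPos - start;
            if (bufferPos < bufferLength) {
                bufferPos++;  // skip the newline itself
                consumed++;
                return consumed;
            }
        }
    }

    /** Mimics a record reader: each record is "byteOffset<TAB>lineText". */
    static List<String> readRecords(byte[] data, int bufferSize) {
        List<String> records = new ArrayList<>();
        SimpleLineReader reader = new SimpleLineReader(new ByteArrayInputStream(data), bufferSize);
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        long pos = 0;
        try {
            int n;
            while ((n = reader.readLine(line)) > 0) {
                records.add(pos + "\t" + new String(line.toByteArray(), StandardCharsets.UTF_8));
                pos += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "ab\ncde\n\nf".getBytes(StandardCharsets.UTF_8);
        for (String record : readRecords(data, 2)) {
            System.out.println(record);
        }
    }
}
```

The deliberately tiny buffer size in `main` forces several refills per line, which is the part of the real call chain that eventually reaches the socket read at the bottom of the trace.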

2010-05-29 11:46

#8 左门马 2013-03-21

Hey!
I added markers to the source and tested it, and it looks like DFSInputStream.read(long position, byte[] buffer, int offset, int length) and
DFSInputStream.fetchBlockByteRange(LocatedBlock block, long start, long end, byte[] buf, int offset) are never called.

What gets called instead is DFSInputStream.read(byte[] buffer, int offset, int length),

that is, the read method that takes no position.

The source I tested was 0.20.2. Which version were you using?
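
The distinction raised here, a sequential read versus a read at an explicit position, can be shown by analogy with plain `java.nio`. This is only a sketch under that assumption: it uses `FileChannel` on a temporary local file, not `DFSInputStream`, and `ReadStyles`, `sequentialRead`, `positionalRead` and `demo` are hypothetical names. The point it illustrates is that a sequential read advances the stream's current offset, while a positional read leaves it untouched.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Analogy only: contrasts sequential and positional reads on a local FileChannel. */
class ReadStyles {
    /** Sequential read: starts at the channel's current position and advances it. */
    static String sequentialRead(FileChannel ch, int len) throws IOException {
        ByteBuffer dst = ByteBuffer.allocate(len);
        while (dst.hasRemaining() && ch.read(dst) != -1) {
            // keep reading until the buffer is full or the file ends
        }
        return new String(dst.array(), 0, dst.position(), StandardCharsets.US_ASCII);
    }

    /** Positional read: reads at an explicit offset; the channel position is not changed. */
    static String positionalRead(FileChannel ch, long position, int len) throws IOException {
        ByteBuffer dst = ByteBuffer.allocate(len);
        long pos = position;
        while (dst.hasRemaining()) {
            int n = ch.read(dst, pos);
            if (n == -1) {
                break; // reached end of file
            }
            pos += n;
        }
        return new String(dst.array(), 0, dst.position(), StandardCharsets.US_ASCII);
    }

    /** Runs both styles against a temp file holding "0123456789". */
    static String demo() {
        try {
            Path tmp = Files.createTempFile("read-styles", ".txt");
            try {
                Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    String a = sequentialRead(ch, 3);    // reads from offset 0
                    String b = positionalRead(ch, 6, 3); // reads from offset 6
                    String c = sequentialRead(ch, 3);    // continues where 'a' stopped
                    return a + "," + b + "," + c + "," + ch.position();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "012,678,345,6"
    }
}
```

A line-by-line reader that walks through a split from start to end fits the sequential style, which is consistent with what the commenter observed in 0.20.2.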

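#7 zk279444107 2012-03-20

herry wrote:

If the FS specified in the jobconf is the local file system and the input is not uploaded to HDFS, how can the computation be distributed across the nodes?

This question has been bothering me too. Could you answer it?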

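#6 herry 2011-03-15

If the FS specified in the jobconf is the local file system and the input is not uploaded to HDFS, how can the computation be distributed across the nodes?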

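#5 coderplay 2011-02-28

herry wrote:

A question about input files when submitting a job
FileInputFormat.addInputPath(jobconf, new Path(args[0]));
is used to specify the path of the files the M/R job runs on.
These files may come from different sources, such as the local file system or a distributed file system.
1. Do they all end up being uploaded to Hadoop's HDFS?
No matter which FS Hadoop is using, the input path is not uploaded. It simply lives on the FS specified by that jobconf, which is usually HDFS.

2. At what point does the upload happen?
Job submission uploads job.jar, job.split and job.xml.

3. If the data already exists in HDFS, is it uploaded a second time?

It is never uploaded.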
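
As a rough illustration of "the input path lives on the FS the jobconf specifies", the sketch below resolves an input path against a default file system URI using only `java.net.URI`. It is an analogy, not Hadoop's implementation: `InputPathResolution` and `qualify` are hypothetical names, and `hdfs://namenode:9000/` is a made-up address. A path without a scheme picks up the default file system, and a path that names its own scheme keeps it. Nothing is copied in either case.

```java
import java.net.URI;

/** Analogy only: shows how a path can be qualified against a default file system URI. */
class InputPathResolution {
    /**
     * Resolves inputPath against defaultFs.
     * A path with its own scheme (for example file:///...) is returned unchanged.
     */
    static URI qualify(URI defaultFs, String inputPath) {
        return defaultFs.resolve(URI.create(inputPath));
    }

    public static void main(String[] args) {
        // Hypothetical default file system address, assumed for illustration.
        URI defaultFs = URI.create("hdfs://namenode:9000/");
        System.out.println(qualify(defaultFs, "/data/in"));       // scheme taken from defaultFs
        System.out.println(qualify(defaultFs, "file:///tmp/in")); // explicit scheme is kept
    }
}
```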

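#4 herry 2011-01-27

A question about input files when submitting a job
FileInputFormat.addInputPath(jobconf, new Path(args[0]));
is used to specify the path of the files the M/R job runs on.
These files may come from different sources, such as the local file system or a distributed file system.
1. Do they all end up being uploaded to Hadoop's HDFS?
2. At what point does the upload happen?
3. If the data already exists in HDFS, is it uploaded a second time?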

#3 bupt04406 2010-06-02

coderplay wrote:

bupt04406 wrote:

How did you produce this?

I traced it by hand and typed it out manually~

Got it, :-)

#2 coderplay 2010-06-02

bupt04406 wrote:

How did you produce this?

I traced it by hand and typed it out manually~

#1 bupt04406 2010-06-01

How did you produce this?
