Awesome InputFormat (5): org.apache.hadoop.mapreduce.lib.db.DBInputFormat<T>

EclipseEye

Blog category: Hadoop/MapReduce

@Public
@Stable

An InputFormat that reads input data from an SQL table.

DBInputFormat emits LongWritables containing the record number as key and DBWritables as value. The SQL query and input class can be set using one of the two setInput methods.

InputFormat has two core methods: getSplits and createRecordReader.
Core method 1

  /** {@inheritDoc} */
  public List<InputSplit> getSplits(JobContext job) throws IOException {

    ResultSet results = null;  
    Statement statement = null;
    try {
      statement = connection.createStatement();

      results = statement.executeQuery(getCountQuery());
      results.next();

      long count = results.getLong(1);
      int chunks = job.getConfiguration().getInt(MRJobConfig.NUM_MAPS, 1);
      long chunkSize = (count / chunks);

      results.close();
      statement.close();

      List<InputSplit> splits = new ArrayList<InputSplit>();

      // Split the rows into n-number of chunks and adjust the last chunk
      // accordingly
      for (int i = 0; i < chunks; i++) {
        DBInputSplit split;

        if ((i + 1) == chunks)
          split = new DBInputSplit(i * chunkSize, count);
        else
          split = new DBInputSplit(i * chunkSize, (i * chunkSize)
              + chunkSize);

        splits.add(split);
      }

      connection.commit();
      return splits;
    } catch (SQLException e) {
      throw new IOException("Got SQLException", e);
    } finally {
      try {
        if (results != null) { results.close(); }
      } catch (SQLException e1) {}
      try {
        if (statement != null) { statement.close(); }
      } catch (SQLException e1) {}

      closeConnection();
    }
  }
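
The split arithmetic in getSplits can be tried without a database. The following is a minimal sketch, not Hadoop code: the class SplitSketch and the method computeSplits are hypothetical names, and each split is represented as a plain {start, end} pair instead of a DBInputSplit. It assumes count is the result of the count query and chunks is the configured number of map tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch of the boundary arithmetic used in getSplits.
class SplitSketch {

    // Returns {start, end} row ranges; the last range absorbs the remainder.
    static List<long[]> computeSplits(long count, int chunks) {
        long chunkSize = count / chunks; // integer division, as in the original
        List<long[]> splits = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            long start = i * chunkSize;
            // The final chunk always ends at count so no rows are dropped.
            long end = (i + 1 == chunks) ? count : start + chunkSize;
            splits.add(new long[] {start, end});
        }
        return splits;
    }

    public static void main(String[] args) {
        for (long[] s : computeSplits(10, 3)) {
            System.out.println(s[0] + " .. " + s[1]);
        }
    }
}
```

With 10 rows and 3 chunks this yields the ranges 0..3, 3..6 and 6..10. Note that when count is smaller than chunks, chunkSize becomes 0, so every range except the last is empty and the last one covers all rows.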

Core method 2

  public RecordReader<LongWritable, T> createRecordReader(InputSplit split,
      TaskAttemptContext context) throws IOException, InterruptedException {

    return createDBRecordReader((DBInputSplit) split, context.getConfiguration());
  }

  protected RecordReader<LongWritable, T> createDBRecordReader(DBInputSplit split,
      Configuration conf) throws IOException {

    @SuppressWarnings("unchecked")
    Class<T> inputClass = (Class<T>) (dbConf.getInputClass());
    try {
      // use database product name to determine appropriate record reader.
      if (dbProductName.startsWith("ORACLE")) {
        // use Oracle-specific db reader.
        return new OracleDBRecordReader<T>(split, inputClass,
            conf, getConnection(), getDBConf(), conditions, fieldNames,
            tableName);
      } else if (dbProductName.startsWith("MYSQL")) {
        // use MySQL-specific db reader.
        return new MySQLDBRecordReader<T>(split, inputClass,
            conf, getConnection(), getDBConf(), conditions, fieldNames,
            tableName);
      } else {
        // Generic reader.
        return new DBRecordReader<T>(split, inputClass,
            conf, getConnection(), getDBConf(), conditions, fieldNames,
            tableName);
      }
    } catch (SQLException ex) {
      throw new IOException(ex.getMessage());
    }
  }
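
The reader selection in createDBRecordReader is a prefix match on the database product name. Below is a minimal sketch of that dispatch in isolation. ReaderDispatchSketch and readerFor are hypothetical names, and the method returns the reader's class name as a string instead of constructing a reader. The upper-casing step is an assumption: the excerpt above does not show how dbProductName is populated, but the comparisons against "ORACLE" and "MYSQL" only make sense if the name has been normalized to upper case.

```java
import java.util.Locale;

// Hypothetical stand-alone sketch of the product-name dispatch.
class ReaderDispatchSketch {

    // Maps a database product name to the record reader class name that
    // createDBRecordReader would choose for it.
    static String readerFor(String rawProductName) {
        // Assumption: the name is normalized to upper case before matching.
        String name = rawProductName.toUpperCase(Locale.ROOT);
        if (name.startsWith("ORACLE")) {
            return "OracleDBRecordReader";
        } else if (name.startsWith("MYSQL")) {
            return "MySQLDBRecordReader";
        }
        // Any other database falls back to the generic reader.
        return "DBRecordReader";
    }

    public static void main(String[] args) {
        System.out.println(readerFor("MySQL"));
        System.out.println(readerFor("Oracle"));
        System.out.println(readerFor("PostgreSQL"));
    }
}
```

Only Oracle and MySQL get a dedicated reader here; every other product name, PostgreSQL included, ends up with the generic DBRecordReader.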

2015-03-10 23:10