How Hive's distribute by partitions long (bigint) data

bupt04406

Blog category: Hive

A user asked how Hive's distribute by assigns rows to buckets. With distribute by sellerId and the number of reducers set to 20, are rows bucketed by sellerId mod 20? Here sellerId is of type bigint.

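I originally assumed so too, but the user then reported something odd: a small part of the output was wrong. One sellerId should, by mod 20, have landed in output file 8, yet it ended up in file 0.

The reason is that Hive does not take the key modulo the reducer count directly. It first computes a hash code for the key (ObjectInspectorUtils.hashCode in the Hive source):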

  public static int hashCode(Object o, ObjectInspector objIns) {
    if (o == null) {
      return 0;
    }
    switch (objIns.getCategory()) {
    case PRIMITIVE: {
      PrimitiveObjectInspector poi = ((PrimitiveObjectInspector) objIns);
      switch (poi.getPrimitiveCategory()) {
      case VOID:
        return 0;
      case BOOLEAN:
        return ((BooleanObjectInspector) poi).get(o) ? 1 : 0;
      case BYTE:
        return ((ByteObjectInspector) poi).get(o);
      case SHORT:
        return ((ShortObjectInspector) poi).get(o);
      case INT:
        return ((IntObjectInspector) poi).get(o);
      case LONG: {
        long a = ((LongObjectInspector) poi).get(o);
        return (int) ((a >>> 32) ^ a);
      }
      case FLOAT:
        return Float.floatToIntBits(((FloatObjectInspector) poi).get(o));
      case DOUBLE: {
        // This hash function returns the same result as Double.hashCode()
        // while DoubleWritable.hashCode returns a different result.
        long a = Double.doubleToLongBits(((DoubleObjectInspector) poi).get(o));
        return (int) ((a >>> 32) ^ a);
      }
      case STRING: {
        // This hash function returns the same result as String.hashCode() when
        // all characters are ASCII, while Text.hashCode() always returns a
        // different result.
        Text t = ((StringObjectInspector) poi).getPrimitiveWritableObject(o);
        int r = 0;
        for (int i = 0; i < t.getLength(); i++) {
          r = r * 31 + t.getBytes()[i];
        }
        return r;
      }
      case TIMESTAMP:
        TimestampWritable t = ((TimestampObjectInspector) poi)
            .getPrimitiveWritableObject(o);
        return t.hashCode();
      default: {
        throw new RuntimeException("Unknown type: "
            + poi.getPrimitiveCategory());
      }
      }
    }
    case STRUCT:
    case LIST:
    case MAP:
    case UNION:
    default:
      throw new RuntimeException(
          "Hash code on complex types not supported yet.");
    }
  }
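
For the LONG case, the formula folds the upper 32 bits into the lower 32 bits and then truncates to int, which is the same formula Java uses for Long.hashCode. The following is a minimal sketch of that branch in isolation; the class name LongHashDemo and the method name hiveLongHash are made up for illustration and are not part of Hive.

```java
public class LongHashDemo {
    // Same expression as the LONG branch quoted above (hypothetical helper name).
    static int hiveLongHash(long a) {
        return (int) ((a >>> 32) ^ a);
    }

    public static void main(String[] args) {
        long small = 123456789L;   // fits in a signed 32-bit int
        long large = 3591111568L;  // larger than Integer.MAX_VALUE (2147483647)

        System.out.println(hiveLongHash(small));  // prints 123456789
        System.out.println(hiveLongHash(large));  // prints -703855728
        // The formula matches the JDK's own hash for long values.
        System.out.println(hiveLongHash(large) == Long.hashCode(large));  // prints true
    }
}
```

A value that fits in an int hashes to itself, but a value above Integer.MAX_VALUE wraps around to a negative int when truncated.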

Hive's Partitioner is DefaultHivePartitioner, which turns the key's hash code into a bucket (reducer) number:

  /** Use {@link Object#hashCode()} to partition. */
  public int getBucket(K2 key, V2 value, int numBuckets) {
    return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
  }

I wrote a small Java program to test this, and the row with id 3591111568 does indeed go to reducer 0:

    long a = 3591111568L;
    int hashcode = (int) ((a >>> 32) ^ a);
    System.out.println((hashcode & Integer.MAX_VALUE) % 20);
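
To see where plain mod 20 and the actual bucket agree and where they diverge, the two quoted snippets can be combined into one runnable sketch. It assumes that, for a single bigint distribute by column, the key's hashCode() equals the LONG hash shown earlier. The class and method names are hypothetical.

```java
public class BucketCompare {
    // LONG branch of the hash function quoted above.
    static int hiveLongHash(long a) {
        return (int) ((a >>> 32) ^ a);
    }

    // Same arithmetic as getBucket: clear the sign bit, then take the remainder.
    static int bucket(long id, int numBuckets) {
        return (hiveLongHash(id) & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 20;
        long[] ids = {8L, 2147483647L, 3591111568L};
        for (long id : ids) {
            System.out.println(id + ": mod=" + (id % numBuckets)
                    + " bucket=" + bucket(id, numBuckets));
        }
        // prints:
        // 8: mod=8 bucket=8
        // 2147483647: mod=7 bucket=7
        // 3591111568: mod=8 bucket=0
    }
}
```

For non-negative ids up to Integer.MAX_VALUE the hash is the id itself, so the two results match. Above that, the truncation to int and the masking of the sign bit change the value before the remainder is taken, so they differ.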

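So to get the behavior the user wants, the query needs to be changed to distribute by sellerId%20. For non-negative ids, sellerId%20 is a value from 0 to 19, whose hash code is the value itself, so each row goes to the reducer numbered sellerId mod 20.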
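
A quick way to check this is to apply the same hash and bucket arithmetic to sellerId % 20 instead of sellerId. This sketch assumes non-negative ids and that the expression sellerId % 20 is hashed as a bigint. The names ModFixDemo and bucketAfterFix are illustrative only.

```java
public class ModFixDemo {
    // LONG branch of the hash function quoted earlier.
    static int hiveLongHash(long a) {
        return (int) ((a >>> 32) ^ a);
    }

    // Same arithmetic as getBucket in DefaultHivePartitioner.
    static int bucket(long key, int numBuckets) {
        return (hiveLongHash(key) & Integer.MAX_VALUE) % numBuckets;
    }

    // Bucket chosen when the distribute by key is sellerId % numBuckets.
    static int bucketAfterFix(long sellerId, int numBuckets) {
        return bucket(sellerId % numBuckets, numBuckets);
    }

    public static void main(String[] args) {
        long sellerId = 3591111568L;
        System.out.println(bucket(sellerId, 20));          // prints 0
        System.out.println(bucketAfterFix(sellerId, 20));  // prints 8
    }
}
```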

2013-08-20 10:15