set hive.groupby.skewindata and data skew

daizj


Blog category:

hive

hive hive.groupby.skewindata data skew distinct count

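Like relational databases, Hive supports count(distinct). On large data sets, however, performance becomes very poor when the data is skewed. The remedy is to enable load balancing for the aggregation, which is done by setting the hive.groupby.skewindata parameter.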

hive (default)> set hive.groupby.skewindata;

hive.groupby.skewindata=false

The parameter defaults to false, meaning it is disabled. To enable it, run set hive.groupby.skewindata=true;
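A minimal sketch of scoping the setting to a single query and then restoring the default. It assumes the same test table with an id column that appears in the session further down; the statements are written without the hive> prompt so they can be pasted as a script.

```sql
-- Turn on skew handling for aggregations in this session.
set hive.groupby.skewindata=true;

-- A query with a single DISTINCT column is accepted while the setting is on.
select count(distinct id) from test;

-- Restore the default so later queries in the session are unaffected.
set hive.groupby.skewindata=false;
```

The set command only affects the current session, so resetting the value matters mainly when other queries follow in the same session or script.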

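With the parameter enabled, the data skew problem is mitigated, but a query that computes distinct counts over more than one column fails with an error.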

hive> set hive.groupby.skewindata=true;

hive> select count(distinct id),count(distinct x) from test;

FAILED: SemanticException [Error 10022]: DISTINCT on different columns not supported with skew in data

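The following form runs normally, although it counts distinct (id, x) combinations rather than each column separately: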

hive> select count(distinct id, x) from test;
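If separate per-column counts are still needed while the setting stays on, one possible workaround is to compute each count in its own subquery, so that no single query block applies DISTINCT to different columns. This is a sketch under that assumption and has not been verified against a specific Hive version. The aliases a, b, id_cnt and x_cnt are made up for illustration, and a cross join may be rejected when Hive runs in strict mode.

```sql
set hive.groupby.skewindata=true;

-- Each subquery uses DISTINCT on one column only and returns a single row,
-- so the cross join produces exactly one output row with both counts.
select a.id_cnt, b.x_cnt
from (select count(distinct id) as id_cnt from test) a
cross join (select count(distinct x) as x_cnt from test) b;
```

To see why the two forms are not interchangeable, take three rows (1, 'a'), (1, 'b'), (2, 'a'): count(distinct id) and count(distinct x) are both 2, while count(distinct id, x) is 3.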


2016-03-16 10:03