Hive memo

085567

1 Writing Hive query results as gzip-compressed output

Before running the query, set the following parameters:

set mapred.output.compress=true;

set hive.exec.compress.output=true;

set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;

INSERT OVERWRITE DIRECTORY 'hive_out' select * from tables limit 10000;

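2 Doing a Hive left outer join on Cloudera's CDH3 when both tables are partitioned:

Method 1: use a subquery.

Method 2: select a.*,b.* from table a left outer join table b on(a.uid=b.uuid and b.dt='2011-08-21') where a.dt='2011-08-21';

In method 2 the partition filter on b sits in the ON clause. Moving it to WHERE would discard the rows of a that found no match, because b.dt is NULL for them.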
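Method 1 is not spelled out in these notes. A minimal sketch, assuming it means filtering each table down to its partition inside a subquery before joining, might look like the following. The names table_a and table_b are hypothetical stand-ins for the two partitioned tables.

```sql
-- Sketch of method 1 (assumed interpretation): restrict each side to the
-- wanted partition first, then join the two filtered results.
-- table_a and table_b are hypothetical names; uid, uuid and dt come from
-- the query in method 2.
SELECT a.*, b.*
FROM (SELECT * FROM table_a WHERE dt = '2011-08-21') a
LEFT OUTER JOIN
     (SELECT * FROM table_b WHERE dt = '2011-08-21') b
ON (a.uid = b.uuid);
```

Because each filter is applied before the join, no partition condition is needed in ON or WHERE, and rows of a without a match are still returned with NULLs for the columns of b.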

3 Pay attention to data types when writing SQL in Hive:

When uid is a string:

select count(distinct uid) from table where dt = '2011-08-28' and type=2 and loginflag='3' and (uid<'23000000' or (uid>'50000000' and uid<'1500000000'))

select count(distinct uid) from newbehavior_table where dt='2011-08-28' and type=2 and (uid<23000000 or (uid<1500000000 and uid>50000000)) and loginflag='3';

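The two queries return different results. In the first, the quoted literals make every comparison a lexicographic string comparison, so a uid such as '3' does not count as less than '23000000'. In the second, the unquoted numeric literals lead Hive to compare the values as numbers.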
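One way to make the intent explicit, assuming uid always holds decimal digits, is to cast the column yourself instead of relying on how the literals are written. This is only a sketch built from the second query above, not a tested fix.

```sql
-- Sketch: cast the string column explicitly so the range checks are numeric
-- regardless of how the literals are quoted.
-- Assumption: every uid is a decimal number; a non-numeric uid casts to NULL
-- and is therefore filtered out.
SELECT count(DISTINCT uid)
FROM newbehavior_table
WHERE dt = '2011-08-28'
  AND type = 2
  AND loginflag = '3'
  AND (CAST(uid AS BIGINT) < 23000000
       OR (CAST(uid AS BIGINT) > 50000000
           AND CAST(uid AS BIGINT) < 1500000000));
```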

4 Creating a Hive table to store Apache logs

add jar ../build/contrib/hive_contrib.jar;

CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE;
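Once the table exists, it could be used roughly as follows. This is a sketch only: the log path /tmp/access.log is a hypothetical example, and the contrib jar from the add jar line above is assumed to still be loaded in the session so that the SerDe is available to the query.

```sql
-- Load a local Apache access log into the table.
-- The path is hypothetical; point it at a real log file.
LOAD DATA LOCAL INPATH '/tmp/access.log' INTO TABLE apachelog;

-- Count requests per HTTP status code. Every column is a STRING,
-- so status is grouped as text.
SELECT status, count(1) AS hits
FROM apachelog
GROUP BY status;
```

Lines that do not match input.regex come back with NULL in every column, so a NULL group in the result is a hint that the pattern does not fit some of the log lines.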

2011-08-24 14:56