Using Hive

bupt04406

Category: Hive

Tags: Hadoop, SQL, Eclipse, log4j, scripts

Adjust the relevant settings in conf/hadoop-env.sh, for example:
export HADOOP_HEAPSIZE=64
export HADOOP_CLIENT_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/tianzhao/oom.hprof"
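Hive applies these settings at startup. When an OutOfMemoryError occurs, the JVM dumps the heap to the oom.hprof file, which can then be inspected with Java's VisualVM to examine memory usage.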

Partitions:
When Hive runs with a relatively small heap (for example 64 or 128), the number of partitions it can handle is limited. I wrote a small script to test partitions.

for ((i=1; i<=10000; i++));
   do echo "alter table partition2 add if not exists partition(pt='${i}');" >> parit_test.sql;
done

Table creation statement:
create table partition2(s string) partitioned by (pt string);

The generated add-partition statements are:
alter table partition2 add if not exists partition(pt='1');
alter table partition2 add if not exists partition(pt='2');
alter table partition2 add if not exists partition(pt='3');

Run parit_test.sql:
cd into the Hive directory and run bin/hive -f parit_test.sql
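
The generation loop above can also be wrapped in a small function so the table name and partition count are parameters. This is only a sketch: gen_partition_sql is a hypothetical helper name, not something Hive provides, and it assumes a single string partition column named pt as in the table above.

```shell
# Sketch only: gen_partition_sql is a hypothetical helper, not part of Hive.
# It prints one "add if not exists partition" statement per value pt=1..count.
gen_partition_sql() {
  gps_table="$1"
  gps_count="$2"
  gps_i=1
  while [ "$gps_i" -le "$gps_count" ]; do
    printf "alter table %s add if not exists partition(pt='%s');\n" "$gps_table" "$gps_i"
    gps_i=$((gps_i + 1))
  done
}

# Example: regenerate the file from scratch, then run it with bin/hive -f.
# gen_partition_sql partition2 10000 > parit_test.sql
```

Writing with > instead of >> avoids piling up duplicate statements when the script is run more than once.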

Rename a table:
ALTER TABLE table_name RENAME TO new_table_name

hive> select distinct value from src;
hive> select max(key) from src;

Logging:
The file conf/hive-log4j.properties in the Hive directory:
#hive.root.logger=WARN,DRFA
hive.root.logger=DEBUG,DRFA
This changes the log level to DEBUG. Logs are stored in the file /tmp/tianzhao/hive.log, as set by:
hive.log.dir=/tmp/${user.name}
hive.log.file=hive.log

Here user.name is tianzhao.
While a query is running, you can follow the log with tail -f hive.log; entries are printed to the terminal as they are written.
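
As a rough sketch, the log location implied by the two properties above can be built in the shell before tailing it. hive_log_path is a hypothetical helper name, and it assumes the operating-system user name (id -un) is the same as Java's user.name, which may not hold in every setup.

```shell
# Sketch only: hive_log_path is a hypothetical helper, not part of Hive.
# It mirrors hive.log.dir=/tmp/${user.name} and hive.log.file=hive.log.
hive_log_path() {
  # Use the given user name, or fall back to the current OS user (assumption).
  hlp_user="${1:-$(id -un)}"
  printf '/tmp/%s/hive.log\n' "$hlp_user"
}

# Example: follow the log of the current user while a query runs.
# tail -f "$(hive_log_path)"
```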

Hive command history:
Every command Hive executes is recorded in the .hivehistory file in the current user's home directory.
tianzhao@tianzhao-VirtualBox:~$ less .hivehistory

The code is in the main function of CliDriver:
    final String HISTORYFILE = ".hivehistory";
    String historyFile = System.getProperty("user.home") + File.separator + HISTORYFILE;
    reader.setHistory(new History(new File(historyFile)));
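
For a quick look at recent commands without paging through the whole file, something like the following may help. It is a sketch: recent_hive_commands is a hypothetical helper name, and it assumes the history file holds one entry per line.

```shell
# Sketch only: recent_hive_commands is a hypothetical helper, not part of Hive.
# Prints the last N lines (default 10) of .hivehistory under a home directory.
recent_hive_commands() {
  rhc_n="${1:-10}"
  rhc_home="${2:-$HOME}"
  rhc_file="$rhc_home/.hivehistory"
  if [ -f "$rhc_file" ]; then
    tail -n "$rhc_n" "$rhc_file"
  else
    echo "no history file at $rhc_file" >&2
    return 1
  fi
}

# Example: show the last 20 recorded commands of the current user.
# recent_hive_commands 20
```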

[-count[-q] <path>]
$ hadoop fs -count /history/   # number of files under the directory
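
Without -q, hadoop fs -count prints the directory count, file count, content size, and path name as columns, so the file count is the second field. A small sketch for extracting it follows; file_count is a hypothetical helper name.

```shell
# Sketch only: file_count is a hypothetical helper, not part of Hadoop.
# Reads `hadoop fs -count` output on stdin and prints the second column,
# which is the file count (columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME).
file_count() {
  awk '{ print $2 }'
}

# Example (needs a running Hadoop installation):
# hadoop fs -count /history/ | file_count
```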

(1) View table information
hive> describe extended partition2;
OK
s string
pt string

Detailed Table Information Table(tableName:partition2, dbName:default, owner:tianzhao, createTime:1304566227, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:s, type:string, comment:null)], location:hdfs://localhost:54310/user/hive/warehouse/partition2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:pt, type:string, comment:null)], parameters:{transient_lastDdlTime=1304566227}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.054 seconds

hive> describe partition2;
OK
s string
pt string
Time taken: 0.104 seconds

hive> show functions hash;
OK
hash
Time taken: 0.062 seconds
hive> describe function hash;
OK
hash(a1, a2, ...) - Returns a hash value of the arguments
Time taken: 0.049 seconds
hive> describe function extended hash;
OK
hash(a1, a2, ...) - Returns a hash value of the arguments
Time taken: 0.05 seconds

Input data format:
1&&&&2&&&&4

CREATE TABLE IF NOT EXISTS rtable1 (
   str1 string,
   str2 string,
   str3 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex"="(\\d+)&&&&(\\d+)&&&&(\\d+)"
);

load data local inpath '/home/tianzhao/sql/data/RegexSerDe' into table rtable1;

select * from rtable1;
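
Before loading a file, it can be useful to check locally that each line matches the pattern. The sketch below uses sed with a POSIX ERE equivalent of the input.regex above ([0-9]+ in place of \d+). split_regex_row is a hypothetical helper name. It simply skips lines that do not match and makes no claim about how Hive treats such rows.

```shell
# Sketch only: split_regex_row is a hypothetical helper, not part of Hive.
# Applies an ERE equivalent of the input.regex to each stdin line and prints
# the three captured groups separated by spaces; non-matching lines are skipped.
split_regex_row() {
  sed -n -E 's/^([0-9]+)&&&&([0-9]+)&&&&([0-9]+)$/\1 \2 \3/p'
}

# Example: preview how the data file would be split into str1, str2, str3.
# split_regex_row < /home/tianzhao/sql/data/RegexSerDe
```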

http://search-hadoop.com/m/WBuaH1Z4TKu1/partition+%252B++filter+%252B+udf&subj=+ANNOUNCE+Apache+Hive+0+7+0+Released

https://issues.apache.org/jira/browse/HIVE-1750
[HIVE-1609] - Support partition filtering in metastore
https://issues.apache.org/jira/browse/HIVE-1862
https://issues.apache.org/jira/browse/HIVE-1849
https://issues.apache.org/jira/browse/HIVE-1738
https://issues.apache.org/jira/browse/HIVE-1758
https://issues.apache.org/jira/browse/HIVE-1642

https://issues.apache.org/jira/browse/HIVE-1913

https://issues.apache.org/jira/browse/HIVE-1430

https://issues.apache.org/jira/browse/HIVE-1305

https://issues.apache.org/jira/browse/HIVE-1462

https://issues.apache.org/jira/browse/HIVE-1790

https://issues.apache.org/jira/browse/HIVE-1514

https://issues.apache.org/jira/browse/HIVE-1971

https://issues.apache.org/jira/browse/HIVE-1361

https://issues.apache.org/jira/browse/HIVE-138

https://issues.apache.org/jira/browse/HIVE-1835

https://issues.apache.org/jira/browse/HIVE-1815

https://issues.apache.org/jira/browse/HIVE-1943

https://issues.apache.org/jira/browse/HIVE-2056

https://issues.apache.org/jira/browse/HIVE-2028

https://issues.apache.org/jira/browse/HIVE-1918

https://issues.apache.org/jira/browse/HIVE-1803

https://issues.apache.org/jira/browse/HIVE-558
https://issues.apache.org/jira/browse/HIVE-1658

https://issues.apache.org/jira/browse/HIVE-1731

https://issues.apache.org/jira/browse/HIVE-1408

Debugging Hive in Eclipse

To be continued

2011-05-06 09:19

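#3 kezhon 2011-09-06

bupt04406 wrote:

add file

Do these files also have to be added the same way as with ADD JAR? I tried that, but it had no effect.

#2 bupt04406 2011-09-06

add file

#1 kezhon 2011-09-06

May I ask a question? I am writing a Hive UDF whose function needs to read a file from an external resource directory, but at run time it fails with
java.io.FileNotFoundException: resource/placeMap.txt (No such file or directory).
It runs without errors locally.
How can I solve this? Thank you very much!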
