src/data contains the input data.
src/ql/src/test/queries contains the test cases: clientpositive holds the cases that are expected to succeed, and clientnegative holds the cases that are expected to fail and return a non-zero status.
src/ql/src/test/results contains the expected output for each test case; for example, the expected output of src/ql/src/test/queries/case_sensitivity.q is src/ql/src/test/results/clientpositive/case_sensitivity.q.out.
Running the test executes case_sensitivity.q and writes its output to build/ql/test/logs/clientpositive/case_sensitivity.q.out; that file is then compared with src/ql/src/test/results/clientpositive/case_sensitivity.q.out using diff. If the two files are identical the test passes, otherwise it fails.
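A minimal sketch of that pass/fail rule (illustration only, not the actual TestCliDriver code; the two file paths are the ones described above, and the real harness additionally ignores volatile lines such as timestamps and locations through diff -I filters):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class QFileDiffCheck {
    public static void main(String[] args) throws Exception {
        // Output freshly generated by running case_sensitivity.q
        Path actual = Paths.get("build/ql/test/logs/clientpositive/case_sensitivity.q.out");
        // Expected output checked into the source tree
        Path expected = Paths.get("src/ql/src/test/results/clientpositive/case_sensitivity.q.out");

        List<String> actualLines = Files.readAllLines(actual);
        List<String> expectedLines = Files.readAllLines(expected);

        // Identical files mean the test passes; any difference means it fails
        if (actualLines.equals(expectedLines)) {
            System.out.println("PASS: " + actual.getFileName());
        } else {
            System.out.println("FAIL: " + actual.getFileName() + " differs from the expected result");
        }
    }
}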
CREATE TABLE sales (name STRING, id INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
CREATE TABLE things (id INT, name STRING) partitioned by (ds string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data local inpath 'examples/files/sales.txt' INTO TABLE sales;
load data local inpath 'examples/files/things.txt' INTO TABLE things partition(ds='2011-10-23');
load data local inpath 'examples/files/things2.txt' INTO TABLE things partition(ds='2011-10-24');
SELECT name,id FROM sales ORDER BY name ASC, id ASC;
SELECT id,name FROM things ORDER BY id ASC, name ASC;
SELECT name,id FROM sales LEFT SEMI JOIN things ON (sales.id = things.id) ORDER BY name ASC, id ASC;
Before the tests run, a number of tables are created and data is loaded into them.
For example, the content of case_sensitivity.q is:
tianzhao@tianzhao-VirtualBox:~/hive/hive-1.1.2/src/ql/src/test/queries/clientpositive$ cat case_sensitivity.q
CREATE TABLE DEST1(Key INT, VALUE STRING) STORED AS TEXTFILE;
EXPLAIN
FROM SRC_THRIFT
INSERT OVERWRITE TABLE dest1 SELECT src_Thrift.LINT[1], src_thrift.lintstring[0].MYSTRING where src_thrift.liNT[0] > 0;
FROM SRC_THRIFT
INSERT OVERWRITE TABLE dest1 SELECT src_Thrift.LINT[1], src_thrift.lintstring[0].MYSTRING where src_thrift.liNT[0] > 0;
SELECT DEST1.* FROM Dest1;
Note that the HiveQL above never creates the SRC_THRIFT table, yet the table can be used, because it is created before the statements in this file are executed.
The code for that lives in org.apache.hadoop.hive.ql.QTestUtil.java, in methods such as init and createSources.
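Conceptually, createSources simply issues a series of CREATE TABLE and LOAD DATA statements before any .q file runs. A rough sketch of that idea is below (the CommandRunner helper is hypothetical, not the real QTestUtil API; excerpts of the actual code appear later in this post):

public class CreateSourcesSketch {
    // Assumed abstraction over the Hive driver used by the test harness
    interface CommandRunner {
        void run(String hiveQl);
    }

    static void createSources(CommandRunner runner, String dataDir) {
        // Plain text table src plus its sample data
        runner.run("CREATE TABLE src (key STRING, value STRING)");
        runner.run("LOAD DATA LOCAL INPATH '" + dataDir + "/kv1.txt' INTO TABLE src");

        // Partitioned table srcpart, loaded once per (ds, hr) partition
        runner.run("CREATE TABLE srcpart (key STRING, value STRING) PARTITIONED BY (ds STRING, hr STRING)");
        for (String ds : new String[] {"2008-04-08", "2008-04-09"}) {
            for (String hr : new String[] {"11", "12"}) {
                runner.run("LOAD DATA LOCAL INPATH '" + dataDir + "/kv1.txt' OVERWRITE INTO TABLE srcpart"
                        + " PARTITION (ds='" + ds + "', hr='" + hr + "')");
            }
        }
        // ... plus srcbucket, srcbucket2, src1, src_sequencefile, src_thrift and src_json,
        // as shown in the log below
    }
}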
This can also be seen in the terminal output of ant test -Dtestcase=TestCliDriver -Dqfile=case_sensitivity.q -Dtest.silent=false:
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv1.txt' OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='11')
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@srcpart
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/kv1.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/kv1.txt
[junit] Loading data to table default.srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv1.txt' OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='11')
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcpart
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv1.txt' OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='12')
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@srcpart
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/kv1.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/kv1.txt
[junit] Loading data to table default.srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv1.txt' OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='12')
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcpart
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv1.txt' OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-09',hr='11')
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@srcpart
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/kv1.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/kv1.txt
[junit] Loading data to table default.srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv1.txt' OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-09',hr='11')
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcpart
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv1.txt' OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-09',hr='12')
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@srcpart
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/kv1.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/kv1.txt
[junit] Loading data to table default.srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv1.txt' OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-09',hr='12')
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcpart
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] PREHOOK: query: CREATE TABLE srcbucket(key int, value string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: CREATE TABLE srcbucket(key int, value string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/srcbucket0.txt' INTO TABLE srcbucket
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@srcbucket
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/srcbucket0.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/srcbucket0.txt
[junit] Loading data to table default.srcbucket
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/srcbucket0.txt' INTO TABLE srcbucket
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/srcbucket1.txt' INTO TABLE srcbucket
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@srcbucket
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/srcbucket1.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/srcbucket1.txt
[junit] Loading data to table default.srcbucket
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/srcbucket1.txt' INTO TABLE srcbucket
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/srcbucket20.txt' INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@srcbucket2
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/srcbucket20.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/srcbucket20.txt
[junit] Loading data to table default.srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/srcbucket20.txt' INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/srcbucket21.txt' INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@srcbucket2
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/srcbucket21.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/srcbucket21.txt
[junit] Loading data to table default.srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/srcbucket21.txt' INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/srcbucket22.txt' INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@srcbucket2
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/srcbucket22.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/srcbucket22.txt
[junit] Loading data to table default.srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/srcbucket22.txt' INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/srcbucket23.txt' INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@srcbucket2
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/srcbucket23.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/srcbucket23.txt
[junit] Loading data to table default.srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/srcbucket23.txt' INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv1.txt' INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@src
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/kv1.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/kv1.txt
[junit] Loading data to table default.src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv1.txt' INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv3.txt' INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@src1
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/kv3.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/kv3.txt
[junit] Loading data to table default.src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv3.txt' INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv1.seq' INTO TABLE src_sequencefile
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@src_sequencefile
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/kv1.seq
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/kv1.seq
[junit] Loading data to table default.src_sequencefile
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/kv1.seq' INTO TABLE src_sequencefile
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/complex.seq' INTO TABLE src_thrift
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@src_thrift
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/complex.seq
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/complex.seq
[junit] Loading data to table default.src_thrift
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/complex.seq' INTO TABLE src_thrift
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/json.txt' INTO TABLE src_json
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@src_json
[junit] Copying data from file:/home/tianzhao/apache/hive-trunk/data/files/json.txt
[junit] Copying file: file:/home/tianzhao/apache/hive-trunk/data/files/json.txt
[junit] Loading data to table default.src_json
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH '/home/tianzhao/apache/hive-trunk/data/files/json.txt' INTO TABLE src_json
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_json
[junit] OK
The output above shows the tables being created and the data being loaded.
The output for the case_sensitivity.q test itself is:
[junit] Running org.apache.hadoop.hive.cli.TestCliDriver
[junit] Begin query: case_sensitivity.q
[junit] Deleted file:/home/tianzhao/apache/hive-trunk/build/ql/test/data/warehouse/dest1
[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I Location -I transient_lastDdlTime -I last_modified_ -I java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused by: -I LOCK_QUERYID: -I grantTime -I [.][.][.] [0-9]* more -I USING 'java -cp /home/tianzhao/apache/hive-trunk/build/ql/test/logs/clientpositive/case_sensitivity.q.out /home/tianzhao/apache/hive-trunk/ql/src/test/results/clientpositive/case_sensitivity.q.out
[junit] Done query: case_sensitivity.q
[junit] Cleaning up TestCliDriver
The generated result is written to build/ql/test/logs/clientpositive/case_sensitivity.q.out; whether the unit test passes is decided by diffing it against ql/src/test/results/clientpositive/case_sensitivity.q.out.
Now let us create the src_thrift table by hand, load data into it, and run the HiveQL from case_sensitivity.q.
In org.apache.hadoop.hive.ql.QTestUtil.createSources(), the src_thrift table is created with:
Table srcThrift = new Table(db.getCurrentDatabase(), "src_thrift");
srcThrift.setInputFormatClass(SequenceFileInputFormat.class.getName());
srcThrift.setOutputFormatClass(SequenceFileOutputFormat.class.getName());
srcThrift.setSerializationLib(ThriftDeserializer.class.getName());
srcThrift.setSerdeParam(Constants.SERIALIZATION_CLASS, Complex.class.getName());
srcThrift.setSerdeParam(Constants.SERIALIZATION_FORMAT, TBinaryProtocol.class.getName());
db.createTable(srcThrift);
The storage format is SequenceFile and the SerDe is ThriftDeserializer, with two SerDe properties: SERIALIZATION_CLASS and SERIALIZATION_FORMAT. Creating a table normally requires the column definitions, but none are given here; they are already determined by the class set through (Constants.SERIALIZATION_CLASS, Complex.class.getName()).
So look at the definition of org.apache.hadoop.hive.serde2.thrift.test.Complex:
public int aint;
public static final int AINT = 1;
public String aString;
public static final int ASTRING = 2;
public List<Integer> lint;
public static final int LINT = 3;
public List<String> lString;
public static final int LSTRING = 4;
public List<IntString> lintString;
public static final int LINTSTRING = 5;
public Map<String,String> mStringString;
public static final int MSTRINGSTRING = 6;
And the definition of IntString:
public int myint;
public static final int MYINT = 1;
public String myString;
public static final int MYSTRING = 2;
public int underscore_int;
public static final int UNDERSCORE_INT = 3;
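Assuming the Thrift-generated test classes are on the classpath, the same column list can be recovered programmatically by listing the non-static public fields of Complex via reflection (illustration only):

import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class DumpThriftColumns {
    public static void main(String[] args) throws Exception {
        Class<?> complex = Class.forName("org.apache.hadoop.hive.serde2.thrift.test.Complex");
        for (Field f : complex.getFields()) {
            // Skip the static final int tags (AINT = 1, ASTRING = 2, ...)
            if (!Modifier.isStatic(f.getModifiers())) {
                System.out.println(f.getName() + " : " + f.getGenericType());
            }
        }
        // Prints aint, aString, lint, lString, lintString and mStringString --
        // exactly the columns that ThriftDeserializer exposes for src_thrift
    }
}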
From these fields we can reconstruct the columns of src_thrift.
The CREATE TABLE statement is:
hive> create table src_thrift(aint int, aString string,lint array<int>, lString array<string>, lintString array<struct<myint:int, mString:string, underscore_int:int>>, mStringString map<string, string>)
> row format serde 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' with serdeproperties("serialization.class"="org.apache.hadoop.hive.serde2.thrift.test.Complex", "serialization.format"="org.apache.thrift.protocol.TBinaryProtocol")
> stored as sequencefile;
OK
Time taken: 0.462 seconds
Loading the data:
hive> load data local inpath 'src/data/files/complex.seq' into table src_thrift;
Copying data from file:/home/tianzhao/hive/hive-1.1.2/src/data/files/complex.seq
Loading data to table src_thrift
OK
Time taken: 0.286 seconds
Inspecting the data:
hive> select * from src_thrift limit 2;
OK
1712634731 record_0 [0,0,0] ["0","0","0"] [{"myint":0,"mystring":"0","underscore_int":0}] {"key_0":"value_0"}
465985200 record_1 [1,2,3] ["10","100","1000"] [{"myint":1,"mystring":"1","underscore_int":1}] {"key_1":"value_1"}
Time taken: 0.34 seconds
Describing the table:
hive> desc src_thrift;
OK
aint int from deserializer
astring string from deserializer
lint array<int> from deserializer
lstring array<string> from deserializer
lintstring array<org.apache.hadoop.hive.serde2.thrift.test.IntString> from deserializer
mstringstring map<string,string> from deserializer
Time taken: 0.085 seconds
Run one of the statements from case_sensitivity.q (output excerpted):
hive> from src_thrift SELECT src_Thrift.LINT[1], src_thrift.lintstring[0].MYSTRING where src_thrift.liNT[0] > 0;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201105281127_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201105281127_0001
Kill Command = /home/tianzhao/hive/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201105281127_0001
2011-05-28 12:04:52,869 Stage-1 map = 0%, reduce = 0%
2011-05-28 12:04:55,921 Stage-1 map = 100%, reduce = 0%
2011-05-28 12:04:58,962 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201105281127_0001
OK
2 1
4 8
6 27
8 64
10 125
12 216
14 343
16 512
18 729
Time taken: 12.147 seconds
The src table
org.apache.hadoop.hive.ql.QTestUtil.createSources():
LinkedList<String> cols = new LinkedList<String>();
cols.add("key");
cols.add("value");
for (String tname : new String[] {"src", "src1"}) {
db.createTable(tname, cols, null, TextInputFormat.class,
IgnoreKeyTextOutputFormat.class);
}
Both src and src1 have the same two columns:
create table src(key string, value string);
Loading the data:
// load the input data into the src table
fpath = new Path(testFiles, "kv1.txt");
newfpath = new Path(tmppath, "kv1.txt");
fs.copyFromLocalFile(false, true, fpath, newfpath);
// db.loadTable(newfpath, "src", false);
runLoadCmd("LOAD DATA INPATH '" + newfpath.toString() + "' INTO TABLE src");
load data local inpath 'src/data/files/kv1.txt' into table src;
With all of this in place, we can manually run the statements from each unit test under src/ql/src/test/queries.
Files under the directory hive-0.7.0\src\ql\src\test\queries\clientpositive:
(1) add_part_exist.q: create a partitioned table, add partitions, and show partitions
CREATE TABLE add_part_test (key STRING, value STRING) PARTITIONED BY (ds STRING);
SHOW PARTITIONS add_part_test;
ALTER TABLE add_part_test ADD PARTITION (ds='2010-01-01');
SHOW PARTITIONS add_part_test;
ALTER TABLE add_part_test ADD IF NOT EXISTS PARTITION (ds='2010-01-01');
SHOW PARTITIONS add_part_test;
ALTER TABLE add_part_test ADD IF NOT EXISTS PARTITION (ds='2010-01-02');
SHOW PARTITIONS add_part_test;
ALTER TABLE add_part_test ADD IF NOT EXISTS PARTITION (ds='2010-01-01') PARTITION (ds='2010-01-02') PARTITION (ds='2010-01-03');
SHOW PARTITIONS add_part_test;
DROP TABLE add_part_test;
SHOW TABLES;
desc extended add_part_test partition(ds='2010-01-03');
Note that in SHOW TABLES and SHOW PARTITIONS, both TABLES and PARTITIONS must be written in the plural form, with the trailing S.
Some confusion around '2010-01-02': without the quotes the value is parsed as an expression containing '-' characters, so the statement is rejected, whereas a purely numeric value such as 2011102 is accepted.
hive> alter table add_part_test add if not exists partition(ds=2010-01-02);
FAILED: Parse Error: line 1:61 mismatched input '-' expecting ) in add partition statement
hive> alter table add_part_test add if not exists partition(ds='2010-01-02');
OK
Time taken: 0.294 seconds
hive> alter table add_part_test add if not exists partition(ds=2011102);
OK
Time taken: 0.178 seconds
hive> show partitions add_part_test;
OK
ds=2010-01-01
ds=2010-01-02
ds=2011102
Time taken: 0.057 seconds
(2) alter1.q: change a table's properties, its SerDe and SerDe properties, and its column information.
(3) create_default_prop.q: create a table
create_1.q: table creation operations
create_*
DESCRIBE table3;
DESCRIBE EXTENDED table3;
DESC table3;
DESC EXTENDED table3;
These statements are all equivalent; keywords are not case sensitive.
DESC is supported in version 0.7 but not in 0.5.
hive.map.aggr and hive.groupby.skewindata (for skewed data): hive.map.aggr enables map-side aggregation, and hive.groupby.skewindata adds an extra job so that skewed GROUP BY keys are first spread across reducers and then aggregated.
https://issues.apache.org/jira/browse/HIVE-223