之前部门实现row_number是使用的transform,我觉得用UDF实现后,平时的使用会更方便,免去了transform相对繁琐的语法。
<wbr><wbr><wbr>用到的测试表为:</wbr></wbr></wbr>
hive> desc row_number_test;
OK
id1<wbr><wbr><wbr><wbr>int<br>
id2<wbr><wbr><wbr><wbr>string<br>
age<wbr><wbr><wbr><wbr>int<br>
score<wbr><wbr> double<br>
name<wbr><wbr><wbr>string</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>
<wbr></wbr>
hive> select * from row_number_test;
OK
2<wbr><wbr><wbr><wbr><wbr><wbr>t04<wbr><wbr><wbr><wbr>25<wbr><wbr><wbr><wbr><wbr>60.0<wbr><wbr><wbr>youlia<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t01<wbr><wbr><wbr><wbr>20<wbr><wbr><wbr><wbr><wbr>85.0<wbr><wbr><wbr>liujiannan<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>24<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>zengqiu<br>
2<wbr><wbr><wbr><wbr><wbr><wbr>t03<wbr><wbr><wbr><wbr>30<wbr><wbr><wbr><wbr><wbr>88.0<wbr><wbr><wbr>hongqu<br>
2<wbr><wbr><wbr><wbr><wbr><wbr>t03<wbr><wbr><wbr><wbr>27<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>yongqi<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>19<wbr><wbr><wbr><wbr><wbr>75.0<wbr><wbr><wbr>wangdong<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>24<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>zengqiu</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>
<wbr></wbr>
使用时要先在子查询中进行分区与排序,比如oracle中这样一句SQL:
select row_number() over (partition by id1 order by age desc)from row_number_test;
转换为hive语句应该是:
select row_number(id1) from<wbr><wbr>--partition by的字段传到row_number函数中去</wbr></wbr>
<wbr><wbr><wbr> (select *from row_number_test distribute by id1 sort by id1,age desc) a;</wbr></wbr></wbr>
<wbr></wbr>
如果partition by 两个字段:
select row_number() over (partition by id1,id2 orderby<wbr>score) from row_number_test;</wbr>
转换为hive语句应该是:
select row_number(id1,id2)<wbr><wbr>--partition by的字段传到row_number函数中去</wbr></wbr>
<wbr><wbr><wbr> from(select * from row_number_test distribute by id1,id2 sort byid1,id2,score) a;</wbr></wbr></wbr>
<wbr></wbr>
展示一下查询结果:
1.
select id1,id2,age,score,name,row_number(id1) rn from (select *from row_number_test distribute by id1 sort by id1,age desc) a;
<wbr></wbr>
OK
2<wbr><wbr><wbr><wbr><wbr><wbr>t03<wbr><wbr><wbr><wbr>30<wbr><wbr><wbr><wbr><wbr>88.0<wbr><wbr><wbr>hongqu<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>1<br>
2<wbr><wbr><wbr><wbr><wbr><wbr>t03<wbr><wbr><wbr><wbr>27<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>yongqi<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>2<br>
2<wbr><wbr><wbr><wbr><wbr><wbr>t04<wbr><wbr><wbr><wbr>25<wbr><wbr><wbr><wbr><wbr>60.0<wbr><wbr><wbr>youlia<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>3<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>24<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>zengqiu<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>1<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>24<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>zengqiu<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>2<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t01<wbr><wbr><wbr><wbr>20<wbr><wbr><wbr><wbr><wbr>85.0<wbr><wbr><wbr>liujiannan<wbr><wbr><wbr><wbr><wbr>3<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>19<wbr><wbr><wbr><wbr><wbr>75.0<wbr><wbr><wbr>wangdong<wbr><wbr><wbr><wbr><wbr><wbr><wbr>4</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>
<wbr></wbr>
2.
select id1,id2,age,score,name,row_number(id1,id2) rn from(select * from row_number_test distribute by id1,id2 sortby id1,id2,score) a;
<wbr></wbr>
OK
2<wbr><wbr><wbr><wbr><wbr><wbr>t04<wbr><wbr><wbr><wbr>25<wbr><wbr><wbr><wbr><wbr>60.0<wbr><wbr><wbr>youlia<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>1<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>24<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>zengqiu<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>1<br>
2<wbr><wbr><wbr><wbr><wbr><wbr>t03<wbr><wbr><wbr><wbr>27<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>yongqi<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>1<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>24<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr> zengqiu<wbr><wbr><wbr><wbr> 2<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>19<wbr><wbr><wbr><wbr><wbr>75.0<wbr><wbr><wbr> wangdong<wbr><wbr><wbr><wbr>3<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t01<wbr><wbr><wbr><wbr>20<wbr><wbr><wbr><wbr><wbr>85.0<wbr><wbr><wbr>liujiannan<wbr><wbr><wbr><wbr><wbr>1<br>
2<wbr><wbr><wbr><wbr><wbr><wbr>t03<wbr><wbr><wbr><wbr>30<wbr><wbr><wbr><wbr><wbr>88.0<wbr><wbr><wbr> hongqu<wbr><wbr><wbr><wbr><wbr>2</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>
<wbr></wbr>
下面是代码,只实现了接收1个参数和2个参数的evaluator方法,参数再多的照搬代码就可以了,代码仅供参考:
package org.rowincrement;
import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
public class RowIncrement extends GenericUDTF {
Object[] result = new Object[1];
@Override
public void close() throws HiveException {
}
@Override
public StructObjectInspector initialize(ObjectInspector[] args)
throws UDFArgumentException {
if (args.length != 1) {
throw new UDFArgumentLengthException("RowIncrement takes only one argument");
}
if (!args[0].getTypeName().equals("int")) {
throw new UDFArgumentException("RowIncrement only takes an integer as a parameter");
}
ArrayList<String> fieldNames = new ArrayList<String>();
ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
fieldNames.add("col1");
fieldOIs.add(PrimitiveObjectInspectorFactory.javaIntObjectInspector);
return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,fieldOIs);
}
@Override
public void process(Object[] args) throws HiveException {
try
{
int n = Integer.parseInt(args[0].toString());
for (int i=0;i<n;i++)
{
result[0] = i+1;
forward(result);
}
}
catch (Exception e) {
throw new HiveException("RowIncrement has an exception");
}
}
public static void main(String args[])
{
Row_number t = new Row_number();
System.out.println(t.evaluate(123));
System.out.println(t.evaluate(123));
System.out.println(t.evaluate(123));
System.out.println(t.evaluate(1234));
System.out.println(t.evaluate(1234));
System.out.println(t.evaluate(1234));
System.out.println(t.evaluate(1235));
}
}
分享到:
相关推荐
至于“rownum”,在标准SQL中,`ROW_NUMBER()`是一个窗口函数,但在Hive中,你需要使用`ROW_NUMBER()`或者`RANK()`等内置函数来实现类似的功能,为每一行分配一个唯一的行号。如果Hive内置的函数无法满足需求,你...
Hive函数实例数据文件通过列出不同NBA球队及其夺冠年份的方式,展现了如何在Hive中使用函数对数据进行处理。通过这一实例,我们可以了解到Hive中数据的组织形式、函数的应用以及如何处理复杂数据。 Hive是基于...
10. **UDAF(用户自定义聚合函数)**:用于实现自定义的聚合操作,如统计特定条件下的数据个数。 11. **UDTF(用户自定义表生成函数)**:能够返回多行数据的函数,可以将一行数据转化为多行。 以上是Hive函数的...
5. **高级特性**:可能会涉及UDF(用户定义函数)、UDAF(用户定义聚合函数)和UDTF(用户定义表生成函数)的开发和使用,这些允许用户自定义函数来扩展Hive的功能。 6. **性能优化**:如何通过分区、桶、缓存等...
- **窗口函数**:ROW_NUMBER、RANK、LEAD和LAG等窗口函数在分析排序数据时非常有用,例如计算排名或滞后/领先值。 6. **Hive函数全攻略** - **内置函数**:包括字符串函数(如CONCAT、SUBSTRING)、数学函数(如...
在大数据处理领域,Hive作为一个广泛使用的数据仓库工具,提供了丰富的内置函数来支持数据的处理与分析。本文档旨在介绍Hive中的常用函数及其用法,帮助用户更好地理解和应用这些函数。需要注意的是,由于Hive的不同版本...
本资料“hive的函数大全.rar”包含了Hive的中英文两个版本的函数大全详细介绍,是学习和使用Hive进行大数据分析的重要参考资料。 Hive函数是Hive查询语言(HQL)的核心组成部分,它们允许用户执行各种数据操作,如...
此外,还可以使用子查询、窗口函数(ROW_NUMBER、RANK、DENSE_RANK 等)进行复杂的数据分析。 5. **PARTITION**:在Hive中,分区是一种组织大量数据的有效方式,可以提高查询性能。通过在表上定义一个或多个分区列...
窗口函数是数据分析中的利器,如`RANK()`、`DENSE_RANK()`和`ROW_NUMBER()`。它们在`OVER()`定义的窗口内工作,可以实现动态排序。`LAG()`和`LEAD()`用于获取当前行前后的数据,而`NTILE()`将数据分配到固定数量的组...
- 窗口和分析函数的使用(SUM、AVG、MIN、MAX、NTILE、ROW_NUMBER、RANK等); - 虚拟列(INPUT__FILE__NAME、BLOCK__OFFSET__INSIDE__FILE)的含义与应用场景。 6. **Hive存储格式与压缩** - 行式存储与列式...
8. **Hive UDF(用户自定义函数)**:包括UDF(单行函数)、UDAF(聚合函数)和UDTF(多行函数),允许用户扩展Hive的功能,处理特殊数据类型或复杂逻辑。 9. **Hive与Spark集成**:使用Spark作为执行引擎,可以...
4. **视图与UDF**: 视图简化了复杂查询,而用户自定义函数(UDF)允许用户扩展Hive的功能。 5. **Hive与Hadoop的交互**: 了解如何在Hadoop集群上部署和配置Hive,以及如何通过HDFS交互数据。 **Hive从入门到精通** ...
- **UDF(User Defined Function)**:用户可以定义自己的函数来扩展HQL的功能,实现更复杂的数据处理逻辑。 ### 数据加载与导出 Hive支持多种方式来加载数据到表中,如使用`LOAD DATA INPATH`命令将HDFS中的数据...
开窗函数如`ROW_NUMBER()`, `RANK()`, `LEAD()`, `LAG()`等在SQL中用于在数据窗口上执行计算,比如计算排名或移动平均。自连接SQL是将一个表与自身进行连接,常用于处理具有层级关系的数据或查找相关记录。 2. **...
- **窗口函数**:提供类似SQL的窗口函数,如row_number、lead、lag、rank等,用于处理时间序列数据和排名问题。 - **UDF(用户自定义函数)**:可以注册自定义函数,扩展Spark SQL的功能。 - **Join操作**:支持...
- **窗口函数**: 支持窗口函数,如`row_number()`, `lead()`, `lag()`等,进行复杂的数据分析。 - **Joins**: 提供不同类型的join操作,包括内连接、外连接和全连接。 - **UDF(用户自定义函数)**: 可以注册Java...