内置的操作符和功能
内置操作符
- 关系操作符 - 下列操作通过比较操作数并生成一个TRUE或FALSE值。
A!=B
A<B
A<=B
A>B
A>=B
A IS NULL
A IS NOT NULL
A LIKE B strings
A RLIKE B strings
A REGEXPB strings
算术操作符
A + B all number types
A - B all number types
A * B all number types
A / B all number types
A % B all number types
A & B all number types
A | B
A ^ B
~A
逻辑操作
A AND B, A && B, A OR B, A | B, NOT A, !A
复合类型操作
A[n] A是一个数组,n是一个整数
M[key] M--Map<K,V>
S.x S是一个结构类型
内置函数
BIGINT | round(double a) | returns the rounded BIGINT value of the double |
BIGINT | floor(double a) | returns the maximum BIGINT value that is equal or less than the double |
BIGINT | ceil(double a) | returns the minimum BIGINT value that is equal or greater than the double |
double | rand(), rand(int seed) | returns a random number (that changes from row to row). Specifiying the seed will make sure the generated random number sequence is deterministic. |
string | concat(string A, string B,...) | returns the string resulting from concatenating B after A. For example, concat('foo', 'bar') results in 'foobar'. This function accepts arbitrary number of arguments and return the concatenation of all of them. |
string | substr(string A, int start) | returns the substring of A starting from start position till the end of string A. For example, substr('foobar', 4) results in 'bar' |
string | substr(string A, int start, int length) | returns the substring of A starting from start position with the given length e.g. substr('foobar', 4, 2) results in 'ba' |
string | upper(string A) | returns the string resulting from converting all characters of A to upper case e.g. upper('fOoBaR') results in 'FOOBAR' |
string | ucase(string A) | Same as upper |
string | lower(string A) | returns the string resulting from converting all characters of B to lower case e.g. lower('fOoBaR') results in 'foobar' |
string | lcase(string A) | Same as lower |
string | trim(string A) | returns the string resulting from trimming spaces from both ends of A e.g. trim(' foobar ') results in 'foobar' |
string | ltrim(string A) | returns the string resulting from trimming spaces from the beginning(left hand side) of A. For example, ltrim(' foobar ') results in 'foobar ' |
string | rtrim(string A) | returns the string resulting from trimming spaces from the end(right hand side) of A. For example, rtrim(' foobar ') results in ' foobar' |
string | regexp_replace(string A, string B, string C) | returns the string resulting from replacing all substrings in B that match the Java regular expression syntax(See Java regular expressions syntax) with C. For example, regexp_replace('foobar', 'oo|ar', ) returns 'fb' |
int | size(Map<K.V>) | returns the number of elements in the map type |
int | size(Array<T>) | returns the number of elements in the array type |
value of <type> | cast(<expr> as <type>) | converts the results of the expression expr to <type> e.g. cast('1' as BIGINT) will convert the string '1' to it integral representation. A null is returned if the conversion does not succeed. |
string | from_unixtime(int unixtime) | convert the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of "1970-01-01 00:00:00" |
string | to_date(string timestamp) | Return the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01" |
int | year(string date) | Return the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970 |
int | month(string date) | Return the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11 |
int | day(string date) | Return the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1 |
string | get_json_object(string json_string, string path) | Extract json object from a json string based on json path specified, and return json string of the extracted json object. It will return null if the input json string is invalid |
- The following built in aggregate functions are supported in Hive:
BIGINT | count(*), count(expr), count(DISTINCT expr[, expr_.]) | count(*) - Returns the total number of retrieved rows, including rows containing NULL values; count(expr) - Returns the number of rows for which the supplied expression is non-NULL; count(DISTINCT expr[, expr]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL. |
DOUBLE | sum(col), sum(DISTINCT col) | returns the sum of the elements in the group or the sum of the distinct values of the column in the group |
DOUBLE | avg(col), avg(DISTINCT col) | returns the average of the elements in the group or the average of the distinct values of the column in the group |
DOUBLE | min(col) | returns the minimum value of the column in the group |
DOUBLE | max(col) | returns the maximum value of the column in the group |
语言功能
Hive查询语言提供基本的类SQL操作,这些操作工作在表或分区上,它们是:
- 有使用where条件从表中过滤行的能力。
- 有使用select条件从表中选择指定的列的能力
- 在两个表中自然连接
- 使用group by列进行聚合计算
- 存储一个查贸易的结果到另外一张表
- 下载表内容到本地目录
- 存储一个查询结果到hadoop dfs目录
- 管理表和分区
- 插入自定义的脚本语言选择自定义的map/reduce工作。
用法和例子
下面的例子突出一些显著的特性,一个详细的查询数据集测试用例在Hive Query Test Cases找到并且在Query Test Case Results找到一致的结果
创建表
相关推荐
《Hive权威指南》是一本深入探讨Hive技术的书籍,专为大数据处理和分析而设计。Hive是建立在Hadoop生态系统上的一个数据仓库工具,它允许通过SQL-like语言(HQL)对大规模分布式数据集进行查询和管理。这本书为读者...
hive编程, 高清,可以查找相关函数等
Hive编程指南.pdf
二、Hive安装与配置 在Java环境下,首先需要安装Hadoop环境,然后配置Hive的相关环境变量,如HIVE_HOME、HADOOP_HOME等,并设置相应的classpath。接着创建Hive的元数据库,可以选择MySQL或其他支持JDBC的数据库系统...
《Hive编程指南》是一本Apache Hive的编程指南 旨在介绍如何使用Hive的SQL方法 HiveQL来汇总 查询和分析存储在Hadoop分布式文件系统上的大数据集合 全书通过大量的实例 首先介绍如何在用户环境下安装和配置Hive 并对...
Hive编程指南PDF
《Hive用户指南》中文版是针对大数据处理领域的一款强大工具——Hive的详细使用手册。Hive是由Facebook开发并开源的一种基于Hadoop的数据仓库工具,它允许使用SQL(HQL,Hive Query Language)对大规模数据集进行...
HIVE 编程指南
Hive调优全方位指南,总结了25条关于Hive调优的经验,对于大数据及hive工程师是不可多得的资源。
**Hive编程指南** Hive的核心理念是将SQL语句转换为MapReduce任务进行执行,这使得非Java背景的分析师也能方便地处理大数据。Hive提供了多种组件,包括HiveQL(Hive的SQL方言)、元数据存储、查询解析器、优化器和...
Hive编程指南.中文版.完整.高清 PDF
《Hive编程指南》是一本深入探讨Hive技术的专业书籍,尤其对于初学者和开发者来说,它是理解并掌握Hive的宝贵资源。Hive作为大数据处理领域中的重要工具,是基于Hadoop生态系统的数据仓库解决方案,它允许用户使用...