1. 查询语句组成
2. 查询语句关键字含义
2.1 LIMIT
类似于MySQL的LIMIT,用于限定查询记录数
2.2 WHERE
类似于MySQL的WHERE,用于指定查询条件
2.3 GROUP BY
分组查询
2.4 ORDER BY
- 全局排序
- 仅仅动一个reduce task
- 速度可能会非常慢
- Strict模式下,必须与limit连用
2.5 SORT BY
- 可以有多个reduce task(个数如何确定?)
- 每个Reduce Task内部数据有序,但全局无序
- 通常与distribute by联合使用,用于指定数据由哪个reduce task产生
2.6 DISTRIBUTE BY
- 相当于MapReduce中的paritioner,默认是基于hash实现的;
- 与sort by连用,可发挥很好的作用
2.7 CLUSTER BY
- 当distribute by与sort by(降序)连用,且跟随的字段相同时,可使用cluster by简写
2.8 SORT BY、DISTRIBUTE BY、CLUSTER BY举例
3. 关联查询
3.1 Hive支持的关联查询
- INNER JOIN
- LEFT OUTER JOIN
- RIGHT OUTER JOIN
- FULL OUTER JOIN
- LEFT SEMI-JOIN
- Map-side Joins
- 仅支持等值连接,不支持不等值连接
实例:
hive> > > SELECT w.id FROM word w join my_word m on w.id = m.id; Query ID = hadoop_20150310022828_c826a379-81d7-4d8b-a299-3f163ee4079a Total jobs = 1 15/03/10 02:28:37 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/10 02:28:37 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/home/hadoop/software/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/home/hadoop/software/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Execution log at: /tmp/hadoop/hadoop_20150310022828_c826a379-81d7-4d8b-a299-3f163ee4079a.log 2015-03-10 02:28:41 Starting to launch local task to process map join; maximum memory = 477102080 2015-03-10 02:28:49 Dump the side-table for tag: 1 with group count: 3 into file: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile11--.hashtable 2015-03-10 02:28:49 Uploaded 1 File to: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (320 bytes) 2015-03-10 02:28:49 End of local task; Time Taken: 7.816 sec. Execution completed successfully MapredLocal task succeeded Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1425868733189_0004, Tracking URL = http://hadoop.master:8088/proxy/application_1425868733189_0004/ Kill Command = /home/hadoop/software/hadoop-2.5.2/bin/hadoop job -kill job_1425868733189_0004 Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0 2015-03-10 02:29:17,976 Stage-3 map = 0%, reduce = 0% 2015-03-10 02:29:32,438 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 3.33 sec MapReduce Total cumulative CPU time: 3 seconds 330 msec Ended Job = job_1425868733189_0004 MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 3.33 sec HDFS Read: 254 HDFS Write: 13 SUCCESS Total MapReduce CPU Time Spent: 3 seconds 330 msec OK 1 10 10 1000 Time taken: 80.261 seconds, Fetched: 4 row(s)
3.2 Map side Join
- Join操作在map task中完成,因此无需启动reduce task;
- 适合一个大表,一个小表的连接操作
- 思想:小表复制到各个节点上,并加载到内存中;大表分片,与小表完成连接操作
3.3 Reduce side Join
- 适合两个大表连接操作
- 思想:map端按照连接字段进行hash,reduce 端完成连接操作
举例:
SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM a join b on a.key = b.key
3.4 LEFT SEMI-JOIN(左半连接)
select word.id from word left semi join my_word on (word.id=my_word.id);
实例:
hive> select word.id from word left semi join my_word on (word.id=my_word.id); Query ID = hadoop_20150310020606_41b5d13c-a83e-4878-823c-d9911d0c274b Total jobs = 1 15/03/10 02:08:54 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 15/03/10 02:08:54 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/home/hadoop/software/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/home/hadoop/software/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Execution log at: /tmp/hadoop/hadoop_20150310020606_41b5d13c-a83e-4878-823c-d9911d0c274b.log 2015-03-10 02:09:34 Starting to launch local task to process map join; maximum memory = 477102080 2015-03-10 02:09:42 Dump the side-table for tag: 1 with group count: 3 into file: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile01--.hashtable 2015-03-10 02:09:43 Uploaded 1 File to: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (316 bytes) 2015-03-10 02:09:43 End of local task; Time Taken: 8.098 sec. Execution completed successfully MapredLocal task succeeded Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1425868733189_0003, Tracking URL = http://hadoop.master:8088/proxy/application_1425868733189_0003/ Kill Command = /home/hadoop/software/hadoop-2.5.2/bin/hadoop job -kill job_1425868733189_0003 Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0 2015-03-10 02:12:42,201 Stage-3 map = 0%, reduce = 0% 2015-03-10 02:13:42,866 Stage-3 map = 0%, reduce = 0% 2015-03-10 02:14:17,089 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 13.16 sec MapReduce Total cumulative CPU time: 13 seconds 160 msec Ended Job = job_1425868733189_0003 MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 13.16 sec HDFS Read: 254 HDFS Write: 10 SUCCESS Total MapReduce CPU Time Spent: 13 seconds 160 msec OK 1 10 1000 Time taken: 451.347 seconds, Fetched: 3 row(s)
相关推荐
Hadoop+ZooKeeper+HBase+hive(HQL)安装步骤
最后,我们来看看如何对 Hive 表进行简单的查询操作。 ```sql SELECT * FROM db_0309.emp_nopart; ``` 运行此命令后,可以看到表 `db_0309.emp_nopart` 中的数据。 ```sql OK 7369 SMITH CLERK 7 ``` 这里只展示...
在Java开发中,有时我们需要与大数据处理平台如Hive进行交互,执行SQL查询来获取或处理数据。本教程将深入探讨如何使用Java API与Hive进行集成,实现简单的查询操作。我们将围绕以下知识点展开讨论: 1. **Hive简介...
它提供了一个SQL-like的查询语言,称为HQL(Hive Query Language),使得用户可以使用类似SQL的语句来查询和管理数据。在Hive开发中,遵循一定的规范和要点非常重要,以下是Hive开发规范及要点: 一、基本操作 1. ...
Hadoop是一个开源框架,主要用于分布式存储和计算大规模数据集,而Hive则是建立在Hadoop之上的一种数据仓库工具,它提供了一种SQL-like的语言(称为HiveQL或HQL)来方便数据查询和分析。在这个"Day08-Hive函数与HQL...
3. **Hive SQL(HQL)**:HQL是Hive提供的SQL方言,用于查询和管理数据。它支持SELECT、INSERT、UPDATE、DELETE等基本操作,以及JOIN、GROUP BY、HAVING等复杂查询。 4. **编译与执行计划**:Hive将HQL语句转换为...
Hive 是一个构建于 Hadoop 之上的数据仓库工具,它提供了类 SQL 的查询语言(HQL),使得用户可以方便地处理存储在 Hadoop 分布式文件系统(HDFS)中的大规模数据集。Hive 的设计初衷在于降低大数据处理的门槛,让...
Hive处理HQL查询语句的解释、优化和查询计划生成,而Hadoop负责执行查询计划。Hive的所有数据存储在Hadoop的HDFS中。值得注意的是,Hadoop和Hive默认使用UTF-8编码。 ### Hive与传统关系数据库的异同 - **查询语言...
在大数据处理领域,Hive是一个基于Hadoop的数据仓库工具,它允许用户使用SQL(HQL,Hive Query Language)查询和管理存储在Hadoop分布式文件系统(HDFS)中的大量结构化数据。Hive 1.1.0是Hive的一个版本,提供了...
* 查询语言:Hive使用类SQL的查询语言HQL,而数据库使用SQL * 数据更新:Hive是不建议对数据进行改写的,而数据库中的数据通常是需要经常进行修改的 * 执行延迟:Hive的执行延迟比较高,数据库的执行延迟比较低 ...
Hadoop-2.8.0-HA-Hive安装部署与HQL07.hive.mp4
Hive则是基于Hadoop的数据仓库工具,它提供了SQL-like查询语言HQL,使得非程序员也能方便地处理和分析存储在Hadoop集群中的大量数据。 首先,我们要了解Hadoop的高可用性(High Availability, HA)。HA是指系统在遇到...
在大数据处理领域,Hive是一个基于Hadoop的数据仓库工具,它允许用户使用SQL(HQL,Hive Query Language)查询和管理存储在Hadoop分布式文件系统(HDFS)中的大量数据。Hive提供了数据整合、元数据管理、查询和分析...
Hive的主要功能是将结构化的数据文件映射为一张数据库表,并提供SQL(HQL,Hive SQL)查询语言来查询这些数据。以下是对"windows hive cmd下载"相关知识点的详细解释: 1. **Apache Hive**: Apache Hive是由...
Hadoop-2.8.0-HA-Hive安装部署与HQL05.mysql的安装详细步骤.mp4
Hive是一个基于Hadoop的数据仓库工具,它本身并不存储数据,部署在Hadoop集群上,数据是存储在HDFS上的. Hive所建的表在HDFS上对应的是...Hive可以将结构化的数据映射为一张数据库表,并提供HQL(HiveSQL)查询功能。
本文将对 Hive on Spark 的源码进行深入分析,涵盖其基本原理、运行模式、Hive 解析 HQL、Spark 上下文创建、任务执行等方面。 1. 运行模式 Hive on Spark 支持两种运行模式:本地(LOCAL)和远程(REMOTE)。当...
Hive是Apache软件基金会的一个开源项目,它提供了一种基于Hadoop的数据仓库工具,能够将结构化的数据文件映射为一张数据库表,并提供SQL(HQL)查询功能,方便大数据的分析处理。Hive-1.1.0-cdh5.9.3是Cloudera ...