bit1129 (Beijing)

[Hive Part 5] HQL Queries

    Blog category:
  • Hive

1. Structure of a Query Statement



 
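The figure that originally illustrated the query structure is missing from this copy of the post. For reference, the general shape of a Hive SELECT statement is roughly:

```sql
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]]
[LIMIT number]
```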

2. Meaning of Query Keywords

2.1 LIMIT

Like MySQL's LIMIT, it caps the number of rows a query returns.
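A minimal sketch, assuming the word table used in the examples later in this post:

```sql
-- return at most 10 rows
SELECT id FROM word LIMIT 10;
```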

 

2.2 WHERE

Like MySQL's WHERE, it specifies the filter condition of a query.
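For example, again assuming the word table with a numeric id column:

```sql
-- keep only rows whose id exceeds 100
SELECT id FROM word WHERE id > 100;
```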

 

2.3 GROUP BY

Groups rows so that aggregate functions can be computed per group; every non-aggregated column in the SELECT list must appear in the GROUP BY clause.
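A small sketch, assuming the word table:

```sql
-- count how many rows share each id value
SELECT id, COUNT(*) AS cnt
FROM word
GROUP BY id;
```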

 

2.4 ORDER BY

  • Produces a total (global) ordering
  • Uses only a single reduce task
  • Can therefore be very slow on large data
  • In strict mode it must be combined with LIMIT
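A sketch of the strict-mode behavior (hive.mapred.mode is the setting that controls strict mode in classic Hive):

```sql
SET hive.mapred.mode=strict;
-- in strict mode this query is rejected unless the LIMIT clause is present,
-- because a single reduce task would otherwise have to sort the entire data set
SELECT id FROM word ORDER BY id DESC LIMIT 10;
```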

2.5 SORT BY

  • Can use multiple reduce tasks (how is their number determined?)
  • Data is sorted within each reduce task, but there is no global order
  • Usually combined with DISTRIBUTE BY, which controls which reduce task each row is sent to

2.6 DISTRIBUTE BY

  • Plays the role of the partitioner in MapReduce; by default it partitions by hash
  • Works well in combination with SORT BY

2.7 CLUSTER BY

  • When DISTRIBUTE BY and SORT BY (ascending) are used together on the same column, CLUSTER BY that column is a shorthand for the pair

2.8 SORT BY, DISTRIBUTE BY, and CLUSTER BY Examples

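The example screenshots are missing from this copy; a sketch of equivalent queries, assuming the word table:

```sql
-- use two reduce tasks (mapred.reduce.tasks is the classic Hive setting)
SET mapred.reduce.tasks=2;

-- route rows to reducers by id, and sort by id within each reducer;
-- each reducer's output is sorted, but there is no global order
SELECT id FROM word DISTRIBUTE BY id SORT BY id;

-- shorthand for the pair above when the column is the same and the sort is ascending
SELECT id FROM word CLUSTER BY id;
```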

3. Joins

3.1 Join Types Supported by Hive

  • INNER JOIN
  • LEFT OUTER JOIN
  • RIGHT OUTER JOIN
  • FULL OUTER JOIN
  • LEFT SEMI-JOIN
  • Map-side Joins
  • Only equi-joins are supported; non-equi join conditions are not

Example:

hive> SELECT w.id FROM word w JOIN my_word m ON w.id = m.id;
Query ID = hadoop_20150310022828_c826a379-81d7-4d8b-a299-3f163ee4079a
Total jobs = 1
15/03/10 02:28:37 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
15/03/10 02:28:37 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/software/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/software/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /tmp/hadoop/hadoop_20150310022828_c826a379-81d7-4d8b-a299-3f163ee4079a.log
2015-03-10 02:28:41	Starting to launch local task to process map join;	maximum memory = 477102080
2015-03-10 02:28:49	Dump the side-table for tag: 1 with group count: 3 into file: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
2015-03-10 02:28:49	Uploaded 1 File to: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-28-14_555_4164322138343464793-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (320 bytes)
2015-03-10 02:28:49	End of local task; Time Taken: 7.816 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1425868733189_0004, Tracking URL = http://hadoop.master:8088/proxy/application_1425868733189_0004/
Kill Command = /home/hadoop/software/hadoop-2.5.2/bin/hadoop job  -kill job_1425868733189_0004
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-03-10 02:29:17,976 Stage-3 map = 0%,  reduce = 0%
2015-03-10 02:29:32,438 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 3.33 sec
MapReduce Total cumulative CPU time: 3 seconds 330 msec
Ended Job = job_1425868733189_0004
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 3.33 sec   HDFS Read: 254 HDFS Write: 13 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 330 msec
OK
1
10
10
1000
Time taken: 80.261 seconds, Fetched: 4 row(s)


3.2 Map-side Join

  • The join is performed entirely in the map tasks, so no reduce task is launched
  • Suited to joining one large table with one small table
  • Idea: replicate the small table to every node and load it into memory; split the large table, and join each split against the in-memory copy

3.3 Reduce-side Join

  • Suited to joining two large tables
  • Idea: the map side hashes rows by the join key; the reduce side performs the actual join

Example (the MAPJOIN hint asks Hive for a map-side join, loading the small table b into memory):

SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM a JOIN b ON a.key = b.key;

 

3.4 LEFT SEMI JOIN

select word.id from word left semi join my_word on (word.id=my_word.id); 
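A LEFT SEMI JOIN emits a left-table row once if it has any match on the right, no matter how many right-side rows match — which is why the run below fetches three rows (1, 10, 1000) while the plain JOIN in 3.1 returned 10 twice. It is Hive's efficient substitute for an IN/EXISTS subquery; the query above is roughly equivalent to:

```sql
SELECT word.id FROM word WHERE word.id IN (SELECT my_word.id FROM my_word);
```

One restriction: the right-hand table may be referenced only in the ON clause, not in the SELECT list or WHERE clause.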

 

Example:

hive> select word.id from word left semi join my_word on (word.id=my_word.id); 
Query ID = hadoop_20150310020606_41b5d13c-a83e-4878-823c-d9911d0c274b
Total jobs = 1
15/03/10 02:08:54 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
15/03/10 02:08:54 WARN conf.Configuration: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/software/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/software/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /tmp/hadoop/hadoop_20150310020606_41b5d13c-a83e-4878-823c-d9911d0c274b.log
2015-03-10 02:09:34	Starting to launch local task to process map join;	maximum memory = 477102080
2015-03-10 02:09:42	Dump the side-table for tag: 1 with group count: 3 into file: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2015-03-10 02:09:43	Uploaded 1 File to: file:/home/hadoop/software/apache-hive-0.14.0-bin/iotmp/9a4f11ed-42a4-44cc-a405-2bcd87bce0b7/hive_2015-03-10_02-06-52_379_8334166551786931789-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (316 bytes)
2015-03-10 02:09:43	End of local task; Time Taken: 8.098 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1425868733189_0003, Tracking URL = http://hadoop.master:8088/proxy/application_1425868733189_0003/
Kill Command = /home/hadoop/software/hadoop-2.5.2/bin/hadoop job  -kill job_1425868733189_0003
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-03-10 02:12:42,201 Stage-3 map = 0%,  reduce = 0%
2015-03-10 02:13:42,866 Stage-3 map = 0%,  reduce = 0%
2015-03-10 02:14:17,089 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 13.16 sec
MapReduce Total cumulative CPU time: 13 seconds 160 msec
Ended Job = job_1425868733189_0003
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 13.16 sec   HDFS Read: 254 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 13 seconds 160 msec
OK
1
10
1000
Time taken: 451.347 seconds, Fetched: 3 row(s)
