`
k_lb
  • 浏览: 834060 次
  • 性别: Icon_minigender_1
  • 来自: 郑州
社区版块
存档分类
最新评论
  • kitleer: 据我所知,国内有款ETL调度监控工具TaskCTL,支持ket ...
    kettle调度

HIVE Queryes 查询

 
阅读更多

SELECT … FROM Clauses


hive> SELECT name, salary FROM employees;


表别名

hive> SELECT   name,   salary FROM employees;
hive> SELECT e.name, e.salary FROM employees e;


Specify Columns with Regular Expressions


hive> SELECT symbol, `price.*` FROM stocks;
AAPL    195.69  197.88  194.0   194.12  194.12
AAPL    192.63  196.0   190.85  195.46  195.46
AAPL    196.73  198.37  191.57  192.05  192.05
AAPL    195.17  200.2   194.42  199.23  199.23
AAPL    195.91  196.32  193.38  195.86  195.86

Computing with Column Values

hive> SELECT upper(name), salary, deductions["Federal Taxes"],
    > round(salary * (1 - deductions["Federal Taxes"])) FROM employees;

JOHN DOE 100000.0 0.2 80000
MARY SMITH 80000.0 0.2 64000
TODD JONES 70000.0 0.15 59500
BILL KING 60000.0 0.15 51000

Arithmetic Operators --算数运算



Mathematical functions






Aggregate functions -- 集合函数





hive> SET hive.map.aggr=true;
hive> SELECT count(*), avg(salary) FROM employees;

Table generating functions


hive> SELECT explode(subordinates) AS sub FROM employees;
Mary Smith
Todd Jones
Bill King



Other built-in functions




LIMIT Clause --限制行数


hive> SELECT upper(name), salary, deductions["Federal Taxes"],
    > round(salary * (1 - deductions["Federal Taxes"])) FROM employees
    > LIMIT 2;
JOHN DOE    100000.0  0.2   80000
MARY SMITH   80000.0  0.2   64000

Column Aliases --列别名


hive> SELECT upper(name), salary, deductions["Federal Taxes"] as fed_taxes,
    > round(salary * (1 - deductions["Federal Taxes"])) as salary_minus_fed_taxes
    > FROM employees LIMIT 2;
JOHN DOE    100000.0  0.2   80000
MARY SMITH   80000.0  0.2   64000


Nested SELECT Statements --子查询


hive> FROM (
    >   SELECT upper(name), salary, deductions["Federal Taxes"] as fed_taxes,
    >   round(salary * (1 - deductions["Federal Taxes"])) as salary_minus_fed_taxes
    >   FROM employees
    > ) e
    > SELECT e.name, e.salary_minus_fed_taxes
    > WHERE e.salary_minus_fed_taxes > 70000;
JOHN DOE    100000.0  0.2   80000

CASE … WHEN … THEN Statements --case 关键字


hive> SELECT name, salary,
    >   CASE
    >     WHEN salary <  50000.0 THEN 'low'
   > WHEN salary >= 50000.0 AND salary < 70000.0 THEN 'middle'
 > WHEN salary >= 70000.0 AND salary < 100000.0 THEN 'high'
 > ELSE 'very high'
 > END AS bracket FROM employees;
John Doe 100000.0 very high
Mary Smith 80000.0 high
Todd Jones 70000.0 high
Bill King 60000.0 middle
Boss Man 200000.0 very high
Fred Finance 150000.0 very high
Stacy Accountant 60000.0 middle

WHERE Clauses -- 筛选


SELECT * FROM employees
WHERE country = 'US' AND state = 'CA';

hive> SELECT name, salary, deductions["Federal Taxes"],
    >   salary * (1 - deductions["Federal Taxes"])
    > FROM employees
    > WHERE round(salary * (1 - deductions["Federal Taxes"])) > 70000;
John Doe    100000.0  0.2   80000.0

hive>  SELECT name, salary, deductions["Federal Taxes"],
    >    salary * (1 - deductions["Federal Taxes"]) as salary_minus_fed_taxes
    >  FROM employees
    >  WHERE round(salary_minus_fed_taxes) > 70000;
FAILED: Error in semantic analysis: Line 4:13 Invalid table alias or
column reference 'salary_minus_fed_taxes': (possible column names are:
name, salary, subordinates, deductions, address)

hive> SELECT e.* FROM
    > (SELECT name, salary, deductions["Federal Taxes"] as ded,
    >    salary * (1 - deductions["Federal Taxes"]) as salary_minus_fed_taxes
    >  FROM employees) e
    > WHERE round(e.salary_minus_fed_taxes) > 70000;
John Doe        100000.0        0.2     80000.0
Boss Man        200000.0        0.3     140000.0
Fred Finance    150000.0        0.3     105000.0


Predicate Operators


LIKE and RLIKE


hive> SELECT name, address.street FROM employees WHERE address.street LIKE '%Ave.';
John Doe        1 Michigan Ave.
Todd Jones      200 Chicago Ave.
hive> SELECT name, address.city FROM employees WHERE address.city LIKE 'O%';
Todd Jones      Oak Park
Bill King       Obscuria
hive> SELECT name, address.street FROM employees WHERE address.street LIKE '%Chi%';
Todd Jones      200 Chicago Ave.

hive> SELECT name, address.street
    > FROM employees WHERE address.street RLIKE '.*(Chicago|Ontario).*';
Mary Smith      100 Ontario St.
Todd Jones      200 Chicago Ave.

SELECT name, address FROM employees
WHERE address.street LIKE '%Chicago%' OR address.street LIKE '%Ontario%';


GROUP BY Clauses


hive> SELECT year(ymd), avg(price_close) FROM stocks
    > WHERE exchange = 'NASDAQ' AND symbol = 'AAPL'
    > GROUP BY year(ymd);
1984    25.578625440597534
1985    20.193676221040867
1986    32.46102808021274
1987    53.88968399108163
1988    41.540079275138766
1989    41.65976212516664
1990    37.56268799823263
1991    52.49553383386182
1992    54.80338610251119
1993    41.02671956450572
1994    34.0813495847914

HAVING Clauses


hive> SELECT year(ymd), avg(price_close) FROM stocks
    > WHERE exchange = 'NASDAQ' AND symbol = 'AAPL'
    > GROUP BY year(ymd)
        > HAVING avg(price_close) > 50.0;
1987    53.88968399108163
1991    52.49553383386182
1992    54.80338610251119
1999    57.77071460844979
2000    71.74892876261757
2005    52.401745992993554

Inner JOIN

hive> SELECT a.ymd, a.price_close, b.price_close
    > FROM stocks a JOIN stocks b ON a.ymd = b.ymd
    > WHERE a.symbol = 'AAPL' AND b.symbol = 'IBM';
2010-01-04      214.01  132.45
2010-01-05      214.38  130.85
2010-01-06      210.97  130.0
2010-01-07      210.58  129.55
2010-01-08      211.98  130.85
2010-01-11      210.11  129.48


Example 6-1. Query that will not work in Hive

SELECT a.ymd, a.price_close, b.price_close
FROM stocks a JOIN stocks b
ON a.ymd <= b.ymd
WHERE a.symbol = 'AAPL' AND b.symbol = 'IBM';

LEFT OUTER JOIN


hive> SELECT s.ymd, s.symbol, s.price_close, d.dividend
    > FROM stocks s LEFT OUTER JOIN dividends d ON s.ymd = d.ymd AND s.symbol = d.symbol
    > WHERE s.symbol = 'AAPL';
...
1987-05-01      AAPL    80.0    NULL
1987-05-04      AAPL    79.75   NULL
1987-05-05      AAPL    80.25   NULL
1987-05-06      AAPL    80.0    NULL
1987-05-07      AAPL    80.25   NULL
1987-05-08      AAPL    79.0    NULL
1987-05-11      AAPL    77.0    0.015
1987-05-12      AAPL    75.5    NULL
1987-05-13      AAPL    78.5    NULL
1987-05-14      AAPL    79.25   NULL
1987-05-15      AAPL    78.25   NULL
1987-05-18      AAPL    75.75   NULL
1987-05-19      AAPL    73.25   NULL
1987-05-20      AAPL    74.5    NULL
...

OUTER JOIN Gotcha


hive> SELECT s.ymd, s.symbol, s.price_close, d.dividend
    > FROM stocks s LEFT OUTER JOIN dividends d ON s.ymd = d.ymd AND s.symbol = d.symbol
    > WHERE s.symbol = 'AAPL'
    > AND s.exchange = 'NASDAQ' AND d.exchange = 'NASDAQ';
1987-05-11      AAPL    77.0    0.015
1987-08-10      AAPL    48.25   0.015
1987-11-17      AAPL    35.0    0.02
1988-02-12      AAPL    41.0    0.02
1988-05-16      AAPL    41.25   0.02
...

hive> SELECT s.ymd, s.symbol, s.price_close, d.dividend FROM
    > (SELECT * FROM stocks WHERE symbol = 'AAPL' AND exchange = 'NASDAQ') s
    > LEFT OUTER JOIN
    > (SELECT * FROM dividends WHERE symbol = 'AAPL' AND exchange = 'NASDAQ') d
    > ON s.ymd = d.ymd;
...
1988-02-10      AAPL    41.0    NULL
1988-02-11      AAPL    40.63   NULL
1988-02-12      AAPL    41.0    0.02
1988-02-16      AAPL    41.25   NULL
1988-02-17      AAPL    41.88   NULL

RIGHT OUTER JOIN


hive> SELECT s.ymd, s.symbol, s.price_close, d.dividend
    > FROM dividends d RIGHT OUTER JOIN stocks s ON d.ymd = s.ymd AND d.symbol = s.symbol
    > WHERE s.symbol = 'AAPL';
...
1987-05-07      AAPL    80.25   NULL
1987-05-08      AAPL    79.0    NULL
1987-05-11      AAPL    77.0    0.015
1987-05-12      AAPL    75.5    NULL
1987-05-13      AAPL    78.5    NULL


FULL OUTER JOIN


hive> SELECT s.ymd, s.symbol, s.price_close, d.dividend
    > FROM dividends d FULL OUTER JOIN stocks s ON d.ymd = s.ymd AND d.symbol = s.symbol
    > WHERE s.symbol = 'AAPL';
...
1987-05-07      AAPL    80.25   NULL
1987-05-08      AAPL    79.0    NULL
1987-05-11      AAPL    77.0    0.015
1987-05-12      AAPL    75.5    NULL
1987-05-13      AAPL    78.5    NULL
...


LEFT SEMI-JOIN


Example 6-2. Query that will not work in Hive

SELECT s.ymd, s.symbol, s.price_close FROM stocks s
WHERE s.ymd, s.symbol IN
(SELECT d.ymd, d.symbol FROM dividends d);

Instead, you use the following LEFT SEMI JOIN syntax:

hive> SELECT s.ymd, s.symbol, s.price_close
    > FROM stocks s LEFT SEMI JOIN dividends d ON s.ymd = d.ymd AND s.symbol = d.symbol;
...
1962-11-05      IBM     361.5
1962-08-07      IBM     373.25
1962-05-08      IBM     459.5
1962-02-06      IBM     551.5

ORDER BY and SORT BY


Here is an example using ORDER BY:

SELECT s.ymd, s.symbol, s.price_close
FROM stocks s
ORDER BY s.ymd ASC, s.symbol DESC;

Here is the same example using SORT BY instead:

SELECT s.ymd, s.symbol, s.price_cl
FROM stocks s
SORT BY s.ymd ASC, s.symbol DESC;

Casting

SELECT name, salary FROM employees
WHERE cast(salary AS FLOAT) < 100000.0;

Queries that Sample Data -- 抽样


hive> SELECT * from numbers TABLESAMPLE(BUCKET 3 OUT OF 10 ON rand()) s;
2
4
hive> SELECT * from numbers TABLESAMPLE(BUCKET 3 OUT OF 10 ON rand()) s;
7
10
hive> SELECT * from numbers TABLESAMPLE(BUCKET 3 OUT OF 10 ON rand()) s;

hive> SELECT * from numbers TABLESAMPLE(BUCKET 3 OUT OF 10 ON number) s;
2
hive> SELECT * from numbers TABLESAMPLE(BUCKET 5 OUT OF 10 ON number) s;
4
hive> SELECT * from numbers TABLESAMPLE(BUCKET 3 OUT OF 10 ON number) s;
2

hive> SELECT * from numbers TABLESAMPLE(BUCKET 1 OUT OF 2 ON number) s;
2
4
6
8
10
hive> SELECT * from numbers TABLESAMPLE(BUCKET 2 OUT OF 2 ON number) s;
1
3
5
7
9


UNION ALL


SELECT log.ymd, log.level, log.message
  FROM (
    SELECT l1.ymd, l1.level,
      l1.message, 'Log1' AS source
    FROM log1 l1
  UNION ALL
    SELECT l2.ymd, l2.level,
      l2.message, 'Log2' AS source
    FROM log1 l2
  ) log
SORT BY log.ymd ASC;

FROM (
  FROM src SELECT src.key, src.value WHERE src.key < 100
  UNION ALL
  FROM src SELECT src.* WHERE src.key > 110
) unioninput
INSERT OVERWRITE DIRECTORY '/tmp/union.out' SELECT unioninput.*


分享到:
评论

相关推荐

    部分普通sql查询在hive中的实现方式

    ### 部分普通SQL查询在Hive中的实现方式 Hive是一款基于Hadoop的数据仓库工具,能够对存储在Hadoop文件系统中的数据集进行数据提取、转换、加载(ETL),这是一种可以简化MapReduce编程的工具。由于Hive的设计初衷...

    java_hive简单查询

    在Java开发中,有时我们需要与大数据处理平台如Hive进行交互,执行SQL查询来获取或处理数据。本教程将深入探讨如何使用Java API与Hive进行集成,实现简单的查询操作。我们将围绕以下知识点展开讨论: 1. **Hive简介...

    大数据学习:Hive数据查询语言.pdf

    Hive数据查询语言 Hive数据查询语言是用于从数据库中查询数据的计算机语言。在Hive中,我们可以通过HiveQL语言查询数据,查询数据的结果会存储在结果集中。 SELECT 语句分析 SELECT 语句是 Hive 数据查询语言中最...

    龙战于野大数据MR原理启动hive查询表分区.docx

    Hive查询表分区的MR原理启动详解 Hive是一款基于Hadoop的数据仓库工具,主要用于处理结构化和半结构化的数据。MR(MapReduce)是Hadoop中的一种编程模型,用于处理大规模数据。在Hive中,MR原理启动是指使用...

    hive数据加载导出查询

    最后,我们来看看如何对 Hive 表进行简单的查询操作。 ```sql SELECT * FROM db_0309.emp_nopart; ``` 运行此命令后,可以看到表 `db_0309.emp_nopart` 中的数据。 ```sql OK 7369 SMITH CLERK 7 ``` 这里只展示...

    Hive中查询操作

    在Hive中,查询操作是数据处理的核心,它允许用户从大数据存储中提取所需的信息。以下将详细解析Hive查询语法及其基本查询方法。 一、Hive查询语法 Hive的查询语句遵循标准SQL的基本结构,但也有一些Hive特有的扩展...

    hive基本操作

    hive 基本操作的命令和hive 实用命令等,有利与学习hive和mysql

    Hive大数据查询技术.doc

    Hive 大数据查询技术 Hive 是一个基于 Hadoop 的数据仓库平台,通过 Hive,可以方便地进行数据提取转化加载(ETL)的工作。Hive 定义了一个类似于 SQL 的查询语言 HQL,能够将用户编写的 SQL 转化为相应的 ...

    spark或mr引擎插入的数据,hive表查询数据为0

    ### Spark或MR引擎插入的数据,Hive表查询数据为0的问题解析 #### 问题背景与现象 在大数据处理场景中,经常会遇到使用不同执行引擎(如Spark、MapReduce (MR) 或 Tez)进行数据处理的情况。其中一种常见的问题是...

    01.hive查询语法--基本查询--条件查询--关联查询.mp4

    01.hive查询语法--基本查询--条件查询--关联查询.mp4

    hive查询优化

    ### Hive查询优化详解 #### 一、Hive基础与架构 **Hive**作为Hadoop生态中的重要组成部分,被广泛应用于大数据分析领域。它通过提供类SQL语言(HiveQL)来简化对Hadoop分布式文件系统(HDFS)中存储的大规模数据集...

    Hive查询优化整理与Hive简易版思维导图

    本文将深入探讨Hive查询优化的一些关键点,并结合个人实践经验和整理的Hive简易版思维导图,帮助你更好地理解和运用Hive。 一、Hive查询优化基础 1. **表分区**:分区是Hive提高查询效率的重要手段。通过将大表按...

    Hive总结.docx

    总的来说,Hive是一个强大的大数据分析工具,它简化了在Hadoop上的数据查询和分析过程,为大数据分析提供了便利。通过理解其原理、掌握SQL语法、优化技巧和解决数据倾斜问题,可以在大数据环境中高效地进行数据分析...

    HIVE安装及详解

    "HIVE安装及详解" HIVE是一种基于Hadoop的数据...* 使用HIVE命令行工具执行查询 * 使用HIVE API开发应用程序 * 使用HIVE与其他工具集成 HIVE是一种功能强大且灵活的数据仓库工具,广泛应用于大数据分析和处理领域。

    Hive 基本命令操作1

    在Hive中,分区表是一种优化数据查询的方法,它将大表的数据按照特定的逻辑划分成多个小的、独立的部分,每个部分称为一个分区。通过分区,可以减少查询时需要扫描的数据量,从而提高查询性能。创建分区表的基本语法...

    hive客户端安装_hive客户端安装_hive_

    在大数据处理领域,Hive是一个非常重要的工具,它提供了一个基于Hadoop的数据仓库基础设施,用于数据查询、分析和管理大规模数据集。本教程将详细讲解如何在Linux环境下安装Hive客户端,以便进行数据操作和分析。 ...

    Hive数据查询详解.md

    Hive数据查询详解,基础篇

    Hive 对 Protobuf 序列化文件读取.zip

    在大数据处理领域,Apache Hive 是一个非常重要的工具,它提供了一个SQL-like的接口来查询、管理和分析存储在分布式存储系统(如Hadoop)中的大规模数据集。本篇将重点讲解如何利用Hive对Protobuf序列化的文件进行...

    hive2.1.1中orc格式读取报数组越界错误解决方法

    它提供了压缩、索引和列式存储等特性,能够极大地提高查询性能。然而,有时候在使用ORC格式读取数据时,可能会遇到“数组越界”错误,这通常是由于软件bug或者不兼容性导致的。 “数组越界”错误是Java编程语言中...

    基于Hadoop+Hive的数据查询优化设计与实现

    "基于Hadoop+Hive的数据查询优化设计与实现" 本文旨在介绍基于Hadoop和Hive的数据查询优化设计与实现,旨在解决大数据处理的需求。该文涵盖了Hadoop和Hive的工作原理、系统设计与实现、数据查询优化等方面的知识点...

Global site tag (gtag.js) - Google Analytics