`
lookqlp
  • 浏览: 345771 次
  • 性别: Icon_minigender_1
  • 来自: 上海
社区版块
存档分类
最新评论

hive left outer join where 条件问题

    博客分类:
  • hive
阅读更多
select count(1) from s_ods_trade where part ='2012-10-31';     
22076
select count(1) from s_ods_trade
104343

select count(1) from s_ods_trade_full where part ='2012-10-31';    
11456
select count(1) from s_ods_trade_full
53049

SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_trade_full b ON (a.dp_id = b.dp_id AND a.tid = b.tid and a.part='2012-10-31' and b.part='2012-10-31');
104343

SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_trade_full b ON (a.dp_id = b.dp_id AND a.tid = b.tid and a.part=b.part and a.part='2012-10-31');   
104343

SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_trade_full b ON (a.dp_id = b.dp_id AND a.tid = b.tid ) where  a.part='2012-10-31' and b.part='2012-10-31';
11456
SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_trade_full b ON (a.dp_id = b.dp_id AND a.tid = b.tid and a.part=b.part) where  a.part='2012-10-31'; 
22076

倒数第二个第三个sql很是不解。
最后一个sql效果相同语句,效率会高一些:
SELECT count(1) FROM (select * from s_ods_trade where part ='2012-10-31') a left outer JOIN (select * from s_ods_trade_full where part ='2012-10-31' ) b ON (a.dp_id = b.dp_id AND a.tid = b.tid); 

官网解释https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#:
Joins occur BEFORE WHERE CLAUSES. So, if you want to restrict the OUTPUT of a join, a requirement should be in the WHERE clause, otherwise it should be in the JOIN clause. A big point of confusion for this issue is partitioned tables:
  SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key=b.key)
  WHERE a.ds='2009-07-07' AND b.ds='2009-07-07'
will join a on b, producing a list of a.val and b.val. The WHERE clause, however, can also reference other columns of a and b that are in the output of the join, and then filter them out. However, whenever a row from the JOIN has found a key for a and no key for b, all of the columns of b will be NULL, including the ds column. This is to say, you will filter out all rows of join output for which there was no valid b.key, and thus you have outsmarted your LEFT OUTER requirement. In other words, the LEFT OUTER part of the join is irrelevant if you reference any column of b in the WHERE clause. Instead, when OUTER JOINing, use this syntax:
  SELECT a.val, b.val FROM a LEFT OUTER JOIN b
  ON (a.key=b.key AND b.ds='2009-07-07' AND a.ds='2009-07-07')
..the result is that the output of the join is pre-filtered, and you won't get post-filtering trouble for rows that have a valid a.key but no matching b.key. The same logic applies to RIGHT and FULL joins.


  • 大小: 45 KB
分享到:
评论

相关推荐

    hive优化(ppt)

    在Hive中,通过使用`LEFT SEMI JOIN`而非`LEFT OUTER JOIN`,可以在不包含NULL值的情况下快速找到匹配项,从而减少数据传输和处理的时间,特别是在处理大规模数据集时效果显著。 ### 存储格式和压缩 存储格式和...

    Hive SQL性能优化

    - 非下推:`SELECT ename, dept_name FROM E LEFT OUTER JOIN D ON E.dept_id = D.dept_id WHERE E.eid = 'HZ001'` 3. **使用SORT BY替代ORDER BY**: - **全局排序**:ORDER BY会导致所有数据进入同一个Reduce...

    SparkSQL相关语句总结.docx

    SparkSQL 支持 Inner Join、Left Outer Join、Right Outer Join 和 Full Outer Join 等多种 Join 语句。Left Outer Join 用于以左表驱动,右表不存在的 key 均赋值为 null;Right Outer Join 用于以右表驱动,左表不...

    工作总结hive优化

    Left outer join r_auction_auctions b On a.auction_id = cast(b.auction_id as string); ``` **案例3:利用Hive对UNION ALL的优化** Hive对`UNION ALL`操作进行了优化,但对于嵌套查询则不一定有效。因此,在...

    Hive教程.pdf

    - Hive支持内连接(INNER JOIN)、左连接(LEFT JOIN)、右连接(RIGHT JOIN)以及全连接(FULL OUTER JOIN),但在某些情况下可能需要特殊处理以避免数据倾斜。 - **Inner Join**: - `SELECT t1.*, t2.* FROM ...

    hive操作指南

    - Hive支持多种类型的JOIN操作,包括Inner Join、Left Outer Join、Right Outer Join等。 - 示例:`SELECT e.name, d.department_name FROM employees e JOIN departments d ON e.department_id = d.id;` #### 五、...

    Hive语法详解.docx (排版清晰,覆盖全面,含目录)

    - **LEFT OUTER JOIN**:返回左表所有行和右表匹配的行。 - **RIGHT OUTER JOIN**:返回右表所有行和左表匹配的行。 - **FULL OUTER JOIN**:返回两个表所有行。 **4.4 groupby** - **分组统计**:`SELECT column1...

    经典SQL查询总结关于Exists,not Exists.in ,not in效率的说明。

    除了上述几种查询方式之外,SQL 还提供了不同的连接类型来处理不同情况下的数据关联需求,包括左连接(Left Outer Join)、右连接(Right Outer Join)以及全连接(Full Outer Join)等。 1. **左连接**(Left ...

    Spark DataFrame详解.zip

    DataFrame支持多种类型的join,如inner join、outer join、left join和right join。通过`join()`方法,我们可以将两个DataFrame按照共同的列进行合并。例如,`df1.join(df2, "column_name", "join_type")`表示根据...

    《sql入门使用与高级使用篇》

    1. **联接(JOIN)**:通过INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN等连接多个表,合并相关数据。 2. **子查询**:在查询中嵌套查询,解决更复杂的逻辑问题。 3. **视图(VIEW)**:创建虚拟表,方便...

    SQL

    常见的JOIN类型有INNER JOIN、LEFT JOIN、RIGHT JOIN和FULL OUTER JOIN。 ```sql SELECT * FROM table1 INNER JOIN table2 ON table1.common_column = table2.common_column; ``` 9. 子查询:子查询是在一个查询...

Global site tag (gtag.js) - Google Analytics