nested loop驱动行源的选择

a3mao

浏览: 564721 次
性别:
来自: 北京

最近访客更多访客>>

yuanbincs

yangsongqq

zyh11

miwu55

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

博客分类：

Work

SQL Access Oracle

一般情况下，nested loop驱动行源应该选择小表，或者行数少的行源，但有时候也不一定。我们来看一个使用大表做驱动行源的例子。

1. 创建测试环境

大表test_big，小表test_small，二者通过test_big.object_id=test_small.object_id来关联。其中大表的object_id中会有重复记录，而小表中的object_id是唯一的，两个列上都有索引。

C:\Documents and Settings\yuechao.tianyc>sqlplus test/test

SQL*Plus: Release 10.2.0.1.0 - Production on 星期日 3月 29 22:37:13 2009

Copyright (c) 1982, 2005, Oracle. All rights reserved.

连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> create table test_big as select * from dba_objects;

表已创建。

SQL> insert into test_big select * from test_big;

已创建50000行。

SQL> /

已创建100000行。

...

SQL> /

已创建6400000行。

SQL> commit;

提交完成。

SQL> create table test_small as select * from dba_objects where rownum < 5001;

表已创建。

SQL> create index ind_test_big on test_big(object_id);

索引已创建。

SQL> create index ind_test_small on test_small(object_id);

索引已创建。

2. 获取执行计划

分别使用大表和小表作为驱动行源，先看一下它们的执行计划：

-- 1. 使用test_big作为驱动行源
-- 执行步骤：（1）全表扫描test_big，得到行源A；（2）将A作为驱动行源，嵌套循环连接索引ind_test_small，得到行源B；（3）将行源B通过索引ind_test_small与表test_small关联，得到结果集。
SQL> explain plan for
2 select/*+ordered use_nl(test_big,test_small)*/ count(test_big.object_name||test_small.object_name)
3   from test_big, test_small where test_big.object_id = test_small.object_id;

已解释。

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
Plan hash value: 3591390207

----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |     1 |   158 |    15M (1)| 53:19:22 |
|   1 | SORT AGGREGATE              |                |     1 |   158 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST_SMALL     |     1 |    79 |     2   (0)| 00:00:01 |
|   3 |    NESTED LOOPS              |                | 1247K|   187M|    15M (1)| 53:19:22 |
|   4 |     TABLE ACCESS FULL        | TEST_BIG       |    14M| 1106M| 39134   (2)| 00:07:50 |
|* 5 |     INDEX RANGE SCAN         | IND_TEST_SMALL |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("test_big"."OBJECT_ID"="test_small"."OBJECT_ID")

Note
-----
   - dynamic sampling used for this statement

已选择21行。

-- 2. 使用test_small作为驱动行源
-- 执行步骤：（1）全表扫描test_small，得到行源A；（2）将A作为驱动行源，嵌套循环连接索引ind_test_big，得到行源B；（3）将行源B通过索引ind_test_big与表test_big关联，得到结果集。
SQL> explain plan for
2 select/*+ordered use_nl(test_big,test_small)*/ count(test_big.object_name||test_small.object_name)
3   from test_small, test_big where test_big.object_id = test_small.object_id;

已解释。

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
Plan hash value: 1952886871

----------------------------------------------------------------------------------
| Id | Operation                    | Name      | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     1 |   158 | 1098K (1)| 03:39:42 |
|   1 | SORT AGGREGATE              |              |     1 |   158 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST_BIG     |   257 | 20303 |   226   (0)| 00:00:03 |
|   3 |    NESTED LOOPS              |              | 1247K|   187M| 1098K (1)| 03:39:42 |
|   4 |     TABLE ACCESS FULL        | TEST_SMALL   | 4854 |   374K|    17   (0)| 00:00:01 |
|* 5 |     INDEX RANGE SCAN         | IND_TEST_BIG |   257 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("test_big"."OBJECT_ID"="test_small"."OBJECT_ID")

Note
-----
   - dynamic sampling used for this statement

已选择21行。

3. 比较执行时间

比较上面两个不同的执行计划。看起来使用表test_small作为驱动行源更合理一些，它只有50001行，而表test_big有12800000行数据。而实际却不是这样，我们看一下他们各自的执行时间：

SQL> set timing on
-- 使用大表作为驱动行源
SQL> select/*+ordered use_nl(test_big,test_small)*/ count(test_big.object_name||test_small.object_name)
2 from test_big, test_small where test_big.object_id = test_small.object_id;

COUNT(TEST1.OBJECT_NAME||TEST2.OBJECT_NAME)
-------------------------------------------
                                   12800000

已用时间: 00: 00: 36.70

-- 使用小表作为驱动行源
SQL> select/*+ordered use_nl(test1,test2)*/ count(test1.object_name||test2.object_name)
2 from test2, test1 where test1.object_id = test2.object_id;

COUNT(TEST1.OBJECT_NAME||TEST2.OBJECT_NAME)
-------------------------------------------
                                   12800000

已用时间: 00: 02: 42.89

我们看到，使用大表作为驱动行源，耗时约36.7秒；而使用小表作为驱动行源，耗时达到2分42.89秒！

4. 原因

其实原因就在与大表的列object_id中有很多重复记录，当使用小表作为驱动行源时，全表扫描test_small的时间虽然很快，但对于test_small中的每一条记录，都对应test_big中的若干条记录，那么就需要根据查到的这些rowid，进行若干次的磁盘I/O来获得大表对应的数据。这里频繁的磁盘I/O就是问题的原因。因为test_big数据量很大，数据不可能存储在内存中。

而当使用大表作为驱动行源时，全表扫描test_big的时间虽然比较长，但每条记录都对应test_small中的一条记录，而且表test_small比较小，其数据及索引数据可以从内存中直接找到。这样耗费的时间就主要是全表扫描test_big的时间了，而这耗费不了太多的时间。在下面的测试中，只耗费了26.14秒：

SQL> select count(*) from test_big;

COUNT(*)
----------
12800000

已用时间: 00: 00: 26.14

SQL> set timing off
SQL> explain plan for
2 select count(*) from test_big;

已解释。

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Plan hash value: 3224830981

-----------------------------------------------------------------------
| Id | Operation          | Name     | Rows | Cost (%CPU)| Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 | 38982   (2)| 00:07:48 |
|   1 | SORT AGGREGATE    |          |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST_BIG |    14M| 38982   (2)| 00:07:48 |
-----------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

已选择13行。

5. 备注

这里只是一个小测试，来说明nested loop中，驱动行源的选择要根据实际情况而定，不是一成不变的。其实这个SQL使用hash join是最快的：

SQL> explain plan for
2 select count(test_big.object_name||test_small.object_name)
3 from test_big, test_small where test_big.object_id = test_small.object_id;

已解释。

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
Plan hash value: 1810242240

----------------------------------------------------------------------------------
| Id | Operation           | Name       | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |            |     1 |   158 | 39404   (3)| 00:07:53 |
|   1 | SORT AGGREGATE     |            |     1 |   158 |            |          |
|* 2 |   HASH JOIN         |            | 1247K|   187M| 39404   (3)| 00:07:53 |
|   3 |    TABLE ACCESS FULL| TEST_SMALL | 4854 |   374K|    17   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| TEST_BIG   |    14M| 1106M| 39134   (2)| 00:07:50 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("TEST_BIG"."OBJECT_ID"="TEST_SMALL"."OBJECT_ID")

Note
-----
   - dynamic sampling used for this statement

已选择20行。

SQL> set timing on
SQL> select count(test_big.object_name||test_small.object_name)
2 from test_big, test_small where test_big.object_id = test_small.object_id;

COUNT(TEST_BIG.OBJECT_NAME||TEST_SMALL.OBJECT_NAME)
---------------------------------------------------
                                            1280000

已用时间: 00: 00: 21.42

分享到：

通过分析SQL语句的执行计划优化SQL语句 | [转载]Effective SQL

2009-09-08 22:21
浏览 1102
评论(0)
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论