double trouble

czmmiao

浏览: 4431799 次
性别:
来自: 厦门

最近访客更多访客>>

amwfngt

zzbing

sky3063

hotsunshine

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

博客分类：

double trouble

Filed under: Execution plans,Performance,Tuning — Jonathan Lewis @ 7:06 pm BST May 18,2010

In the latest Quiz Night, I asked how you could make a query more efficient by changing a two table join into a three table join – with the clue that my third table was a repeat of the first table. Gary Myers, in comment 4, provided the type of answer I was looking for. Sometimes it is more efficient to get a small amount of data from a table on a first pass then go back and get the rest of the data on a second pass – especially if the first pass is an ‘index only’ operation.

I’ve created a little demonstration that gives you some idea of the approach:

create table t1
as
with generator as (

    select  --+ materialize

        rownum id

    from dual

    connect by

        rownum <= 10000
)
select

    rownum                  id,

    mod(rownum,100)             mod1,

    trunc(dbms_random.value(0,10000))   random1,

    lpad(rownum,10,'0')         small_vc,

    rpad('x',60)                padding
from

    generator   v1,

    generator   v2
where

    rownum <= 100000
;
 
create table t2
as
with generator as (

    select  --+ materialize

        rownum id

    from dual

    connect by

        rownum <= 10000
)
select

    rownum                  id,

    mod(rownum,100)             mod2,

    trunc(dbms_random.value(0,10000))   random2,

    lpad(rownum,10,'0')         small_vc,

    rpad('x',60)                padding
from

    generator   v1,

    generator   v2
where

    rownum <= 100000
;
 
create index t1_i1 on t1(mod1, random1);
create index t2_i1 on t2(mod2, random2);

This creates two tables of 100,000 (fairly short) rows. Note the mod columns which return 1,000 rows per value, and the random columns which return approximately 10 rows per value. When I give Oracle the following query, it overestimates the final result set and chooses what I know to be a relatively resource-intensive execution plan:

select

    t1.padding,

    t2.padding
from

    t1, t2
where

    t1.mod1 = 50
and t2.random2 = t1.random1
and t2.mod2 = 50
;
 
-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1045 |   138K|   388 |
|*  1 |  HASH JOIN         |      |  1045 |   138K|   388 |
|*  2 |   TABLE ACCESS FULL| T1   |  1000 | 68000 |   193 |
|*  3 |   TABLE ACCESS FULL| T2   |  1000 | 68000 |   193 |
-----------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T2"."RANDOM2"="T1"."RANDOM1")

   2 - filter("T1"."MOD1"=50)

   3 - filter("T2"."MOD2"=50)

This plan is dictated largely by the fact that I have to collect quite a lot of data from both tables then eliminate a large fraction of the data I have collected. This pattern is the driver for what I am about to do: I know that I want a small volume of data eventually but if I have to go to the table at every step of the plan then I will have to do a lot of redundant work and carry a lot of redundant data at some point. Remember – it’s often the case that “visiting the table” is the expensive part of any query.

select

    /*+

        leading(t1 t2 t3)

        use_nl(t3)

        rowid(t3)

    */

    t3.padding,

    t2.padding
from

    t1,

    t2,

    t1  t3
where

    t1.mod1 = 50
and t2.random2 = t1.random1
and t2.mod2 = 50
and t3.rowid = t1.rowid
;
 
---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |  1045 |   163K|  1244 |
|   1 |  NESTED LOOPS               |       |  1045 |   163K|  1244 |
|*  2 |   HASH JOIN                 |       |  1045 | 90915 |   199 |
|*  3 |    INDEX RANGE SCAN         | T1_I1 |  1000 | 19000 |     4 |
|*  4 |    TABLE ACCESS FULL        | T2    |  1000 | 68000 |   193 |
|   5 |   TABLE ACCESS BY USER ROWID| T1    |     1 |    73 |     1 |
---------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1

   3 - SEL$1 / T1@SEL$1

   4 - SEL$1 / T2@SEL$1

   5 - SEL$1 / T3@SEL$1
 
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T2"."RANDOM2"="T1"."RANDOM1")

   3 - access("T1"."MOD1"=50)

   4 - filter("T2"."MOD2"=50)

I don’t think the optimizer can generate a plan like this at present – but I may be wrong. I’ve reduced my workload by taking advantage of an existing index on table t1 to do a range scan that picks up only the columns that I need to join to t2. In this case the t2 access path was still a full tablescan – but even so I have reduced the workload against t1, and by the time I revisit it by rowid I will only be visiting the (relatively) small number of rows I really need.

(Left as an exercise to the reader: I could have written the query as a four part join, visiting both table segments by rowid for just those rows that I really needed; have a go, and check that you’ve got it right. Don’t forget that any references to the “non-index” columns that appear in the query have to be changed to reference the second occurrence of the table – note how I’ve changed t1.padding in my original query to t3.padding in the rewrite.)

Footnote:
If you think this type of path is a little odd take a look at the typical stucture of a nested loop join that appears under “nlj_batching” in 11g (this isnt the same t1 and t2 as above, by the way):

select

        /*+ ordered use_nl(t1) index(t1(n1)) */

        t2.n1, t1.n2
from    t2,t1
where

        t2.n2 = 45
and     t1.n1 = t2.n1;
 
------------------------------------------------------
| Id  | Operation                    | Name  | Rows  |
------------------------------------------------------
|   0 | SELECT STATEMENT             |       |   225 |
|   1 |  NESTED LOOPS                |       |       |
|   2 |   NESTED LOOPS               |       |   225 |
|*  3 |    TABLE ACCESS FULL         | T2    |    15 |
|*  4 |    INDEX RANGE SCAN          | T1_I1 |    15 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1    |    15 |
------------------------------------------------------

Notice how Oracle can present a single join as two nested loops – one into the index and a second into the table. This is why I think there may be options within the optimizer to do my little trick automatically – if not now, then soon.

Update June 2012

I’ve just had an exchange of email with an Oak Table member who has pointed me to US patent 8103658 (dated November 2009) – which looks like a remarkably good description of this technique. So maybe the method will become an automatic option for the optimizer some time in the next couple of years.

Comment it's also very fantastic, please enjoy on the original post address, which you can find below

参考至:http://jonathanlewis.wordpress.com/2010/05/18/double-trouble/#comment-36301

如有错误，欢迎指正

邮箱:czmcj@163.com

分享到：

Optimizer Transformations: Star Transfor ... | The CPU Costing Model: A Few Thoughts Pa ...

2014-06-30 07:32
浏览 711
评论(0)
分类:数据库
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

double trouble

double trouble

Update June 2012

评论

发表评论

相关推荐

最近访客 更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

double trouble

double trouble

Update June 2012

评论

发表评论

相关推荐

The Query Optimizer

OPTIMIZER_INDEX_CACHING Parameter

The CPU Costing Model: A Few Thoughts Part V (Reality)

The CPU Costing Model: A Few Thoughts Part IV (Map of the Problematique)

The CBO CPU Costing Model: Indexes vs. Full Table Scans

The CPU Costing Model – A Few Thoughts Part II

The CPU Costing Model: A Few Thoughts Part I

The CBO CPU Costing Model and Indexes – Another Introduction

The CBO and Indexes: OPTIMIZER_INDEX_COST_ADJ Part III

The CBO and Indexes: OPTIMIZER_INDEX_COST_ADJ Part III

The CBO and Indexes: OPTIMIZER_INDEX_COST_ADJ Part II

The CBO and Indexes: OPTIMIZER_INDEX_COST_ADJ Part I

The CBO and Indexes: Introduction Continues …

The CBO and Indexes: An Introduction (Absolute Beginners)

最近访客更多访客>>