`

The CBO and Indexes: An Introduction (Absolute Beginners)

    博客分类:
  • CBO
 
阅读更多

One of the more common questions I get asked and one of the most common questions asked in the various Oracle related forums is the general question of why doesn’t the CBO choose to use a particular index. There are various reasons why an index might be ignored but generally and rather simplistically (for now), the answer is often simply because the CBO considers the cost of using the index to be more expensive than other alternatives.

As one would expect with the CBO, the cost is a rather important consideration and yet many don’t understand what the costs associated with an execution plan actually means and represents. Some people think the costs associated with an execution plan are basically meaningless and are only used by the CBO in “mysterious ways”. Some people even go so far as to suggest that the CBO sucks and recommend all sorts of inappropriate ways to force Oracle to use an index.

It all comes back to simply not understanding CBO fundamentals and how Oracle costs the basic use of an index.

In reality of course, the CBO doesn’t “suck” with 10g, in fact it’s actually extremely clever and resourceful when it comes to determining appropriate, efficient execution plans. If the CBO is armed with “accurate enough” statistics, it will generally do a remarkably good job. The costs and the associated cardinality of the various steps within an execution plan provide valuable information on why the CBO has made its decisions.

So I thought I might discuss the CBO a little and start with a very basic introduction into how the CBO costs an index to get this message across.

The basic formula for costing an index based range scan is:

basic index range scan cost = index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)

Note there are various slight variations and interpretations of this formula from database version to database version and for differing specific scenarios, which can vary the resultant costs marginally. All the following examples are from a windows based 10.2.0.3 database.

So from an index perspective, the index blevel, the number of leaf blocksand the clustering factor are all statistics that directly influence the cost of using the index. However, just as important are the associated selectivities of accessing both the index and the table. These statistics are often based on the associated column statistics (but not always) and if Oracle gets these selectivities wrong, then the CBO will generate wrong costings and all bets are off regarding the appropriateness of the resultant execution plan.

OK, let’s create a nice, simple little scenario which will be the basis of future demos.

1
2
3
4
5
6
7
SQL> CREATE TABLE bowie_stuff AS SELECT (mod(rownum,100)+1)*10 id, 'Ziggy Stardust' name FROM dual CONNECT BY LEVEL <= 100000;
 
Table created.
 
SQL> CREATE INDEX bowie_stuff_i ON bowie_stuff(id);
 
Index created.

We’ll use the BOWIE_STUFF table in future demos, but for now I’m after a relatively large index with an excellent clustering factor. So I’ll create a second table, ordered on the ID column and double it’s size …

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
SQL> CREATE TABLE bowie_stuff2 AS SELECT * FROM bowie_stuff ORDER BY id;
 
Table created.
 
SQL> INSERT INTO bowie_stuff2 SELECT * FROM bowie_stuff2;
 
100000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> CREATE INDEX bowie_stuff2_i ON bowie_stuff2(id);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BOWIE_STUFF2', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

If we now look at some of the key statistics, we note the following:

1
2
3
4
SQL> SELECT table_name, blocks, num_rows FROM user_tables WHERE table_name = 'BOWIE_STUFF2'
TABLE_NAME   BLOCKS NUM_ROWS
------------ ------ --------
BOWIE_STUFF2    659   200000

The table has 200,000 rows and consists of 659 data blocks.

1
2
3
4
SQL> SELECT column_name, num_distinct, density, num_nulls FROM user_tab_col_statistics WHERE table_name = 'BOWIE_STUFF2' and column_name = 'ID';
COLUMN_NAME NUM_DISTINCT DENSITY NUM_NULLS
----------- ------------ ------- ---------
ID                   100     .01         0

The indexed ID column has 100 distinct values, which are perfectly evenly distributed.  The density of the ID is an “accurate” 0.01 with a selectivity of 1%, so a selection of one distinct value will return 200,000 x 0.01 = 2,000 rows. Nice easy numbers …

1
2
3
4
5
6
7
8
9
SQL> SELECT index_name, blevel, leaf_blocks, num_rows, distinct_keys, clustering_factor FROM user_indexes WHERE index_name = 'BOWIE_STUFF2_I';
 
INDEX_NAME      BLEVEL LEAF_BLOCKS
-------------- ------- -----------
BOWIE_STUFF2_I       2         605  
 
NUM_ROWS DISTINCT_KEYS CLUSTERING_FACTOR
-------- ------------- -----------------  
200000           100               854

The index has a blevel of 2, it has 605 leaf blocks and a rather good clustering factor of just 854 (note the rows in the table were basically inserted in ID order).

OK, we now have all the information we need.

Note to start with, I’ll only use the “older” IO costing model by setting the following parameter:

1
2
3
SQL> alter session set "_optimizer_cost_model" = io;
 
Session altered.

As we’ll see in future posts, the use of the CPU costing model actually has minimal effect on many index related access costings but we’ll use the IO costing model as a starting point.

This first equality SQL demo shows the costings in relation to a simple, single value equality predicate (I’ve linked to a PDF to preserve formatting).

OK, 2 important pieces of information to note. Firstly, the CBO has got the expected cardinality (2000 rows) spot on.

As the stats are 100% accurate and the values are perfectly evenly distributed, this is to be expected. If Oracle gets the cardinality correct, we can be reasonably confident that the CBO to have made a reasonable decision here.

As previously mentioned, there are 100 distinct values of which we want one of those those values. Oracle has calculated the selectivity of the query to be 1/100 = 1% or a value of 0.01 (as defined in the density column statistic). The expected number of rows is therefore 200,000 x 0.01 = 2000. Like I said, spot on.

Secondly, the CBO has calculated the cost of the index range scan to be 9 and the overall cost of the query to be 18. How has the CBO derived these costings ? Well as it’s a simple, single column index and associated query predicate, we can plug in the 0.01 selectivity into the above formula as follows:

index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)

2 + ceil(0.01 x 605) + ceil(0.01 x 854) = 2 + 7 + 9 =    9 + 9 = 18

So we can see that the costs associated with accessing just the index:

blevel + ceil(index selectivity x leaf blocks) = 2 + ceil(0.01 x 605) = 2 + 7 = 9

indeed comes to a total of 9

and the overall cost of the execution plan indeed does come to 18.

The CBO costings all do actually make sense and add up as expected.

In this second IN list demo, we’re just going to expand on things a touch by selecting 3 distinct values in a IN list. However, the actual principles and cost calculations are basically the same.

This time we’re after 3 possible values as listed in the IN clause. So that’s 3 out of the 100 possible values, 3/100 = 3% or 0.03. The required cardinality is therefore 200,000 rows x 0.03 = 6000 which again Oracle has got spot on. Usually a good sign that the CBO has made a reasonable decision.

This time the costs have come to 23 for the index range scan part of the execution plan and 49 overall. Again using our basic formula:

index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)

2 + 3 x ceil(0.01 x 605) + ceil(0.03 x 854) = 2 + 3×7 + 26 = 2 + 21 + 26 =23 + 26 = 49.

So the cost associated with the index is indeed 23 and the overall cost of the query is indeed 49. Again, all these CBO costings make sense. These costings aren’t meaningless, internal numbers but values that give us a useful insight into how and why the CBO has come to a specific cost and so to a specific execution plan.

Note BTW, one of those subtle differences in how the formula is implemented with 10g (from say 9i) is the fact the selectivity is calculated for the index by performing the CEIL function for one value first and then multiplied by the overall number of expected values in the IN list:

2 + 3 x  ceil(0.01 x 605) = 23

rather than determining and using the total index selectivity within the CEIL function:

2 +  ceil(0.03 x 605) = 21

The net result being such costs are going to be just that little bit higher in 10g than they would have been in 9i.

So indeed, these things can all change a little from release to release but the basic principle is still the same. Feed the CBO acurate enough statistics and it’ll likely do the right thing. If it doesn’t, understanding the CBO and how it costs various execution plans will help to determine what the issues might be. Huge clues are provided by the costings and cardinality estimates in the associated query execution plans. 

I’ll expand on all this is future posts but for more information on this subject, I highly recommend the writings of Wolfgang Breitling (and his excellent paper “A Look Under The Hood Of The CBO: The 10053 Event” in particular) and the writings of Jonathan Lewis (and his excellent book “Cost-Based Oracle Fundamentals” in particular).

Like I said, this is only meant as an introduction, more to come …

 


参考至:http://richardfoote.wordpress.com/2009/06/09/the-cbo-and-indexes-an-introduction-absolute-beginners/
如有错误,欢迎指正

邮箱:czmcj@163.com

分享到:
评论

相关推荐

    CBO中Nd:YLF红色激光倍频的全固态330nm紫外激光

    4. CsB3O5(CBO)晶体:CBO是一种在紫外区域具有高透光性和高激光损伤阈值的非线性光学晶体。因其较大的有效非线性系数,CBO晶体成为产生紫外激光的优良选择。CBO可以用于倍频和三倍频转换,这里讨论的是利用CBO晶体...

    matlab巴特沃斯滤波器代码-CBO4filters:使用称为碰撞体优化(CBO)的元启发式优化技术,根据整数阶有理近似设计分数阶低通巴特沃

    matlab巴特沃斯代码CBO4过滤器 使用称为碰撞体优化 (CBO) 的元启发式优化技术,根据整数阶有理近似设计分数阶低通巴特沃斯滤波器的 MATLAB 代码 分数阶滤波器比整数阶滤波器具有更好的滚降频率。 但是实现分数阶...

    CBO原书代码

    【CBO原书代码】是关于Oracle数据库优化器(Cost-Based Optimizer,简称CBO)的学习资源,由知名Oracle专家Jonathan Lewis编著。CBO是Oracle数据库系统中用于选择执行SQL查询最佳执行路径的关键组件。它通过计算不同...

    cmobase-v1.0-rc2.pdf

    - 页错误、虚拟机页错误和访问错误异常(Page Fault, Guest-Page Fault, and Access Fault Exceptions) - 地址不对齐异常(Address Misaligned Exceptions) - 断点异常和调试模式入口(Breakpoint Exceptions ...

    how-cbo-works

    ### 如何理解CBO在Oracle数据库中的工作原理 本文旨在深入探讨成本基础优化器(Cost-Based Optimizer,简称CBO)在Oracle数据库中的工作方式及其重要性。通过一个具体的示例,我们将逐步分析CBO如何根据表统计信息...

    oracle CBO HINT

    Oracle的Cost-Based Optimizer (CBO) 是一种自动选择最优执行计划的机制,它通过评估不同执行路径的成本来决定如何执行SQL查询。CBO会考虑各种因素,包括表的大小、索引的存在、数据分布等,以确定最有效率的查询...

    oracle cbo

    cbooracle cbo

    cbo数据库优化

    **CBO数据库优化详解** 在Oracle数据库管理中,CBO(Cost-Based Optimizer)是一种重要的查询优化策略,它根据操作的成本来决定最佳的执行计划。CBO通过分析表的统计信息,如行数、块数、索引信息等,来评估不同...

    Oracle CBO 学习笔记之(1) : 深入理解Oracle Hash Join的代价模型及其执行流程

    Oracle Cost-Based Optimizer (CBO) 是Oracle数据库中的一种查询优化策略,它通过评估不同查询执行计划的成本来选择最优的执行路径。在这个学习笔记中,我们将深入探讨Oracle中的Hash Join操作,这是一种重要的联接...

    【Maclean Liu技术分享】拨开Oracle CBO优化器迷雾,探究Histogram直方图之秘_0321.pdf

    ### Oracle CBO优化器与直方图(Histogram)解析 #### 概述 在数据库管理领域,Oracle CBO(Cost-Based Optimizer)成本基于优化器是Oracle数据库管理系统中用于确定查询执行计划的一种重要机制。通过分析SQL语句,...

    CBO:CBO-巴西巴西利亚足球俱乐部(ClassificaçãoBrasileira deOcupações)

    cbo CBO-巴西杯足球俱乐部从2020年第1季度开始至在CBO-CBO-巴西巴西杯比赛之前,截止到2020年)。 O Fato de aVERSÃOser 2002não意义上的替代品,algumasocupaçõesforaminclídas。 Apenas o FORMATO de ...

    CBO,RBO在ORACLE中的应用

    在Oracle数据库中,CBO(Cost-Based Optimizer)和RBO(Rule-Based Optimizer)是两种不同的SQL查询优化策略。CBO自Oracle 7引入,但在Oracle 8i时达到成熟,并在Oracle 9i及以后的版本中逐渐取代RBO成为主要的优化...

    ORACLE CBO RBO 优化

    ORACLE CBO RBO 优化

    UFIDA.U9.CBO.ItemMasterBE.zip

    标题 "UFIDA.U9.CBO.ItemMasterBE.zip" 指的是用友U9系统中的一个插件开发相关的代码示例。用友U9是一款面向大型企业的全面企业管理软件,其核心设计思想是“实时企业,全球商务”。CBO(Client Business Object)是...

    addin

    Send us an e-mail so that we can notify you when the change is available.&lt;P&gt; &lt;BR&gt; 5) Linked Table Manager and other Wizards &lt;BR&gt; Also included with the setup are a Linked Table Manager and other Add-...

    Oracle的RBO和CBO详细介绍和优化模式设置方法

    Oracle的优化器有两种优化方式,即基于规则的优化方式(Rule-Based Optimization,简称为RBO)和基于代价的优化方式(Cost-Based Optimization,简称为CBO),在Oracle8及以后的版本,Oracle强列推荐用CBO的方式 ...

Global site tag (gtag.js) - Google Analytics