`
wsql
  • 浏览: 12034659 次
  • 性别: Icon_minigender_1
  • 来自: 深圳
文章分类
社区版块
存档分类
最新评论

Oracle Index Clustering Factor 说明

 
阅读更多

一. 官网说明

Theindexclustering factormeasures row order in relation to an indexed value suchas employee last name. The more order that exists in rowstorage for this value, the lower the clustering factor.

-- row 存储的越有序,clustering factor 的值越低

Theclustering factor is useful as a rough measure of thenumber of I/Os required to read an entire table by means of an index:

(1)If the clustering factor is high, then Oracle Database performs a relativelyhigh number of I/Os during a large index range scan. The index entriespoint to random table blocks, so the databasemay have to read and reread the same blocks over and over again to retrieve thedata pointed to by the index.

--当clustering factor 很高时,说明index entry(rowid) 是随机指向一些block的,在一个大的indexrange scan时,这样为了读取这些rowid 指向的block,就需要一次有一次重复的去读这些block。

(2)If the clustering factor is low, then Oracle Database performs a relativelylow number of I/Os during a large index range scan. The index keys in arange tend to point to the same data block, sothe database does not have to read and reread the same blocks over and over.

--当clustering factor 值低时,说明index keys(rowid) 是指向的记录是存储在相同的block里,这样去读row时,只需要在同一个block里读取就可以了。就可以减少重复读取block的次数。

The clustering factor isrelevant for index scans because it can show:

(1)Whether the database will use an index for large range scans

(2)The degree of table organization in relation to the index key

(3)Whether you should consider using an index-organized table,partitioning, or table cluster if rows must be ordered by the index key

http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/indexiot.htm#CNCPT89180

二. Index Clustering Factor说明

之前也整理过一篇有关索引维护的Blog,参考:

Oracle 索引的维护

http://blog.csdn.net/xujinyang/article/details/6829355

在里面没有提到index Clustering Factor参数,所以这里说明一下。

简单的说, IndexClustering Factor是通过一个索引扫描一张表,需要访问的表的数据块的数量,即对I/O的影响. 也代表索引键值存储位置是否有序。

(1)如果越有序,即相邻的键值存储在相同的block,那么这时候ClusteringFactor 的值就越低。

(2)如果不是很有序,即键值是随即的存储在block上,这样在读取键值时,可能就需要一次又一次的去访问相同的block,从而增加了I/O.

Clustering Factor 的计算方式如下:

(1)扫描一个索引(large index range scan)

(2)比较某行的rowid和前一行的rowid,如果这两个rowid不属于同一个数据块,那么cluster factor增加1

(3)整个索引扫描完毕后,就得到了该索引的cluster factor。

如果ClusteringFactor接近于表存储的块数,说明这张表是按照索引字段的顺序存储的。

如果ClusteringFactor接近于行的数量,那说明这张表不是按索引字段顺序存储的。

在计算索引访问成本的时候,这个值十分有用。Clustering Factor乘以选择性参数(selectivity )就是访问索引的开销。

如果这个统计数据不能真实反映出索引的真实情况,那么可能会造成优化器错误的选择执行计划。另外如果某张表上的大多数访问是按照某个索引做索引扫描,那么将该表的数据按照索引字段的顺序重新组织,可以提高该表的访问性能。

三. 测试

3.1 模拟问题

--查看版本信息

SYS@anqing2(rac2)> select * fromv$version where rownum=1;

BANNER

----------------------------------------------------------------

Oracle Database 10g Enterprise EditionRelease 10.2.0.4.0 - Prod

--创建测试表

SYS@anqing2(rac2)> create table t asselect * from dba_objects where 1=2;

Table created.

SYS@anqing2(rac2)> begin

2 for i in 1..10 loop

3 insert /*+append*/ into tselect * from dba_objects order by i;

4 commit;

5 end loop;

6 end;

7 /

PL/SQL procedure successfully completed.

-- 这样insert的原因是保证数据存储的无序性

SYS@anqing2(rac2)> select count(*) fromt;

COUNT(*)

----------

502720

--查看表的大小

SYS@anqing2(rac2)> set wrap off

SYS@anqing2(rac2)> col owner for a10

SYS@anqing2(rac2)> col segment_name fora15

SYS@anqing2(rac2)> select owner, segment_name, blocks, extents,bytes/1024/1024||'M' "size" from dba_segments where owner='SYS' andsegment_name='T';

OWNERSEGMENT_NAME BLOCKS EXTENTS size

---------- --------------- -------------------- -------------------------------

SYST 6912 69 54M

--在object_id上构建索引

SYS@anqing2(rac2)> create index idx_t_id on t(object_id);

Index created.

SYS@anqing2(rac2)> select owner, segment_name, segment_type,blocks, extents,bytes/1024/1024||'M' "SIZE" from dba_segments where owner='SYS' andsegment_name=upper('idx_t_id');

ownersegment_name segment_type blocksextents size

---------- --------------------------------- ---------- ---------- ------------

SYSIDX_T_ID INDEX 115224 9M

--在没有收集相关的统计信息之前,我们查看一下Index Clustering Factor

SYS@anqing2(rac2)> select owner,index_name, clustering_factor, num_rows from dba_indexes where owner='SYS' andindex_name='IDX_T_ID';

OWNERINDEX_NAMECLUSTERING_FACTOR NUM_ROWS

---------- ----------------------------------------------- ----------

SYSIDX_T_ID 502720 502720

--收集统计信息

SYS@anqing2(rac2)> execdbms_stats.gather_table_stats('SYS','T',cascade => true);

PL/SQL procedure successfully completed.

--再次查看InexClustering Factor

SYS@anqing2(rac2)> select owner, index_name,clustering_factor, num_rows from dba_indexes where owner='SYS' andindex_name='IDX_T_ID';

OWNERINDEX_NAMECLUSTERING_FACTOR NUM_ROWS

---------- ----------------------------------------------- ----------

SYSIDX_T_ID 502720 502720

--统计信息收集前和后,Clustering Factor 值不变,说在创建索引的时候,会收集表中中数据真正的行数。并且这里的Clustering Factor 等于Num_rows,也也说明表的Clustering Factor 是无序的。

--查看一个确定值,然后查看执行计划

SYS@anqing2(rac2)> explain plan for select *from t where object_id=1501;

Explained.

SYS@anqing1(rac1)> select * fromtable(dbms_xplan.display);

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------

Plan hash value: 514881935

----------------------------------------------------------------------------------------

| id| operation| name | rows| bytes | cost (%cpu)| time |

----------------------------------------------------------------------------------------

| 0| select statement | |10 | 930 | 14(0)| 00:00:01 |

| 1| table accessby index rowid| t | 10 |930 | 14(0)| 00:00:01 |

|* 2| index range scan | idx_t_id | 10 || 3 (0)| 00:00:01 |

----------------------------------------------------------------------------------------

Predicate Information (identified byoperation id):

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------

2- access("OBJECT_ID"=1000)

--这里走了索引,cost为14

--查询一个范围的执行计划

SYS@anqing1(rac1)> explain plan for select * from t where object_id>1000 andobject_id<2000;

Explained.

SYS@anqing1(rac1)> select * fromtable(dbms_xplan.display);

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------

Plan hash value: 1601196873

--------------------------------------------------------------------------

| Id| Operation | Name |Rows | Bytes | Cost(%CPU)| Time |

--------------------------------------------------------------------------

| 0| SELECT STATEMENT | |8884 | 806K| 1537(2)| 00:00:19 |

|* 1| TABLE ACCESSFULL| T | 8884 |806K| 1537 (2)| 00:00:19 |

--------------------------------------------------------------------------

Predicate Information (identified byoperation id):

---------------------------------------------------

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------

1- filter("OBJECT_ID"<2000 AND "OBJECT_ID">1000)

13 rows selected.

--注意,object_id上是否索引的,但这里并没有使用索引而是使用了全表扫描

--刷新buffercache,然后查看SQL 执行的物理读,这个是否全表扫描的

SYS@anqing1(rac1)> alter system flushbuffer_cache;

System altered.

Elapsed: 00:00:00.24

SYS@anqing1(rac1)> set autot traceonlystat

SYS@anqing1(rac1)> select * from t where object_id>1000 andobject_id<2000;

9990 rows selected.

Elapsed: 00:00:17.13-- 用了17秒

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

7573 consistent gets

6911 physical reads--物理读

984 redo size

746085 bytes sent via SQL*Net toclient

7715 bytes received via SQL*Netfrom client

667 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

9990 rows processed

--强制走索引,看执行计划

SYS@anqing1(rac1)> set autot off

SYS@anqing1(rac1)> explain plan for select /*+ index(t idx_t_id) */ * from twhere object_id>1000 and object_id<2000;

Explained.

Elapsed: 00:00:00.03

SYS@anqing1(rac1)> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------

Plan hash value: 514881935

----------------------------------------------------------------------------------------

| Id| Operation |Name | Rows | Bytes | Cost (%CPU)| Time |

----------------------------------------------------------------------------------------

| 0| SELECT STATEMENT | |8884 | 806K| 8974(1)| 00:01:48 |

| 1| TABLE ACCESSBY INDEX ROWID| T | 8884 |806K| 8974 (1)| 00:01:48 |

|* 2| INDEX RANGE SCAN | IDX_T_ID | 8942 || 22 (0)| 00:00:01 |

----------------------------------------------------------------------------------------

Predicate Information (identified byoperation id):

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------

2- access("OBJECT_ID">1000 AND "OBJECT_ID"<2000)

14 rows selected.

--强制走索引之后,这里的使用了index range scan,但是里的cost 变成了8974.而走全表扫描时,是1537.

--查看强制走索引的物理读

SYS@anqing1(rac1)> alter system flushbuffer_cache;

System altered.

Elapsed: 00:00:00.13

SYS@anqing1(rac1)> select /*+ index(tidx_t_id) */ * from t where object_id>1000 and object_id<2000;

9990 rows selected.

Elapsed: 00:00:00.25

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

10679 consistent gets

154 physical reads

0redo size

205626 bytes sent via SQL*Net toclient

7715 bytes received via SQL*Netfrom client

667 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

9990 rows processed

--这里的物理读要比走索引低很多,但是Oracle 却没有使用索引。因为Oracle 认为走索引的Cost 比 走全表扫描大。而是是大N倍。 而CBO 就是基于Cost 来决定执行计划的。

通过第二节里的分析,对于索引的Cost,Oracle 是根据Clustering Factor参数来计算的,而我们的数据Clustering Factor参数很高,数据存储无序。 这就造成了Oracle 认为走索引的cost 比全表扫描大。

3.2 解决问题

通过上面的分析,可以看出,要降低Clustering Factor才能解决问题,而要解决Clustering Factor,就需要重新对table表的存储位置进行排序。

--重建table

SYS@anqing1(rac1)> create table tt as select * from t where 1=0;

Table created.

SYS@anqing1(rac1)> insert /*+append */ into tt select * from t order byobject_id;

502720 rows created.

SYS@anqing1(rac1)> commit;

Commit complete.

SYS@anqing1(rac1)> truncate table t;

Table truncated.

SYS@anqing1(rac1)> insert /*+append */ into t select * from tt;

502720 rows created.

SYS@anqing1(rac1)> commit;

Commit complete.

--查看表和索引的信息

SYS@anqing1(rac1)> select owner,segment_name, blocks, extents, bytes/1024/1024||'M' "size" fromdba_segments where owner='SYS' and segment_name='T';

OWNERSEGMENT_NAME BLOCKS EXTENTS size

---------- --------------- -------------------- -------------------------------

SYST 6912 69 54M

SYS@anqing1(rac1)> select owner,segment_name, segment_type,blocks, extents, bytes/1024/1024||'M'"SIZE" from dba_segments where owner='SYS' andsegment_name=upper('idx_t_id');

OWNERSEGMENT_NAME SEGMENT_TYPEBLOCKS EXTENTS SIZE

---------- --------------------------------- ---------- ---------- ------------

SYSIDX_T_ID INDEX 1024 23 8M

SYS@anqing1(rac1)> select owner,index_name, clustering_factor, num_rows from dba_indexes where owner='SYS' andindex_name='IDX_T_ID';

OWNERINDEX_NAMECLUSTERING_FACTOR NUM_ROWS

---------- ----------------------------------------------- ----------

SYSIDX_T_ID 502720 502720

--对索引进行rebuild

SYS@anqing1(rac1)> alter index idx_t_idrebuild;

Index altered.

--查看ClusteringFactor

SYS@anqing1(rac1)> select owner, index_name, clustering_factor,num_rows from dba_indexes where owner='SYS' and index_name='IDX_T_ID';

OWNERINDEX_NAME CLUSTERING_FACTOR NUM_ROWS

---------- ----------------------------------------------- ----------

SYSIDX_T_ID 6958 502720

--注意这里的Factor,已经变成6958.我们收集一下表的统计信息,然后与表的block 进行一次比较。

SYS@anqing1(rac1)> execdbms_stats.gather_table_stats('SYS','T',cascade => true);

PL/SQL procedure successfully completed.

SYS@anqing1(rac1)> select blocks fromdba_tables where table_name='T';

BLOCKS

----------

6896

--表T 实际使用的block是6896,Clustering Facter 是6958.基本还是比较接近了。这也说明相邻的row是存储在相同的block里。

--再次查看之前sql的执行计划

SYS@anqing1(rac1)> set linesize 100

SYS@anqing1(rac1)> explain plan for select * from t where object_id>1000 andobject_id<2000;

Explained.

SYS@anqing1(rac1)> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------

Plan hash value: 514881935

----------------------------------------------------------------------------------------

| Id| Operation |Name | Rows | Bytes | Cost (%CPU)| Time |

----------------------------------------------------------------------------------------

| 0| SELECT STATEMENT | |8928 | 810K| 147(1)| 00:00:02 |

| 1| TABLE ACCESSBY INDEX ROWID| T | 8928 |810K| 147 (1)| 00:00:02 |

|* 2| INDEX RANGE SCAN | IDX_T_ID | 8944 || 22 (0)| 00:00:01 |

----------------------------------------------------------------------------------------

Predicate Information (identified by operationid):

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------

---------------------------------------------------

2- access("OBJECT_ID">1000 AND "OBJECT_ID"<2000)

14 rows selected.

--注意这里的cost已经将到了147. 性能提升还是非常明显。

SYS@anqing1(rac1)> set autot trace stat

SYS@anqing1(rac1)> set timing on

SYS@anqing1(rac1)> alter system flushbuffer_cache;

System altered.

Elapsed: 00:00:00.08

SYS@anqing1(rac1)> select * from t whereobject_id>1000 and object_id<2000;

9990 rows selected.

Elapsed: 00:00:00.25

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

1473 consistent gets

147 physical reads

0 redo size

205626 bytes sent via SQL*Net toclient

7715 bytes received via SQL*Netfrom client

667 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

9990 rows processed

四. 小结

通过以上说明和测试,可以看到Clustering Factor 也是索引健康的一个重要判断的标准。 其值越低越好。 它会影响CBO 选择正确的执行计划。但是要注意一点,Clustering Factor 总是趋势与不断恶化的。

在之前索引维护的blog里:

Oracle 索引的维护

http://blog.csdn.net/xujinyang/article/details/6829355

提到了一个索引的选择性. 索引的选择性是指索引列中不同值的数目与表中记录数的比。如果一个表中有2000条记录,表索引列有1980个不同的值,那么这个索引的选择性就是1980/2000=0.99。一个索引的选择性越接近于1,这个索引的效率就越高。CBO的优化器一般不会使用选择性不好的索引。

现在举一个例子来看下为什么索引的选择性越高效率越高。一般索引里会包含rowid和键值。 假设在字段name 上有索引,其值如下:

row1 dave

row2 dave

row3 dave

row4 dave

row5 anqing

按以上6条记录看,索引的选择性=2/6=0.33. 在这种情况下,如果我们根据Dave 来查询,那么索引就返回5行rowid。如果是多表的netsed loop连接,那代价就会很大了。所以当索引的选择性越低,这种扫描的代价越大。

对于这种列,可以说是数据倾斜。 对这种情况,就需要收集列信息的直方图(histogram)。让CBO 在选择执行计划时得到更多的信息,从而选择正确的执行计划。

更多信息参考:

Oracle Statistic 统计信息 小结

http://blog.csdn.net/xujinyang/article/details/6881672

-------------------------------------------------------------------------------------------------------

分享到:
评论

相关推荐

    Oracle很详细的讲解

    1.2.11 群集因子(Clustering Factor):它是表中数据在物理上分布与索引顺序是否接近的一个度量值,群集因子越小,索引扫描时访问的数据块就越少。 1.2.12 二元高度(Binary height):它是描述索引树的平衡程度的...

    Oracle Statistic 统计信息

    - **聚集因子 (Clustering factor):** 衡量索引顺序与表中数据实际物理存储顺序的一致性。 4. **系统统计信息 (System statistics):** - **I/O 性能和利用率 (I/O performance and utilization):** 数据库I/O操作...

    关于ORACLE数据库中索引的几点讨论.pdf

    Oracle在评估使用索引的成本时,会考虑两个重要因素:Clustering Factor (CF) 和 Filtering Factor (FF)。 - **CF(聚簇因子)**:它反映了每个索引块对应的平均数据块数。如果索引项在数据块中的分布很均匀,CF值...

    Oracle技术大牛整理常见问题.pdf

    *.*.*.** 群集因子(Clustering Factor):群集因子描述了表中行在物理上是如何根据索引排序的。一个低群集因子意味着表中的行按照索引顺序紧密排列,这有利于改善查询性能。 *.*.*.** 二元高度(Binary height):...

    Oracle Index 的三个问题

    Oracle评估成本时涉及两个关键参数:聚集因子(Clustering Factor, CF)和过滤因子(Filtering Factor, FF)。CF表示为获取索引中的每一项,需要读取多少数据块。FF则表示查询结果占总数据量的比例。计算公式大致为...

    Oracle优化.ppt

    2. 查看索引的统计信息:select ttt.index_name, ttt.num_rows, ttt.distinct_keys, ttt.avg_leaf_blocks_per_key, ttt.clustering_factor from dba_indexes ttt where ttt.owner='SCOTT'; 人工进行统计可以使用...

    Oracle关于重建索引争论的总结

    他们认为,删除的索引条目未被重新利用,导致空间浪费,同时索引的clustering factor(集群因子)可能不再同步,这些问题可以通过重建索引来解决。然而,这些观点并不完全准确,因为Oracle的B树索引具有自我管理的...

    oracle-认证之管理统计信息

    - **指标**:叶块数量(LEAF_BLOCKS)、等级(BLEVEL)以及聚簇因子(CLUSTERING_FACTOR)。 - **存储位置**:这些信息存储在`DBA_INDEXES`视图中。 - **作用**:帮助优化器评估使用索引访问的成本。 ##### 4. 系统计统计...

    40个DBA日常维护的SQL脚本

    ROUND(NVL(IND.CLUSTERING_FACTOR, 1) / DECODE(TAB.NUM_ROWS, 0, 1, TAB.NUM_ROWS), 3) * 100 || '%' "CLUSTERING%" FROM USER_TABLES TAB, USER_INDEXES IND WHERE TAB.TABLE_NAME = IND.TABLE_NAME AND TAB....

    索引对SQL执行成本的影响.pdf

    索引扫描的成本计算公式为:Cost=blevel+ceiling(leaf_blocks + branches_blocks) / effective_index_selectivity + ceiling(clustering_factor / effective_table_selectivity)。在这个公式中,blevel代表索引的...

Global site tag (gtag.js) - Google Analytics