复合索引branch block上存储几个列的信息

lanmh

浏览: 560385 次
性别:
来自: 杭州

最近访客更多访客>>

siwei19870827

五音谷

善假于物也

devcang

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

博客分类：

Oracle

SQL Oracle

[分享] 复合索引branch block上存储几个列的信息

这是很早之前研究的问题，已经忘记了是什么案例导致去研究这个问题。

不过这个问题本省就很有意思，当在两个或者多个列上创建索引时，究竟branch block中会纪录几个列的信息？

首先我们知道branch block中存储结构大致是这样的 pointer - value - pointer - value - pointer.其中pointer指向leaf block或者下一层branch block.

那么这个value的值就是为了区分左右指针所指向的两个leaf block中(或者下一层branch block中）的值，这样从root -> branch　-> leaf的时候才能准确定位到leaf block.

在提供答案前，首先给出几种不同的论点

1.不管建立在多少列上，branch block中只存储leading column信息

2. 如果建立在n个列上，branch block中存储n-1列的信息

3. branch block上存储所有列的信息.

这三种观点都有问题，首先观点一

如果这个索引的leading column的可选择性很差，例如status或者flag这样的列，distinct value很少

这时候如果索引建立在（status, id)上，对于

select * from table where status=0 and id=12345这样的查询,range scan并不能准确定位到 leaf block

对于观点二，在n=2的时候显然也可以用上面的例子来推翻

对于观点三，是可以正确定位到leaf block，但是如果对于像

index(id,name) 其中name 假定name字段类型为varchar(1000)，这样的索引效率也不好

事实上因为id列可选择性很好，一般不需要再存储name值到branch block中就可以区分左右的block

在branch block中存储的name列的值浪费了branch block的空间，

而我们知道单个branch block中能够存储值的多少直接影响了树的高度，而树的高度显然直接影响了index的效率

通过上面的分析，以上的三种观点都是错误的，那么oracle究竟如何来存储列信息才能既做到准确定位leaf block又不存储多余的值呢？

这里首先给出试验的结果，我没有找到官方文档或者其他的文章来解释这个结论，不过通过试验来看结论是很清楚的，而且也符合上面提到的两种考虑因素。

其实刚刚我们被限制住的原因是一直认为branch block中存储的列的个数是个常数，但事实上存储几个列是变化的。

这个值只需要区分其左右指针所指向的两个block的值就可以了，

更简单的说，当oracle从上到下走到这个branch block的时候，

通过这个值可以准确的知道向左走还是向右走，

举个例子，我们上面一开始提到的索引例子，

第一个是 index（status, id) , 如果左边block的最大值为(0,33),右边block的最小值为(0,35)，

因为此时单纯从第一列不能区分两个block，这时就需要存储两列的信息

如果左边block最大值为(0,100),右边block最小值为(1,1)

因为此时从第一列就可以区分出两个block，所以只需要存储第一列的信息

这样对于

select * from table where status=0 and id=12345

查询，通过branch block中的值既可以知道“向左走，向右走”，准确定位leaf block

又可以避免不必要的浪费branch block空间，避免树高度不必要的增高

下面是试验的过程，直接copy以前自己发的邮件，懒得翻译了

We discussed the internal storage rule for composite index last time, but we didn’t get conclusion yet.

So I did a test for this.

SQL> create table test(n1 int,n2 int ,n3 int,v1 varchar2(2000),v2 varchar2(2000),v3 varchar2(2000));

Table created.

SQL> begin

2 for i in 1..1000 loop

3 insert into test values(0,i,i,rpad(’a',2000),rpad(’a',2000),rpad(’a',2000));

4 end loop;

5 end;

6 /

PL/SQL procedure successfully completed.

insert 1000 rows, the format is like this

0,1,1,xxxxxx,xxxxxxxxxx

0,2,2,xxxxxx,xxxxxxxxxx

0,3,3, xxxxxx,xxxxxxxxxx

……..

0,1000,1000, xxxxxx,xxxxxxxxxx

SQL> commit;

Commit complete.

create 7 indexes on this table, the naming rule is index_

_N (number) V (varchar)

SQL> create index index_2_nv on test(n1,v1);

Index created.

SQL> create index index_2_vn on test(v1,n1);

Index created.

SQL> create index index_3_nnv on test(n1,n2,v1);

Index created.

SQL> create index index_3_nvn on test(n1,v1,n2);

Index created.

SQL> create index index_4_nnnv on test(n1,n2,n3,v1);

Index created.

SQL> create index index_4_nnvn on test(n1,n2,v1,n3);

Index created.

SQL> create index index_5_nnnvv on test (n1,n2,n3,v1,v2);

Index created.

SQL> analyze table test compute statistics;

Table analyzed.

SQL> select index_name,PREFIX_LENGTH,BLEVEL,LEAF_BLOCKS from user_indexes

2 where lower(index_name) in (’index_2_nv’,'index_2_vn’,'index_3_nnv’,'index_3_nvn’,

3 ‘index_4_nnnv’,'index_4_nnvn’,'index_5_nnnvv’);

INDEX_NAME PREFIX_LENGTH BLEVEL LEAF_BLOCKS

—————————— ————- ———- ———–

INDEX_2_NV 5 334

INDEX_2_VN 5 334

INDEX_3_NNV 1 334

INDEX_3_NVN 5 334

INDEX_4_NNNV 1 334

INDEX_4_NNVN 1 334

INDEX_5_NNNVV 2 1000

7 rows selected.

You see, three indexes (INDEX_2_NV, INDEX_2_VN, INDEX_3_NVN) have level5. This means these indexes’ branch block contains varchar2000.

Then I dump these indexes’ root block to confirm this.

SQL> select segment_name,file_id,block_id from dba_extents where

2 lower(segment_name) in (’index_2_nv’,'index_2_vn’,'index_3_nnv’,'index_3_nvn’,

‘index_4_nnnv’,'index_4_nnvn’,'index_5_nnnvv’) and extent_id=0 order by segment_name 3 ;

SEGMENT_NAME FILE_ID BLOCK_ID

—————————— ———- ———-

INDEX_2_NV 9 16082

INDEX_2_VN 9 16097

INDEX_3_NNV 9 13587

INDEX_3_NVN 1 22053

INDEX_4_NNNV 1 32702

INDEX_4_NNVN 1 62230

INDEX_5_NNNVV 1 61697

7 rows selected.

SQL> @temp.sql

SQL> alter system dump datafile 9 block 16083;

System altered.

SQL> alter system dump datafile 9 block 16098;

System altered.

SQL> alter system dump datafile 9 block 13588;

System altered.

SQL> alter system dump datafile 1 block 22054;

System altered.

SQL> alter system dump datafile 1 block 32703;

System altered.

SQL> alter system dump datafile 1 block 62231;

System altered.

SQL> alter system dump datafile 1 block 61698;

System altered.

It’s weird. The index INDEX_2_NV and INDEX_2_VN only have 2 columns.But in the root block dump file, we can find col0,col1 and col2(maybesomething related to rowid, I guess).

I am not sure about what the third col is, but it’s clear that the number of columns in branch block is related about the data.

0,1,1,xxxxxx,xxxxxxxxxx

0,2,2,xxxxxx,xxxxxxxxxx

0,3,3, xxxxxx,xxxxxxxxxx

……..

0,1000,1000, xxxxxx,xxxxxxxxxx

The index INDEX_2_NV, INDEX_2_VN can’t distinguish the leaf blocksusing the prefix two columns, so it added the third column (unknown,maybe something related to rowid, I guess).

The indexes INDEX_3_NNV, INDEX_4_NNNV, INDEX_4_NNVN and INDEX_5_NNNVVcan distinguish the leaf blocks using the prefix two columns, so thebranch blocks only contain 2 columns.

The index INDEX_3_NVN can should use three columns to distinguish theleft/right leaf blocks, so it contains 3 columns (because the prefixtwo columns’ values are the same).

You can check the dump file xfan_ora_25936.trc for details.

So I guess that the number of columns stored in branch block is related to linked leaf blocks’ data.

The columns which stored in branch blocks need to distinguish the left/right linked leaf blocks.

If the rule for composite index is like I guessed, then the columnsstored in branch blocks maybe different in one index branch block!!

So I did another test to prove this:

SQL> create table test2(n1 int,n2 int,n3 int,v1 varchar2(2000),v2 varchar2(2000));

Table created.

SQL> create index test_idx on test2(n1,n2,n3,v1,v2);

Index created.

SQL> begin

2 for i in 1..1000 loop

insert into test2 values(0,0,i,rpad(’a',2000),rpad(’a',2000));

end loop;

end;

/ 3 4 5 6

PL/SQL procedure successfully completed.

Insert first 1000 rows like this:

0, 0, 1, xxxxx,xxxxxx

0, 0, 2, xxxxx,xxxxxx

0, 0, 3, xxxxx,xxxxxx

……

0, 0, 1000, xxxxx,xxxxxx

SQL> SQL>

SQL> begin

2 for i in 1001..2000 loop

insert into test2 values(0,i,i,rpad(’a',2000),rpad(’a',2000));

end loop;

end;

/ 3 4 5 6

Then insert other 1000 rows like this:

0,1,1,xxxx,xxxx

0,2,2,xxxx,xxxx

0,3,3,xxxx,xxxx

…….

0,1000,1000,xxxx,xxxx

Then I did treedump and branch block dump.

Here is a snippet of the branch block dump:

row#454[2145] dba: 4218395=0×405e1b

col 0; len 1; (1): 80

col 1; len 1; (1): 80

col 2; len 3; (3): c2 0a 64

col 3; TERM

row#455[2133] dba: 4218396=0×405e1c

col 0; len 1; (1): 80

col 1; len 1; (1): 80

col 2; len 2; (2): c2 0b

col 3; TERM

row#456[2124] dba: 4218397=0×405e1d

col 0; len 1; (1): 80

col 1; len 1; (1): c2

col 2; TERM

row#457[2113] dba: 4218398=0×405e1e

col 0; len 1; (1): 80

col 1; len 3; (3): c2 0b 03

col 2; TERM

row#458[2102] dba: 4218399=0×405e1f

col 0; len 1; (1): 80

col 1; len 3; (3): c2 0b 04

col 2; TERM

You see, the numbers of columns changed in one branch block!!

The attachment index.trc is the dump file

So the conclusion is that the number of columns stored in branch block is related to linked leaf blocks’ data.

The branch block will store as less columns as possible to distinguish the left/right leaf blocks it linked

分享到：

oracle查询/子查询/复制表和数据 | 使用索引的误区之一：没有使用复合索引的前 ...

2010-12-04 10:26
浏览 659
评论(0)
分类:数据库
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论