`

Bitmap Index vs B-tree Index(原创)

 
阅读更多

Introduction

Conventional wisdom holds that bitmap indexes are most appropriate for columns having low distinct values--such as GENDER, MARITAL_STATUS, and RELATION. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems.

There are several disadvantages to using a bitmap index on a unique column--one being the need for sufficient space (and Oracle does not recommend it). However, the size of the bitmap index depends on the cardinality of the column on which it is created as well as the data distribution. Consequently, a bitmap index on the GENDER column will be smaller than a B-tree index on the same column. In contrast, a bitmap index on EMPNO (a candidate for primary key) will be much larger than a B-tree index on this column. But because fewer users access decision-support systems (DSS) systems than would access transaction-processing (OLTP) ones, resources are not a problem for these applications.

To illustrate this point, I created two tables, TEST_NORMAL and TEST_RANDOM. I inserted one million rows into the TEST_NORMAL table using a PL/SQL block, and then inserted these rows into the TEST_RANDOM table in random order:
Create table test_normal (empno number(10), ename varchar2(30), sal number(10));
Begin
For i in 1..1000000
Loop
   Insert into test_normal
   values(i, dbms_random.string('U',30), dbms_random.value(1000,7000));
   If mod(i, 10000) = 0 then
   Commit;
  End if;
End loop;
End;
/
Create table test_random
as
select /*+ append */ * from test_normal order by dbms_random.random;
SQL> select count(*) "Total Rows" from test_normal;
Total Rows
----------
   1000000
Elapsed: 00:00:01.09
SQL> select count(distinct empno) "Distinct Values" from test_normal;
Distinct Values
---------------
        1000000
Elapsed: 00:00:06.09
SQL> select count(*) "Total Rows" from test_random;
Total Rows
----------
   1000000
Elapsed: 00:00:03.05
SQL> select count(distinct empno) "Distinct Values" from test_random;
Distinct Values
---------------
        1000000
Elapsed: 00:00:12.07

Note that the TEST_NORMAL table is organized and that the TEST_RANDOM table is randomly created and hence has disorganized data. In the above table, column EMPNO has 100-percent distinct values and is a good candidate to become a primary key. If you define this column as a primary key, you will create a B-tree index and not a bitmap index because Oracle does not support bitmap primary key indexes.

To analyze the behavior of these indexes, we will perform the following steps:

 

  1. On TEST_NORMAL:
    1. Create a bitmap index on the EMPNO column and execute some queries with equality predicates.
    2. Create a B-tree index on the EMPNO column, execute some queries with equality predicates, and compare the logical and physical I/Os done by the queries to fetch the results for different sets of values.
  2. On TEST_RANDOM:
    1. Same as Step 1A.
    2. Same as Step 1B.
  3. On TEST_NORMAL:
    1. Same as Step 1A, except that the queries are executed within a range of predicates.
    2. Same as Step 1B, except that the queries are executed within a range of predicates. Now compare the statistics.
  4. On TEST_RANDOM:
    1. Same as Step 3A.
    2. Same as Step 3B.
  5. On TEST_NORMAL:
    1. Create a bitmap index on the SAL column, and then execute some queries with equality predicates and some with range predicates.
    2. Create a B-tree index on the SAL column, and then execute some queries with equality predicates and some with range predicates (same set of values as in Step 5A). Compare the I/Os done by the queries to fetch the results.
  6. Add a GENDER column to both of the tables, and update the column with three possible values: M for male, F for female, and null for N/A. This column is updated with these values based on some condition.
  7. Create a bitmap index on this column, and then execute some queries with equality predicates.
  8. Create a B-tree index on the GENDER column, and then execute some queries with equality predicates. Compare to results from Step 7.

Steps 1 to 4 involve a high-cardinality (100-percent distinct) column, Step 5 a normal-cardinality column, and Steps 7 and 8 a low-cardinality column.

Step 1A (on TEST_NORMAL)

In this step, we will create a bitmap index on the TEST_NORMAL table and then check for the size of this index, its clustering factor, and the size of the table. Then we will run some queries with equality predicates and note the I/Os of these queries using this bitmap index.

SQL> create bitmap index normal_empno_bmx on test_normal(empno);
Index created.
Elapsed: 00:00:29.06
SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;
Table analyzed.
Elapsed: 00:00:19.01
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3* where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_BMX');
SEGMENT_NAME             Size in MB
------------------------------------       ---------------
TEST_NORMAL              50
NORMAL_EMPNO_BMX         28
Elapsed: 00:00:02.00
SQL> select index_name, clustering_factor from user_indexes;
INDEX_NAME              CLUSTERING_FACTOR
------------------------------         ---------------------------------
NORMAL_EMPNO_BMX        1000000
Elapsed: 00:00:00.00

You can see in the preceding table that the size of the index is 28MB and that the clustering factor is equal to the number of rows in the table. Now let's execute the queries with equality predicates for different sets of values

SQL> set autotrace only
SQL> select * from test_normal where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_normal where empno=&empno
new   1: select * from test_normal where empno=1000
Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
          d=1 Bytes=34)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_EMPNO_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
Step 1B (on TEST_NORMAL)

Now we will drop this bitmap index and create a B-tree index on the EMPNO column. As before, we will check for the size of the index and its clustering factor and execute the same queries for the same set of values, to compare the I/Os.

SQL> drop index NORMAL_EMPNO_BMX;
Index dropped.
SQL> create index normal_empno_idx on test_normal(empno);
Index created.
SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;
Table analyzed.
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_IDX');
SEGMENT_NAME              Size in MB
----------------------------------         ---------------
TEST_NORMAL               50
NORMAL_EMPNO_IDX          18
SQL> select index_name, clustering_factor from user_indexes;
INDEX_NAME           CLUSTERING_FACTOR
----------------------------------    ----------------------------------
NORMAL_EMPNO_IDX     6210

It is clear in this table that the B-tree index is smaller than the bitmap index on the EMPNO column. The clustering factor of the B-tree index is much nearer to the number of blocks in a table; for that reason, the B-tree index is efficient for range predicate queries.

Now we'll run the same queries for the same set of values, using our B-tree index.

SQL> set autot trace
SQL> select * from test_normal where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_normal where empno=&empno
new   1: select * from test_normal where empno=1000
Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
          d=1 Bytes=34)
   2    1     INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (C
          ost=3 Card=1)
Statistics
----------------------------------------------------------
         29  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

 

As you can see, when the queries are executed for different set of values, the number of consistent gets and physical reads are identical for bitmap and B-tree indexes on a 100-percent unique column.

BITMAP EMPNO B-TREE
Consistent Reads Physical Reads Consistent Reads Physical Reads
5 0 1000 5 0
5 2 2398 5 2
5 2 8545 5 2
5 2 98008 5 2
5 2 85342 5 2
5 2 128444 5 2
5 2 858 5 2

Step 2A (on TEST_RANDOM)

Now we'll perform the same experiment on TEST_RANDOM:
SQL> create bitmap index random_empno_bmx on test_random(empno);
Index created.
SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns;
Table analyzed.
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3* where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_BMX');
SEGMENT_NAME             Size in MB
------------------------------------       ---------------
TEST_RANDOM              50
RANDOM_EMPNO_BMX         28
SQL> select index_name, clustering_factor from user_indexes;
INDEX_NAME              CLUSTERING_FACTOR
------------------------------         ---------------------------------
RANDOM_EMPNO_BMX        1000000

Again, the statistics (size and clustering factor) are identical to those of the index on the TEST_NORMAL table:
SQL> select * from test_random where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_random where empno=&empno
new   1: select * from test_random where empno=1000
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'RANDOM_EMPNO_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
Step 2B (on TEST_RANDOM)

Now, as in Step 1B, we will drop the bitmap index and create a B-tree index on the EMPNO column.
SQL> drop index RANDOM_EMPNO_BMX;
Index dropped.
SQL> create index random_empno_idx on test_random(empno);
Index created.
SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns;
Table analyzed.
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_IDX');
SEGMENT_NAME              Size in MB
----------------------------------         ---------------
TEST_RANDOM               50
RANDOM_EMPNO_IDX          18
SQL> select index_name, clustering_factor from user_indexes;
INDEX_NAME           CLUSTERING_FACTOR
----------------------------------    ----------------------------------
RANDOM_EMPNO_IDX     999830

This table shows that the size of the index is equal to the size of this index on TEST_NORMAL table but the clustering factor is much nearer to the number of rows, which makes this index inefficient for range predicate queries (which we'll see in Step 4). This clustering factor will not affect the equality predicate queries because the rows have 100-percent distinct values and the number of rows per key is 1.

Now let's run the queries with equality predicates and the same set of values.
SQL> select * from test_random where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_random where empno=&empno
new   1: select * from test_random where empno=1000
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
   2    1     INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_IDX' (NON-UNIQUE) (Cost=3 Card=1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Again, the results are almost identical to those in Steps 1A and 1B. The data distribution did not affect the amount of consistent gets and physical reads for a unique column.

Step 3A (on TEST_NORMAL)

In this step, we will create the bitmap index (similar to Step 1A). We know the size and the clustering factor of the index, which equals the number of rows in the table. Now let's run some queries with range predicates.

SQL> select * from test_normal where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300

2300 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 641040856
-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |  2299 | 78166 |   417   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID | TEST_NORMAL      |  2299 | 78166 |   417   (0)| 00:00:06 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                  |       |       |            |          |
|*  3 |    BITMAP INDEX RANGE SCAN   | NORMAL_EMPNO_BMX |       |       |            |          |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("EMPNO">=1 AND "EMPNO"<=2300)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        331  consistent gets
         24  physical reads
          0  redo size
     121513  bytes sent via SQL*Net to client
       2102  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed

Step 3B (on TEST_NORMAL)

In this step, we'll execute the queries against the TEST_NORMAL table with a B-tree index on it.
SQL> select * from test_normal where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_normal where empno between &range1 and &range2
new   1: select * from test_normal where empno between 1 and 2300
2300 rows selected.
Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=23 Card=2299 Bytes=78166)
   2    1     INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (Cost=8 Card=2299)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        329  consistent gets
         21 physical reads
          0  redo size
     111416  bytes sent via SQL*Net to client
       2182  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed

When these queries are executed for different sets of ranges, the results below show:

BITMAP EMPNO (Range) B-TREE
Consistent Reads Physical Reads Consistent Reads Physical Reads
331 24 1-2300 329 21
285 21 8-1980 283 19
346 26 1850-4250 344 23
427 33 28888-31850 425 30
371 29 82900-85478 368 25
2157 150 984888-1000000 2139 131

 

As you can see, the number of consistent gets and physical reads with both indexes is again nearly identical. The last range (984888-1000000) returned almost 15,000 rows, which was the maximum number of rows fetched for all the ranges given above. So when we asked for a full table scan (by giving the hint /*+ full(test_normal) */ ), the consistent read and physical read counts were 7,239 and 5,663, respectively.

Step 4A (on TEST_RANDOM)

In this step, we will run the queries with range predicates on the TEST_RANDOM table with bitmap index and check for consistent gets and physical reads. Here you'll see the impact of the clustering factor.
SQL>select * from test_random where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_random where empno between &range1 and &range2
new   1: select * from test_random where empno between 1 and 2300
2300 rows selected.
Elapsed: 00:00:08.01
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=453 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=453 Card=2299 Bytes=78166)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2467  consistent gets
       1916  physical reads
          0  redo size
     111416  bytes sent via SQL*Net to client
       2182  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed
Step 4B (on TEST_RANDOM)

In this step, we will execute the range predicate queries on TEST_RANDOM with a B-tree index on it. Recall that the clustering factor of this index was very close to the number of rows in a table (and thus inefficient). Here's what the optimizer has to say about that:

SQL> select * from test_random where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_random where empno between &range1 and &range2
new   1: select * from test_random where empno between 1 and 2300
2300 rows selected.
Elapsed: 00:00:03.04
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=613 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (FULL) OF 'TEST_RANDOM' (Cost=613 Card=2299 Bytes=78166)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6415  consistent gets
       4910  physical reads
          0  redo size
     111416  bytes sent via SQL*Net to client
       2182  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed

The optimizer opted for a full table scan rather than using the index because of the clustering factor:

 

BITMAP EMPNO (Range) B-TREE
Consistent Reads Physical Reads Consistent Reads Physical Reads
2467 1916 1-2300 6404 6250
2114 1680 8-1980 6404 6250
2571 2017 1850-4250 6412 6250
3173 2391 28888-31850 6449 6250
2761 2160 82900-85478 6425 6250
16173 5775 984888-1000000 7260 6250

 

For the last range (984888-1000000) only, the optimizer opted for a full table scan for the bitmap index, whereas for all ranges, it opted for a full table scan for the B-tree index. This disparity is due to the clustering factor: The optimizer does not consider the value of the clustering factor when generating execution plans using a bitmap index, whereas for a B-tree index, it does. In this scenario, the bitmap index performs more efficiently than the B-tree index.

The following steps reveal more interesting facts about these indexes.

Step 5A (on TEST_NORMAL)

Create a bitmap index on the SAL column of the TEST_NORMAL table. This column has normal cardinality
SQL> create bitmap index normal_sal_bmx on test_normal(sal);
Index created.
SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;
Table analyzed.

Now let's get the size of the index and the clustering factor.

SQL>select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2* from user_segments
  3* where segment_name in ('TEST_NORMAL','NORMAL_SAL_BMX');
SEGMENT_NAME                 Size in MB
------------------------------              --------------
TEST_NORMAL                  50
NORMAL_SAL_BMX               4
SQL> select index_name, clustering_factor from user_indexes;
INDEX_NAME              CLUSTERING_FACTOR
------------------------------         ----------------------------------
NORMAL_SAL_BMX          6001

Now for the queries. First run them with equality predicates

SQL> set autot trace
SQL> select * from test_normal where sal=&sal;
Enter value for sal: 1869
old   1: select * from test_normal where sal=&sal
new   1: select * from test_normal where sal=1869
164 rows selected.
Elapsed: 00:00:00.08
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=39 Card=168 Bytes=4032)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=39 Card=168 Bytes=4032)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        174  consistent gets
        172  physical reads
          0  redo size
       8461  bytes sent via SQL*Net to client
        609  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        164  rows processed

and then with range predicates

SQL> select * from test_normal where sal between &sal1 and &sal2;
Enter value for sal1: 1500
Enter value for sal2: 2000
old   1: select * from test_normal where sal between &sal1 and &sal2
new   1: select * from test_normal where sal between 1500 and 2000
83743 rows selected.
Elapsed: 00:00:05.00
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes
          =2001024)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
          Bytes=2001024)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11770  consistent gets
       6235  physical reads
          0  redo size
    4123553  bytes sent via SQL*Net to client
      61901  bytes received via SQL*Net from client
       5584  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      83743  rows processed

Now drop the bitmap index and create a B-tree index on TEST_NORMAL
SQL> create index normal_sal_idx on test_normal(sal);
Index created.
SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;
Table analyzed.

Take a look at the size of the index and the clustering factor

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_SAL_IDX');
SEGMENT_NAME          Size in MB
------------------------------       ---------------
TEST_NORMAL           50
NORMAL_SAL_IDX        17
SQL> select index_name, clustering_factor from user_indexes;
INDEX_NAME            CLUSTERING_FACTOR
------------------------------       ----------------------------------
NORMAL_SAL_IDX        986778

In the above table, you can see that this index is larger than the bitmap index on the same column. The clustering factor is also near the number of rows in this table.

Now for the tests; equality predicates first:

SQL> set autot trace
SQL> select * from test_normal where sal=&sal;
Enter value for sal: 1869
old   1: select * from test_normal where sal=&sal
new   1: select * from test_normal where sal=1869
164 rows selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=169 Card=168 Bytes=4032)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=169 Card=168 Bytes=4032)
   2    1     INDEX (RANGE SCAN) OF 'NORMAL_SAL_IDX' (NON-UNIQUE) (Cost=3 Card=168)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        186 consistent gets
        173  physical reads
          0  redo size
       8461  bytes sent via SQL*Net to client
        609  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        164  rows processed

...and then, range predicates
SQL> select * from test_normal where sal between &sal1 and &sal2;
Enter value for sal1: 1500
Enter value for sal2: 2000
old   1: select * from test_normal where sal between &sal1 and &sal2
new   1: select * from test_normal where sal between 1500 and 2000
83743 rows selected.
Elapsed: 00:00:04.03
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes
          =2001024)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
          Bytes=2001024)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11770  consistent gets
       6235  physical reads

          0  redo size
    4123553  bytes sent via SQL*Net to client
      61901  bytes received via SQL*Net from client
       5584  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      83743  rows processed

When the queries were executed for different set of values, the resulting output, as shown in the tables below, reveals that the numbers of consistent gets and physical reads are identical.

 

BITMAP
SAL (Equality)
B-TREE Rows Fetched
Consistent Reads Physical Reads Consistent Reads Physical Reads
165 0 1869 177 164  
169 163 3548 181 167  
174 166 6500 187 172  
75 69 7000 81 73  
177 163 2500 190 175  

 

 

 

BITMAP
SAL (Range)
B-TREE Rows Fetched
Consistent Reads Physical Reads Consistent Reads Physical Reads
11778 5850 1500-2000 11778 3891 83743
11765 5468 2000-2500 11765 3879 83328
11753 5471 2500-3000 11753 3884 83318
17309 5472 3000-4000 17309 3892 166999
39398 5454 4000-7000 39398 3973 500520

For range predicates the optimizer opted for a full table scan for all the different set of values—it didn't use the indexes at all—whereas for equality predicates, the optimizer used the indexes. Again, the consistent gets and physical reads are identical.

Consequently, you can conclude that for a normal-cardinality column, the optimizer decisions for the two types of indexes were the same and there were no significant differences between the I/Os.

Step 6 (add a GENDER column)

Before performing the test on a low-cardinality column, let's add a GENDER column to this table and update it with M, F, and null values.

SQL> alter table test_normal add GENDER varchar2(1);
Table altered.
SQL> select GENDER, count(*) from test_normal group by GENDER;
S     COUNT(*)
-     ----------
F     339920
M     360079
      300001
3 rows selected.

The size of the bitmap index on this column is around 570KB, as indicated in the table below:

SQL> create bitmap index normal_GENDER_bmx on test_normal(GENDER);
Index created.
Elapsed: 00:00:02.08
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_GENDER_BMX');
SEGMENT_NAME         Size in MB
------------------------------      ---------------
TEST_NORMAL          50
NORMAL_GENDER_BMX    .5625
2 rows selected.

SQL>  select * from test_normal where GENDER ='M';  
360079 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4115571900
--------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   | 10000 |   253K|  1092   (0)| 00:00:14 |
|   1 |  TABLE ACCESS BY INDEX ROWID | TEST_NORMAL       | 10000 |   253K|  1092   (0)| 00:00:14 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                   |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | NORMAL_GENDER_BMX |       |       |            |          |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("GENDER"='M')
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      26114  consistent gets
       2249  physical reads
          0  redo size
   20037706  bytes sent via SQL*Net to client
     264474  bytes received via SQL*Net from client
      24007  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     360079  rows processed

In contrast, the B-tree index on this column is 13MB in size, which is much bigger than the bitmap index on this column
SQL> create index normal_GENDER_idx on test_normal(GENDER);
Index created.
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_GENDER_IDX');
SEGMENT_NAME        Size in MB
------------------------------     ---------------
TEST_NORMAL         50
NORMAL_GENDER_IDX   13
2 rows selected.

Now, if we execute a query with equality predicates, the optimizer will not make use of this index, be it a bitmap or a B-tree. Rather, it will prefer a full table scan.

SQL>  select * from test_normal where GENDER ='F';
339920 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 512490529
---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |   339K|  8298K|  1715   (1)| 00:00:21 |
|*  1 |  TABLE ACCESS FULL| TEST_NORMAL |   339K|  8298K|  1715   (1)| 00:00:21 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("GENDER"='F')
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      28764  consistent gets
       6235  physical reads
          0  redo size
   18157885  bytes sent via SQL*Net to client
     249690  bytes received via SQL*Net from client
      22663  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     339920  rows processed

Bitmap Index Operation

 

With a bitmap index on the GENDER column in place, create another bitmap index on the SAL column and then execute some queries. The queries will be re-executed with B-tree indexes on these columns.

 

From the TEST_NORMAL table, you need the employee number of all the male employees whose monthly salaries equal any of the following values:

                                  1000 1500 2000 2500 3000 3500 4000 4500

This is a typical data warehouse query, which, of course, you should never execute on an OLTP system. Here are the results with the bitmap index in place on both columns

SQL> select * from test_normal
  2  where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';
525 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3746873205
---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |   540 | 13500 |   126   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | TEST_NORMAL       |   540 | 13500 |   126   (0)| 00:00:02 |
|   2 |   BITMAP CONVERSION TO ROWIDS |                   |       |       |            |          |
|   3 |    BITMAP AND                 |                   |       |       |            |          |
|   4 |     BITMAP OR                 |                   |       |       |            |          |
|*  5 |      BITMAP INDEX SINGLE VALUE| NORMAL_SAL_BMX    |       |       |            |          |
|*  6 |      BITMAP INDEX SINGLE VALUE| NORMAL_SAL_BMX    |       |       |            |          |
|*  7 |      BITMAP INDEX SINGLE VALUE| NORMAL_SAL_BMX    |       |       |            |          |
|*  8 |      BITMAP INDEX SINGLE VALUE| NORMAL_SAL_BMX    |       |       |            |          |
|*  9 |      BITMAP INDEX SINGLE VALUE| NORMAL_SAL_BMX    |       |       |            |          |
|* 10 |      BITMAP INDEX SINGLE VALUE| NORMAL_SAL_BMX    |       |       |            |          |
|* 11 |      BITMAP INDEX SINGLE VALUE| NORMAL_SAL_BMX    |       |       |            |          |
|* 12 |      BITMAP INDEX SINGLE VALUE| NORMAL_SAL_BMX    |       |       |            |          |
|* 13 |      BITMAP INDEX SINGLE VALUE| NORMAL_SAL_BMX    |       |       |            |          |
|* 14 |     BITMAP INDEX SINGLE VALUE | NORMAL_GENDER_BMX |       |       |            |          |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("SAL"=1000)
   6 - access("SAL"=1500)
   7 - access("SAL"=2000)
   8 - access("SAL"=2500)
   9 - access("SAL"=3000)
  10 - access("SAL"=3500)
  11 - access("SAL"=4000)
  12 - access("SAL"=4500)
  13 - access("SAL"=5000)
  14 - access("GENDER"='M')
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        491  consistent gets
        479  physical reads
          0  redo size
      29191  bytes sent via SQL*Net to client
        793  bytes received via SQL*Net from client
         36  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        525  rows processed

And with the B-tree index in place

SQL> select * from test_normal
where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';
525 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3412228270
-------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                   |   540 | 13500 |   794   (1)| 00:00:10 |
|   1 |  TABLE ACCESS BY INDEX ROWID      | TEST_NORMAL       |   540 | 13500 |   794   (1)| 00:00:10 |
|   2 |   BITMAP CONVERSION TO ROWIDS     |                   |       |       |            |          |
|   3 |    BITMAP AND                     |                   |       |       |            |          |
|   4 |     BITMAP OR                     |                   |       |       |            |          |
|   5 |      BITMAP CONVERSION FROM ROWIDS|                   |       |       |            |          |
|*  6 |       INDEX RANGE SCAN            | NORMAL_SAL_IDX    |  1500 |       |     3   (0)| 00:00:01 |
|   7 |      BITMAP CONVERSION FROM ROWIDS|                   |       |       |            |          |
|*  8 |       INDEX RANGE SCAN            | NORMAL_SAL_IDX    |  1500 |       |     3   (0)| 00:00:01 |
|   9 |      BITMAP CONVERSION FROM ROWIDS|                   |       |       |            |          |
|* 10 |       INDEX RANGE SCAN            | NORMAL_SAL_IDX    |  1500 |       |     3   (0)| 00:00:01 |
|  11 |      BITMAP CONVERSION FROM ROWIDS|                   |       |       |            |          |
|* 12 |       INDEX RANGE SCAN            | NORMAL_SAL_IDX    |  1500 |       |     3   (0)| 00:00:01 |
|  13 |      BITMAP CONVERSION FROM ROWIDS|                   |       |       |            |          |
|* 14 |       INDEX RANGE SCAN            | NORMAL_SAL_IDX    |  1500 |       |     3   (0)| 00:00:01 |
|  15 |      BITMAP CONVERSION FROM ROWIDS|                   |       |       |            |          |
|* 16 |       INDEX RANGE SCAN            | NORMAL_SAL_IDX    |  1500 |       |     3   (0)| 00:00:01 |
|  17 |      BITMAP CONVERSION FROM ROWIDS|                   |       |       |            |          |
|* 18 |       INDEX RANGE SCAN            | NORMAL_SAL_IDX    |  1500 |       |     3   (0)| 00:00:01 |
|  19 |      BITMAP CONVERSION FROM ROWIDS|                   |       |       |            |          |
|* 20 |       INDEX RANGE SCAN            | NORMAL_SAL_IDX    |  1500 |       |     3   (0)| 00:00:01 |
|  21 |      BITMAP CONVERSION FROM ROWIDS|                   |       |       |            |          |
|* 22 |       INDEX RANGE SCAN            | NORMAL_SAL_IDX    |  1500 |       |     3   (0)| 00:00:01 |
|  23 |     BITMAP CONVERSION FROM ROWIDS |                   |       |       |            |          |
|* 24 |      INDEX RANGE SCAN             | NORMAL_GENDER_IDX |  1500 |       |   657   (1)| 00:00:08 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("SAL"=1000)
   8 - access("SAL"=1500)
  10 - access("SAL"=2000)
  12 - access("SAL"=2500)
  14 - access("SAL"=3000)
  16 - access("SAL"=3500)
  18 - access("SAL"=4000)
  20 - access("SAL"=4500)
  22 - access("SAL"=5000)
  24 - access("GENDER"='M')
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1147  consistent gets
       1129  physical reads
          0  redo size
      29191  bytes sent via SQL*Net to client
        793  bytes received via SQL*Net from client
         36  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        525  rows processed

 

As you can see here, with the B-tree index, the optimizer opted for  BITMAP CONVERSION FROM ROWIDS operation, even you don't have any bitmap index on this table . This's usually cauesd by create an index on a low distiinct value column. CBO would transform this index access to bitmap before fecth data , as it consider it's lower cost.

Conclusion

In summary, bitmap indexes are best suited for DSS regardless of cardinality for these reasons:

  • With bitmap indexes, the optimizer can efficiently answer queries that include AND, OR, or XOR. (Oracle supports dynamic B-tree-to-bitmap conversion, but it can be inefficient, as aboved)

  • With bitmaps, the optimizer can answer queries when searching or counting for nulls. Null values are also indexed in bitmap indexes (unlike B-tree indexes).

  • Most important, bitmap indexes in DSS systems support ad hoc queries, whereas B-tree indexes do not. More specifically, if you have a table with 50 columns and users frequently query on 10 of them— either the combination of all 10 columns or sometimes a single column—creating a B-tree index will be very difficult. If you create 10 bitmap indexes on all these columns, all the queries can be answered by these indexes, whether they are queries on all 10 columns, on 4 or 6 columns out of the 10, or on a single column. The AND_EQUAL hint provides this functionality for B-tree indexes, but no more than five indexes can be used by a query. This limit is not imposed with bitmap indexes.

In contrast, B-tree indexes are well suited for OLTP applications in which users' queries are relatively routine (and well tuned before deployment in production), as opposed to ad hoc queries, which are much less frequent and executed during nonpeak business hours. Because data is frequently updated in and deleted from OLTP applications, bitmap indexes can cause a serious locking problem in these situations.

The data here is fairly clear. Both indexes have a similar purpose: to return results as fast as possible. But your choice of which one to use should depend purely on the type of application, not on the level of cardinality.

 

参考至:http://www.oracle.com/technetwork/articles/sharma-indexes-093638.html

               http://www.xifenfei.com/1531.html

本文原创,转载请注明出处、作者

如有错误,欢迎指正

邮箱:czmcj@163.com

0
0
分享到:
评论

相关推荐

    B-tree bitmap index

    ### B-树索引与位图索引的深入解析 #### B-树索引概述 B-树索引是Oracle数据库中最常用的索引类型之一,它利用B-树的数据结构来组织索引项,以便快速查找数据。B-树是一种自平衡的树形数据结构,每个节点最多可以...

    位图索引与 B-tree 索引:选择与时间

    **位图索引**(Bitmap Index)是一种特殊类型的索引,它使用位图来表示数据。每个位对应数据库表中的一个特定值,如果某个记录包含这个值,那么相应的位就被设置为1,否则为0。例如,如果表中有100万行,有三个不同...

    The case for learned index structures-2018sigmod.pdf

    传统的索引结构,如B-Tree、Hash-Index和BitMap-Index,各自有其独特的功能和适用场景:B-Tree适用于范围查询,通过有序排列键值来快速定位记录;Hash-Index则在单个键查找时表现出色,通过哈希函数实现快速映射;而...

    分布式图数据库NebulaGraph的Index实践

    不同的数据库系统有不同的排序结构,目前常见的索引实现类型如B-Treeindex、B+-Treeindex、B*-Treeindex、Hashindex、Bitmapindex、Invertedindex等等,各种索引类型都有各自的排序算法。虽然索引可以带来更高的查询...

    BitmapIndexInternal

    CREATE BITMAP INDEX index1 ON property (bedrooms); CREATE BITMAP INDEX index2 ON property (receptions); CREATE BITMAP INDEX index3 ON property (garages); ``` 例如,如果要查询具有4个卧室、3个客厅和2个...

    oracle 索引类型

    五、Index-Organized Table (IOT) IOT是一种特殊的表类型,其数据和索引存储在同一结构中。适合插入密集型操作和全索引扫描的场景。 优点: 1. IOT对于全索引扫描非常快。 2. 减少了对主键的查找,提高了插入速度。...

    oracle index学习总结

    5. Clustered索引:在物理存储上将数据按索引顺序排列,Oracle默认使用非聚簇索引,但可以通过创建索引组织表(Index-Organized Table, IOT)实现类似效果。 二、索引创建与管理 创建索引的基本语法是`CREATE INDEX...

    HIT邹老师数据库实验三

    2. **索引类型**:常见的索引类型包括B-Tree、Hash、Bitmap和R-Tree等。B-Tree是最常见的一种,适用于大多数场景;Hash索引则在等值查询时表现出色;Bitmap索引适合于大数据量下的多条件查询;R-Tree用于处理多维...

    bit map index

    ### Bit Map Index (Bitmap Index) 知识点详解 #### 1. 概念与应用场景 Bitmap Index(位图索引)是一种特殊的索引类型,主要用于处理那些具有较低基数(即不同值的数量较少)的列。传统上,数据库系统如Oracle...

    Oracle经典面试总结-去重-附答案.pdf

    常见的索引类型有 B-tree index、bitmap index、function index、partition index(local/global)。索引的结构可以分为 B-tree 和 bitmap 两种,B-tree 索引适合范围查询,bitmap 索引适合等值查询。 _回滚段_ ...

    Oracle索引(B*tree与Bitmap)的学习总结

    Oracle数据库中的索引是提升查询效率的关键工具,主要包括B*Tree索引和Bitmap索引两种类型。本文将深入探讨这两种索引的特性和应用场景。 首先,B*Tree(B星树)索引是最常见的索引类型,适用于大量数据的增、删、...

    oracle索引介绍(图文详解)

    1. B-Tree索引:这是最常见的索引类型,包括正常B-Tree索引和倒序B-Tree索引。B-Tree是一种自平衡的多路搜索树,每个节点包含多个键和指针,确保查找效率较高。正常B-Tree索引按升序存储键值,而倒序B-Tree索引则按...

    oracleinde学习总结.docx

    Oracle Index 学习总结 一、索引概述 索引是 Oracle 中用于提高查询速度的一种机制。索引需要储存空间和 I/O 操作。索引的目的是加快 Select 速度,但 Insert、Update、Delete 数据时 Oracle 会同时对索引进行相应...

    java相关软件

    - **创建B-Tree索引**:基本语法为`CREATE [UNIQUE] INDEX index_name ON table_name (column_name [ASC|DESC]) [REVERSE];`,其中`REVERSE`关键字表示创建反向键索引。 - **创建唯一索引**:通过`CREATE UNIQUE ...

    Oracle索引.docx

    CREATE BITMAP INDEX index_name ON table_name (column1, column2, ...); ``` 索引创建后,可能需要对其进行修改或重命名。修改索引的名字可以使用`ALTER INDEX`语句,如: ```sql ALTER INDEX index_name1 ...

    oracle_partition_index.zip_partition

    - B树索引(B-Tree Index):最常见的索引类型,适用于等值查询。 - 唯一索引(Unique Index):确保索引列的值是唯一的。 - 位图索引(Bitmap Index):适用于低基数(即重复值多)的列,适用于联接查询。 - ...

    Oracle Index 索引介绍

    1. **B树索引(B-Tree Index)** B树索引是最常见的索引类型,适用于等值查询。Oracle使用自平衡的B树结构存储索引键值,每个索引条目指向表中的一个数据块。 2. **位图索引(Bitmap Index)** 位图索引主要用于...

    ORACLE 数据对象的分析2(索引_序列_同义词)

    - **概念**:B-Tree索引是最常见的索引类型,它是一种自平衡的树结构,能够确保数据项按键值排序。 - **特点**:适合范围查询和精确匹配查询,能够有效地减少磁盘I/O操作次数。 - **函数索引** - **定义**:函数...

    Relational Database Index Design and the Optimizers

    1. 索引类型:常见的索引类型包括B树(B-Tree)、哈希索引(Hash Index)、位图索引(Bitmap Index)和全文索引(Full-Text Index)。B树适用于范围查询和排序,哈希索引适用于等值查询,位图索引适合于大量重复值和...

    深入理解mysql索引底层.zip

    1. B-Tree索引:最常见的索引类型,适用于等值查询,包括主键和唯一性索引。B-Tree结构保证了数据的有序性,查找、插入和删除操作的时间复杂度为O(log n)。 2. Hash索引:适用于等值查询,但不支持范围查询。它的...

Global site tag (gtag.js) - Google Analytics