Overview
Table and pipelined functions are often used as simple rowsources (for example, SELECT * FROM TABLE(dbms_xplan.display)). In these cases, cardinality is not particularly useful as there is only one possible execution plan. However, there might be times when we need to join a pipelined or table function to a set of other rowsources (tables, views etc). At this point having cardinality statistics for the function can be critical to enable the CBO to derive an efficient execution plan. This is what we will focus on in this article by demonstrating several methods for setting cardinality statistics for pipelined and table functions.
We will cover the following methods for setting table/pipelined function cardinality in this article:
CARDINALITY hint (9i+; undocumented);
OPT_ESTIMATE hint (10g+; undocumented);
DYNAMIC_SAMPLING hint (11.1.0.7+);
Extensible Optimiser (10g+).
Note that the first two methods use undocumented hints. These hints are used internally by Oracle but they haven't been included in the public documentation. For this reason, they should not be used in production code. They are included in this article for full coverage of this subject and not as recommended solutions.
The examples in this article were run on an 11.1.0.7 database, but the version information above indicates the Oracle versions that support each method.
sample pipelined function
The majority of the examples in this article will be based on a join between the DEPARTMENTS and EMPLOYEES data in the supplied HR schema. The EMPLOYEES data will be generated by a pipelined function as an alternative rowsource to use in the queries. We will begin by creating the object and collection types to define our pipelined function's record structure and return type, as follows.
SQL> CREATE TYPE employees_ot AS OBJECT
2 ( employee_id NUMBER(6)
3 , first_name VARCHAR2(20)
4 , last_name VARCHAR2(25)
5 , email VARCHAR2(25)
6 , phone_number VARCHAR2(20)
7 , hire_date DATE
8 , job_id VARCHAR2(10)
9 , salary NUMBER(8,2)
10 , commission_pct NUMBER(2,2)
11 , manager_id NUMBER(6)
12 , department_id NUMBER(4)
13 );
14 /
Type created.
SQL> CREATE TYPE employees_ntt AS TABLE OF employees_ot;
2 /
Type created.
We will now create a pipelined function, as follows.
SQL> CREATE FUNCTION employees_piped
2 RETURN employees_ntt PIPELINED AS
3 BEGIN
4 FOR r IN (SELECT * FROM employees) LOOP
5 FOR i IN 1 .. 200 LOOP
6 PIPE ROW (employees_ot(
7 r.employee_id, r.first_name, r.last_name,
8 r.email, r.phone_number, r.hire_date,
9 r.job_id, r.salary, r.commission_pct,
10 r.manager_id, r.department_id));
11 END LOOP;
12 END LOOP;
13 RETURN;
14 END employees_piped;
15 /
Function created.
Note that this function pipes over 20,000 records by inflating the EMPLOYEES dataset by a factor of 200 (the 107 rows in EMPLOYEES become 21,400; this volume of data will show the impact of correct cardinality statistics more clearly). The duplicate data and the absence of any complex transformations (usually the reason for using a pipelined function in the first place) are irrelevant in this case, as we simply need a representative rowsource for our examples.
default cardinality
We will begin by investigating how the CBO treats pipelined functions when optimising SQL. First we will query our function to see how many rows it actually generates, as follows.
SQL> SELECT COUNT(*)
2 FROM TABLE(employees_piped);
COUNT(*)
----------
21400
1 row selected.
Our function pipes 21,400 employees records. Using Autotrace to generate an Explain Plan, we will re-query the function to see how many rows the CBO thinks it returns.
SQL> set autotrace traceonly explain
SQL> SELECT *
2 FROM TABLE(employees_piped);
Execution Plan
----------------------------------------------------------
Plan hash value: 825990498
--------------------------------------------------------------------- ...
| Id | Operation | Name | Rows | ...
--------------------------------------------------------------------- ...
| 0 | SELECT STATEMENT | | 8168 | ...
| 1 | COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED | | ...
--------------------------------------------------------------------- ...
We can see that the CBO uses a cardinality of 8,168 rows for our pipelined function. This is a heuristic value (rather than an estimate) that the optimiser uses for all table/pipelined functions. The default appears to be dependent on the default database block size (this example is from a database with an 8KB block size); on databases with larger default block sizes, the CBO will use a larger heuristic value for the cardinality.
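A quick way to confirm the default block size on your own database is to check the DB_BLOCK_SIZE parameter, as follows.

SHOW PARAMETER db_block_size

-- or, given access to the dynamic performance views:
SELECT value FROM v$parameter WHERE name = 'db_block_size';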
Given that the optimiser uses a default (and usually incorrect) value for cardinality, it follows that this could lead to sub-optimal execution plans. We will test this assertion by joining our EMPLOYEES_PIPED function to the DEPARTMENTS table, as follows.
SQL> SELECT *
2 FROM departments d
3 , TABLE(employees_piped) e
4 WHERE d.department_id = e.department_id;
Execution Plan
----------------------------------------------------------
Plan hash value: 178575256
----------------------------------------------------------------------- ...
| Id | Operation | Name | Rows | ...
----------------------------------------------------------------------- ...
| 0 | SELECT STATEMENT | | 8168 | ...
| 1 | MERGE JOIN | | 8168 | ...
| 2 | TABLE ACCESS BY INDEX ROWID | DEPARTMENTS | 27 | ...
| 3 | INDEX FULL SCAN | DEPT_ID_PK | 27 | ...
|* 4 | SORT JOIN | | 8168 | ...
| 5 | COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED | | ...
----------------------------------------------------------------------- ...
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("D"."DEPARTMENT_ID"=VALUE(KOKBF$))
filter("D"."DEPARTMENT_ID"=VALUE(KOKBF$))
With a cardinality of 8,168 for the pipelined function, the optimiser has chosen a sort-merge join. It is unlikely that this is the most efficient method for joining these two rowsources, so we will now demonstrate the effect of setting correct cardinality statistics for our pipelined function, using each of the four available methods in turn.
method 1: cardinality hint (undocumented)
The CARDINALITY hint is simple to use (although quite limited) yet for some reason remains undocumented. As its name suggests, this hint enables us to supply the cardinality for a rowsource in a query. Despite the fact that we shouldn't use this hint in production code, we will see the effect it has on our query below. We will supply the exact rowcount of our pipelined function in this case, though in practice we would probably only ever use an estimated value.
SQL> SELECT /*+ CARDINALITY(e, 21400) */ *
2 FROM departments d
3 , TABLE(employees_piped) e
4 WHERE d.department_id = e.department_id;
Execution Plan
----------------------------------------------------------
Plan hash value: 3250313570
---------------------------------------------------------------------- ...
| Id | Operation | Name | Rows | ...
---------------------------------------------------------------------- ...
| 0 | SELECT STATEMENT | | 21400 | ...
|* 1 | HASH JOIN | | 21400 | ...
| 2 | TABLE ACCESS FULL | DEPARTMENTS | 27 | ...
| 3 | COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED | | ...
---------------------------------------------------------------------- ...
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("D"."DEPARTMENT_ID"=VALUE(KOKBF$))
Using the hint, we have supplied the exact cardinality of our pipelined function to the CBO. With a more suitable cardinality, the optimiser has chosen an alternative access path (a hash join in this case). This clearly shows the importance of supplying representative statistics, rather than relying on defaults.
method 2: opt_estimate hint (undocumented)
The second method uses the OPT_ESTIMATE hint. Although undocumented, this hint is used internally by Oracle's SQL Profile feature to correct the cardinalities used by the optimiser: during a SQL Tuning Task, Oracle compares the actual rowsource cardinalities for a query with the estimates used to derive the original plan, and the resulting SQL Profile stores OPT_ESTIMATE hints that scale the CBO's estimates up or down by the correction factors it determined.
The OPT_ESTIMATE hint can be used to re-factor the cardinalities for tables (this includes views and table functions), indexes and joins. We will use it to correct our pipelined function cardinality to its exact rowcount, as follows.
SQL> SELECT /*+ OPT_ESTIMATE(table, e, scale_rows=2.62) */
2 *
3 FROM departments d
4 , TABLE(employees_piped) e
5 WHERE d.department_id = e.department_id;
Execution Plan
----------------------------------------------------------
Plan hash value: 3250313570
---------------------------------------------------------------------- ...
| Id | Operation | Name | Rows | ...
---------------------------------------------------------------------- ...
| 0 | SELECT STATEMENT | | 21400 | ...
|* 1 | HASH JOIN | | 21400 | ...
| 2 | TABLE ACCESS FULL | DEPARTMENTS | 27 | ...
| 3 | COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED | | ...
---------------------------------------------------------------------- ...
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("D"."DEPARTMENT_ID"=VALUE(KOKBF$))
With this hint, we have told the CBO to scale its cardinality estimate for our pipelined function by a factor of 2.62 (this is 21400/8168 to 2 decimal places, which is probably more precise than we need to be). Note that we defined this rowsource as a "table" and referenced it by its alias in the query. As we can see, the result is that Oracle uses a hash join, as it did with the CARDINALITY hint above. This is to be expected, of course, because we have supplied the same information to the optimiser, albeit by a different method.
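For reference, the scale factor is easily derived from the two values we have already seen: the function's actual rowcount and the CBO's default cardinality.

-- 21400 actual rows / 8168 default cardinality, to 2 decimal places
SELECT ROUND(21400 / 8168, 2) AS scale_rows FROM dual;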
method 3: dynamic_sampling hint
The dynamic sampling feature has been available since 9i Release 2, but it only supports pipelined and table functions from 11.1.0.7 onwards (at sampling levels 2 to 10). Note that the CBO will only sample a pipelined or table function when dynamic sampling is requested with the DYNAMIC_SAMPLING hint; the OPTIMIZER_DYNAMIC_SAMPLING parameter setting alone has no effect on these rowsources. With this in mind, we will use the hint in our standard query, as follows.
SQL> set autotrace traceonly explain
SQL> SELECT /*+ DYNAMIC_SAMPLING(e, 5) */
2 *
3 FROM departments d
4 , TABLE(employees_piped) e
5 WHERE d.department_id = e.department_id;
Execution Plan
----------------------------------------------------------
Plan hash value: 3250313570
---------------------------------------------------------------------- ...
| Id | Operation | Name | Rows | ...
---------------------------------------------------------------------- ...
| 0 | SELECT STATEMENT | | 21400 | ...
|* 1 | HASH JOIN | | 21400 | ...
| 2 | TABLE ACCESS FULL | DEPARTMENTS | 27 | ...
| 3 | COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED | | ...
---------------------------------------------------------------------- ...
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("D"."DEPARTMENT_ID"=VALUE(KOKBF$))
Note
-----
- dynamic sampling used for this statement
The notes section of the DBMS_XPLAN output tells us that dynamic sampling was used for our query. We used a sampling level of 5 and, interestingly, the CBO has the exact cardinality for our pipelined function (hence the switch to a hash join again). If we repeat the query at a dynamic sampling level of just 2 and trace the optimisation with the 10053 event, we can see quite clearly how the CBO manages to be so accurate even at lower sampling levels.
SQL> ALTER SESSION SET EVENTS '10053 trace name context forever';
Session altered.
SQL> SELECT /*+ DYNAMIC_SAMPLING(e, 2) */
2 *
3 FROM departments d
4 , TABLE(employees_piped) e
5 WHERE d.department_id = e.department_id;
Execution Plan
----------------------------------------------------------
Plan hash value: 3250313570
---------------------------------------------------------------------- ...
| Id | Operation | Name | Rows | ...
---------------------------------------------------------------------- ...
| 0 | SELECT STATEMENT | | 21400 | ...
|* 1 | HASH JOIN | | 21400 | ...
| 2 | TABLE ACCESS FULL | DEPARTMENTS | 27 | ...
| 3 | COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED | | ...
---------------------------------------------------------------------- ...
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("D"."DEPARTMENT_ID"=VALUE(KOKBF$))
Note
-----
- dynamic sampling used for this statement
SQL> set autotrace off
SQL> ALTER SESSION SET EVENTS '10053 trace name context off';
Session altered.
Even at level 2, the CBO has the exact cardinality. The reason is in the trace file, as the following excerpt demonstrates.
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for KOKBF$[KOKBF$]
*** 2009-06-11 18:15:51.109
** Performing dynamic sampling initial checks. **
** Dynamic sampling initial checks returning TRUE (level = 2).
** Dynamic sampling updated table stats.: blocks=33
<<...snip...>>
*** 2009-06-11 18:15:51.140
** Executed dynamic sampling query:
level : 2
sample pct. : 100.000000
actual sample size : 21400
filtered sample card. : 21400
orig. card. : 8168
block cnt. table stat. : 33
block cnt. for sampling: 33
max. sample block cnt. : 64
sample block cnt. : 33
min. sel. est. : -1.00000000
** Using dynamic sampling card. : 21400
** Dynamic sampling updated table card.
Table: KOKBF$ Alias: KOKBF$
Card: Original: 21400.000000 Rounded: 21400 Computed: 21400.00 Non Adjusted: 21400.00
***************************************
As the trace file shows, dynamic sampling for pipelined functions at level 2 or above takes a 100% sample of the rowsource, regardless of the sampling level specified in the hint. For pipelined functions with small resultsets, the cost of this 100% sample might be trivial compared with the effect that the wrong cardinality could have on the overall execution plan.
For pipelined functions that process large volumes of data, however, dynamic sampling could be more costly because it adds to the workload during parsing. That said, querying the function twice doesn't mean the workload (and therefore time to completion) is doubled. We can use SQL Trace to demonstrate this, by measuring the additional workload that dynamic sampling generates. In the following example, we will query our pipelined function with SQL Trace enabled and a dynamic sampling level of 2.
SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
Session altered.
SQL> set autotrace traceonly
SQL> SELECT /*+ DYNAMIC_SAMPLING(e, 2) */ *
2 FROM TABLE(employees_piped) e;
21400 rows selected.
<<...snip...>>
SQL> ALTER SESSION SET EVENTS '10046 trace name context off';
Session altered.
The resulting trace file (formatted with TKProf) shows two SQL statements for our query. First, we can see the query itself, as follows.
SELECT /*+ DYNAMIC_SAMPLING(e, 2) */ *
FROM TABLE(employees_piped) e
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 44 0.15 0.15 0 0 0 21400
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 46 0.15 0.15 0 0 0 21400
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 82
Rows Row Source Operation
------- ---------------------------------------------------
21400 COLLECTION ITERATOR PICKLER FETCH EMPLOYEES_PIPED (cr=8 pr=0 pw=0 time=0 us)
We can see that this query completed extremely quickly (because Autotrace was set to traceonly, the resultset wasn't spooled to screen, and the SQL*Plus arraysize was set to 500). The tracefile also contains a second query of our pipelined function, as follows.
SQL ID: 82w2n8jb84umg
Plan Hash: 3623837808
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE
NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"),
NVL(SUM(C2),:"SYS_B_1")
FROM
(SELECT /*+ NO_PARALLEL("KOKBF$") FULL("KOKBF$") NO_PARALLEL_INDEX("KOKBF$")
*/ :"SYS_B_2" AS C1, :"SYS_B_3" AS C2 FROM TABLE("HR"."EMPLOYEES_PIPED"())
"KOKBF$") SAMPLESUB
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.03 0.03 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.03 0.03 0 0 0 1
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 82 (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=8 pr=0 pw=0 time=0 us)
21400 COLLECTION ITERATOR PICKLER FETCH EMPLOYEES_PIPED (cr=8 pr=0 pw=0 time=0 us)
This recursive query was executed by Oracle because we used the DYNAMIC_SAMPLING hint. We can see that it took less time and fewer resources than our actual query, yet it accessed the same number of rows. The workload was reduced because Oracle was simply trying to get the pipelined function's rowcount. This means that it was able to use an aggregate query without projecting any of the function's return attributes. Instead, it used bind variable expressions. The raw trace file tells us that the values of the bind variables were 0 for the default NVL arguments in the outer query block and 1 for the C1 and C2 expressions in the in-line view. By using an in-line view, SUM and NVL, Oracle executed a more complicated version of a COUNT(*) from the function to determine its rowcount.
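Stripped of the NVL/SUM machinery and the bind variables, the recursive statement is therefore functionally equivalent to the following simple aggregate.

SELECT COUNT(*) FROM TABLE(employees_piped);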
method 4: extensible optimiser
So far in this article, we have seen three hints that can be used to ensure the CBO has accurate cardinalities for our table or pipelined functions. However, two of these hints are unsupported and the third is potentially costly. Since 10g Release 1, there has been a fully supported and lightweight method for supplying cardinality to the optimiser: the Extensible Optimiser.
The Extensible Optimiser is implemented by an Oracle Data Cartridge (which is essentially an object type, known as an interface type, that contains one or more well-defined and structured methods). This feature enables us to design our own cardinality calculations (as a prescribed method in the interface type) and then associate them with our table or pipelined functions. The type's cardinality method is invoked by the CBO during query optimisation to determine the rowcount for the pipelined or table function. We will see examples of this below.
enabling the extensible optimiser
The Extensible Optimiser has a range of well-defined methods for calculating various statistics for functions, but the one we are interested in is the ODCIStatsTableFunction method. To use the Extensible Optimiser with a table or pipelined function, we require three components:
a table or pipelined function;
an object type;
an association between the function and the type.
We will create each of these components below.
(1) pipelined function
First, we'll modify our EMPLOYEES_PIPED pipelined function to include a P_NUM_ROWS parameter, as follows.
SQL> CREATE OR REPLACE FUNCTION employees_piped(
2 p_num_rows IN NUMBER
3 ) RETURN employees_ntt PIPELINED AS
4 BEGIN
5 FOR r IN (SELECT * FROM employees) LOOP
6 FOR i IN 1 .. 200 LOOP
7 PIPE ROW (employees_ot(
8 r.employee_id, r.first_name, r.last_name,
9 r.email, r.phone_number, r.hire_date,
10 r.job_id, r.salary, r.commission_pct,
11 r.manager_id, r.department_id));
12 END LOOP;
13 END LOOP;
14 RETURN;
15 END employees_piped;
16 /
Function created.
Note that the P_NUM_ROWS parameter doesn't appear in the function body at all. Instead, it will be used by the CBO as we will see shortly.
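We can verify that the parameter has no effect on the data itself: whatever value we pass, the function still pipes the full resultset.

-- returns 21400 regardless of the value supplied; only the CBO reads it
SELECT COUNT(*) FROM TABLE(employees_piped(999));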
(2) interface type
We will now create an interface object type, beginning with the specification, as follows.
SQL> CREATE OR REPLACE TYPE pipelined_stats_ot AS OBJECT (
2
3 dummy_attribute NUMBER,
4
5 STATIC FUNCTION ODCIGetInterfaces (
6 p_interfaces OUT SYS.ODCIObjectList
7 ) RETURN NUMBER,
8
9 STATIC FUNCTION ODCIStatsTableFunction (
10 p_function IN SYS.ODCIFuncInfo,
11 p_stats OUT SYS.ODCITabFuncStats,
12 p_args IN SYS.ODCIArgDescList,
13 p_num_rows IN NUMBER
14 ) RETURN NUMBER
15
16 );
17 /
Type created.
Some important points to note about this are as follows:
Line 3: object types must have at least one attribute, even if it is not needed in the implementation;
Lines 5, 9: these method names are prescribed by Oracle and we must use them. There are a number of different methods that we can use for the Extensible Optimiser. In our case, we are only using the method for table function cardinality (ODCIStatsTableFunction). The ODCIGetInterfaces method is mandatory for all interface types;
Lines 10-13: the parameter positions and types for ODCIStatsTableFunction are also prescribed by Oracle, with one exception: after the P_ARGS parameter, we must include all of the parameters of our associated table or pipelined function(s). In the case of EMPLOYEES_PIPED, the only parameter is P_NUM_ROWS, which we have included in our method as required (line 13), but interface type methods can cater for more than one user-parameter if required.
The interface type body is where we code our cardinality calculation, as follows.
SQL> CREATE TYPE BODY pipelined_stats_ot AS
2
3 STATIC FUNCTION ODCIGetInterfaces (
4 p_interfaces OUT SYS.ODCIObjectList
5 ) RETURN NUMBER IS
6 BEGIN
7 p_interfaces := SYS.ODCIObjectList(
8 SYS.ODCIObject ('SYS', 'ODCISTATS2')
9 );
10 RETURN ODCIConst.success;
11 END ODCIGetInterfaces;
12
13 STATIC FUNCTION ODCIStatsTableFunction (
14 p_function IN SYS.ODCIFuncInfo,
15 p_stats OUT SYS.ODCITabFuncStats,
16 p_args IN SYS.ODCIArgDescList,
17 p_num_rows IN NUMBER
18 ) RETURN NUMBER IS
19 BEGIN
20 p_stats := SYS.ODCITabFuncStats(p_num_rows);
21 RETURN ODCIConst.success;
22 END ODCIStatsTableFunction;
23
24 END;
25 /
Type body created.
As stated earlier, the ODCIGetInterfaces method is mandatory and so is its implementation, as shown. The ODCIStatsTableFunction method is where we can be creative, although our implementation is in fact very simple. Remember that we included a P_NUM_ROWS parameter in our pipelined function but didn't use it in the function itself. Instead, we simply take this parameter and pass it straight through to the CBO via the interface type (line 20 above).
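If we wanted to be slightly more defensive, we could, for example, supply a fallback when the caller passes NULL. The following is a sketch of such a variant (the fallback value of 8168 simply mimics the CBO's own 8KB-block default).

-- defensive variant (sketch): avoid handing the CBO a NULL rowcount
p_stats := SYS.ODCITabFuncStats(NVL(p_num_rows, 8168));
RETURN ODCIConst.success;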
(3) association
The interface type is the bridge between the table or pipelined function and the CBO. The ODCIStatsTableFunction method simply picks up the parameter we pass to our pipelined function, optionally uses it to calculate a cardinality value and then passes it on to the CBO. For Oracle to be able to do this, however, we require a third and final component; that is, the association between the pipelined function and the interface type. We do this as follows.
SQL> ASSOCIATE STATISTICS WITH FUNCTIONS employees_piped USING pipelined_stats_ot;
Statistics associated.
With this command, our pipelined function and interface type are now directly linked. Incidentally, our type could also be associated with other functions (assuming they had a matching signature; that is, a single NUMBER parameter).
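We can verify the association in the data dictionary, and remove it later if required, as follows.

SELECT object_name, object_type, statstype_schema, statstype_name
FROM   user_associations;

-- to remove the association:
-- DISASSOCIATE STATISTICS FROM FUNCTIONS employees_piped;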
testing the extensible optimiser
We have now completed our setup for the Extensible Optimiser. To test it, we will repeat our sample query but without any hints, as follows.
SQL> set autotrace traceonly explain
SQL> SELECT *
2 FROM departments d
3 , TABLE(employees_piped(21400)) e
4 WHERE d.department_id = e.department_id;
Execution Plan
----------------------------------------------------------
Plan hash value: 3250313570
---------------------------------------------------------------------- ...
| Id | Operation | Name | Rows | ...
---------------------------------------------------------------------- ...
| 0 | SELECT STATEMENT | | 21400 | ...
|* 1 | HASH JOIN | | 21400 | ...
| 2 | TABLE ACCESS FULL | DEPARTMENTS | 27 | ...
| 3 | COLLECTION ITERATOR PICKLER FETCH| EMPLOYEES_PIPED | | ...
---------------------------------------------------------------------- ...
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("D"."DEPARTMENT_ID"=VALUE(KOKBF$))
As we can see, the CBO has picked up the cardinality value that we passed to our pipelined function and used it to optimise our SQL. If we trace the optimisation of our query with a 10053 event, we can see further evidence that our interface type is being used by the CBO, as follows.
Access path analysis for KOKBF$
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for KOKBF$[KOKBF$]
Calling user-defined function card function...
Bind :3 Value 21400
HR.PIPELINED_STATS_OT.ODCIStatsTableFunction returned:
num_rows : 21400
Table: KOKBF$ Alias: KOKBF$
Card: Original: 21400.000000 Rounded: 21400 Computed: 21400.00 Non Adjusted: 21400.00
Access Path: TableScan
Cost: 29.29 Resp: 29.29 Degree: 0
Cost_io: 29.00 Cost_cpu: 6481984
Resp_io: 29.00 Resp_cpu: 6481984
Best:: AccessPath: TableScan
Cost: 29.29 Degree: 1 Resp: 29.29 Card: 21400.00 Bytes: 0
***************************************
This trace file is from an 11.1.0.7 instance. We can see that the optimiser executes the ODCIStatsTableFunction method in our interface type and receives the correct cardinality in return. Note that in 10g (and possibly 11.1.0.6), the 10053 trace file includes the full PL/SQL block that the optimiser uses to execute our ODCIStatsTableFunction method.
benefits of the extensible optimiser method
Overall, we can see that the Extensible Optimiser is a useful, accurate and, importantly, supported method for supplying cardinality to the CBO. Its main benefit over the DYNAMIC_SAMPLING hint (the only other supported method at the time of writing) is that it doesn't execute the pipelined function itself, only the ODCIStatsTableFunction method.
In our example, we have simply exploited the Extensible Optimiser feature to create our own alternative to the CARDINALITY hint. Apart from being supported, another benefit over the CARDINALITY hint is that the value of P_NUM_ROWS can be passed as a variable rather than hard-coded as above (a CARDINALITY hint value must be known in advance and written into the SQL as a literal).
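For example, bind peeking permitting, the estimate could be supplied at runtime (a sketch; the value 21400 simply stands in for whatever the calling code has computed).

VARIABLE est NUMBER
EXEC :est := 21400

SELECT *
FROM   departments d
     , TABLE(employees_piped(:est)) e
WHERE  d.department_id = e.department_id;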
an alternative implementation
For an alternative implementation of our Extensible Optimiser method, we could remove the P_NUM_ROWS parameter and instead use a lookup table to store representative cardinalities for all of our table or pipelined functions. A single interface type could be associated to all functions, with the ODCIStatsTableFunction method looking up the cardinality based on the executing function name (which is also known to the interface type). With this technique, we could avoid hard-coding cardinalities and modify them over time in the lookup table as needed.
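A minimal sketch of this approach follows (the FUNCTION_CARDINALITIES table is hypothetical, the ODCIGetInterfaces method is unchanged from earlier and omitted, and the signature below assumes parameterless functions; any function parameters would still need to be appended after P_ARGS).

CREATE TABLE function_cardinalities (
  function_name VARCHAR2(30) PRIMARY KEY
, num_rows      NUMBER NOT NULL
);

-- inside the shared interface type body...
STATIC FUNCTION ODCIStatsTableFunction (
                p_function IN  SYS.ODCIFuncInfo,
                p_stats    OUT SYS.ODCITabFuncStats,
                p_args     IN  SYS.ODCIArgDescList
                ) RETURN NUMBER IS
   v_num_rows NUMBER;
BEGIN
   -- P_FUNCTION identifies the function being optimised
   -- (ObjectName is the function name as stored, i.e. upper case)
   SELECT num_rows INTO v_num_rows
   FROM   function_cardinalities
   WHERE  function_name = p_function.ObjectName;
   p_stats := SYS.ODCITabFuncStats(v_num_rows);
   RETURN ODCIConst.success;
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      RETURN ODCIConst.error; -- no stats recorded: CBO falls back to its default
END ODCIStatsTableFunction;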
the extensible optimiser, table functions and variable in-lists
So far, our examples have all been based on the EMPLOYEES_PIPED pipelined function. We will complete this article with an example of a table function. Table functions are commonly used as a mechanism to bind variable in-lists passed as collections into SQL queries (for an example, see this article). Using the Extensible Optimiser, we can devise a generic way to determine the cardinalities of all table functions used in variable in-list queries.
First, we will create a simple collection type to support any variable in-list of string values.
SQL> CREATE OR REPLACE TYPE varchar2_ntt AS TABLE OF VARCHAR2(4000);
2 /
Type created.
Second, we will create a small function that will receive and return a collection of our generic VARCHAR2_NTT type. This function does nothing with the collection itself; it is merely a wrapper over it.
SQL> CREATE FUNCTION collection_wrapper(
2 p_collection IN varchar2_ntt
3 ) RETURN varchar2_ntt IS
4 BEGIN
5 RETURN p_collection;
6 END collection_wrapper;
7 /
Function created.
Third, we will create an interface type specification to be associated with our simple COLLECTION_WRAPPER function, as follows.
SQL> CREATE TYPE collection_wrapper_ot AS OBJECT (
2
3 dummy_attribute NUMBER,
4
5 STATIC FUNCTION ODCIGetInterfaces (
6 p_interfaces OUT SYS.ODCIObjectList
7 ) RETURN NUMBER,
8
9 STATIC FUNCTION ODCIStatsTableFunction (
10 p_function IN SYS.ODCIFuncInfo,
11 p_stats OUT SYS.ODCITabFuncStats,
12 p_args IN SYS.ODCIArgDescList,
13 p_collection IN varchar2_ntt
14 ) RETURN NUMBER
15
16 );
17 /
Type created.
This is very similar to our previous example, so it doesn't need explaining in any great detail. Note, however, that our function has a P_COLLECTION parameter, which must be replicated in our ODCIStatsTableFunction method signature. We can now add our interface type body, as follows.
SQL> CREATE TYPE BODY collection_wrapper_ot AS
2
3 STATIC FUNCTION ODCIGetInterfaces (
4 p_interfaces OUT SYS.ODCIObjectList
5 ) RETURN NUMBER IS
6 BEGIN
7 p_interfaces := SYS.ODCIObjectList(
8 SYS.ODCIObject ('SYS', 'ODCISTATS2')
9 );
10 RETURN ODCIConst.success;
11 END ODCIGetInterfaces;
12
13 STATIC FUNCTION ODCIStatsTableFunction (
14 p_function IN SYS.ODCIFuncInfo,
15 p_stats OUT SYS.ODCITabFuncStats,
16 p_args IN SYS.ODCIArgDescList,
17 p_collection IN varchar2_ntt
18 ) RETURN NUMBER IS
19 BEGIN
20 p_stats := SYS.ODCITabFuncStats(p_collection.COUNT);
21 RETURN ODCIConst.success;
22 END ODCIStatsTableFunction;
23
24 END;
25 /
Type body created.
Our implementation is very similar to our previous example. This time, however, we have a collection parameter rather than a scalar number, so to supply the CBO with the correct cardinality, we simply count the collection's elements (line 20).
Fourth and finally, we must associate the function with the interface type, as follows.
SQL> ASSOCIATE STATISTICS WITH FUNCTIONS collection_wrapper USING collection_wrapper_ot;
Statistics associated.
Before we test the Extensible Optimiser with a variable in-list query, we will use Autotrace to see how it works with a simple SELECT from a table function. First, we will query a hard-coded collection of three elements without our COLLECTION_WRAPPER function, as follows.
SQL> set autotrace traceonly explain
SQL> SELECT *
2 FROM TABLE(varchar2_ntt('A','B','C'));
Execution Plan
----------------------------------------------------------
Plan hash value: 1748000095
--------------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | |
--------------------------------------------------------------
Unsurprisingly, the CBO has to resort to the default cardinality of 8,168 rows. To counter this, we can wrap our collection in a call to the wrapper function and enable the CBO to get the correct cardinality, as follows.
SQL> set autotrace traceonly explain
SQL> SELECT *
2 FROM TABLE(
3 collection_wrapper(
4 varchar2_ntt('A','B','C')));
Execution Plan
----------------------------------------------------------
Plan hash value: 4261576954
------------------------------------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 |
| 1 | COLLECTION ITERATOR PICKLER FETCH| COLLECTION_WRAPPER | |
------------------------------------------------------------------------
This time the optimiser has the correct cardinality due to our interface type. The generic COLLECTION_WRAPPER function and COLLECTION_WRAPPER_OT interface type combine to provide statistics to the CBO when using small collections such as this.
As stated earlier, table functions are typically used to support variable in-lists, so we will see an example of how this wrapper method can assist in that scenario. We will filter the EMPLOYEES table by a variable in-list of names. The in-list is represented by a collection and, although it is hard-coded for simplicity below, we would usually expect it to be passed as a parameter or bind variable (a sketch of this follows the examples below).
First, we will execute the query without the COLLECTION_WRAPPER function, as follows.
SQL> set autotrace traceonly explain
SQL> SELECT *
2 FROM employees
3 WHERE last_name IN (SELECT column_value
4 FROM TABLE(
5 varchar2_ntt('Grant','King')));
Execution Plan
----------------------------------------------------------
Plan hash value: 2363723373
-------------------------------------------------------------------- ...
| Id | Operation | Name | Rows | ...
-------------------------------------------------------------------- ...
| 0 | SELECT STATEMENT | | 1 | ...
|* 1 | HASH JOIN SEMI | | 1 | ...
| 2 | TABLE ACCESS FULL | EMPLOYEES | 107 | ...
| 3 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | | ...
-------------------------------------------------------------------- ...
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("LAST_NAME"=VALUE(KOKBF$))
Despite the fact that we only want to query two names from the EMPLOYEES table, Oracle has chosen a hash semi-join. This is because the optimiser has used the 8,168 heuristic cardinality for our collection. To see the effect of the Extensible Optimiser in this case, we will repeat the query but with the COLLECTION_WRAPPER function, as follows.
SQL> SELECT *
2 FROM employees
3 WHERE last_name IN (SELECT column_value
4 FROM TABLE(
5 collection_wrapper(
6 varchar2_ntt('Grant','King'))));
Execution Plan
----------------------------------------------------------
Plan hash value: 1407606243
--------------------------------------------------------------------------- ...
| Id | Operation | Name | Rows | ...
--------------------------------------------------------------------------- ...
| 0 | SELECT STATEMENT | | 2 | ...
| 1 | NESTED LOOPS | | | ...
| 2 | NESTED LOOPS | | 2 | ...
| 3 | SORT UNIQUE | | | ...
| 4 | COLLECTION ITERATOR PICKLER FETCH| COLLECTION_WRAPPER | | ...
|* 5 | INDEX RANGE SCAN | EMP_NAME_IX | 1 | ...
| 6 | TABLE ACCESS BY INDEX ROWID | EMPLOYEES | 1 | ...
--------------------------------------------------------------------------- ...
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("LAST_NAME"=VALUE(KOKBF$))
This time, the optimiser is aware that there are only two elements in the variable in-list collection and has opted for a nested loops join accordingly. This also demonstrates that the COLLECTION_WRAPPER and COLLECTION_WRAPPER_OT objects can be re-used for all queries that include small collections (as in-lists or otherwise).
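Finally, as noted earlier, the in-list collection would usually arrive as a parameter rather than being hard-coded. The following sketch shows how the wrapper might be used from PL/SQL (the PROCESS_EMPLOYEES procedure and its loop body are hypothetical).

CREATE OR REPLACE PROCEDURE process_employees (
   p_names IN varchar2_ntt
) AS
BEGIN
   FOR r IN (SELECT e.*
             FROM   employees e
             WHERE  e.last_name IN (SELECT column_value
                                    FROM   TABLE(collection_wrapper(p_names))))
   LOOP
      NULL; -- process each matching employee here
   END LOOP;
END process_employees;
/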
summary
In this article, we have seen four methods for supplying table and pipelined function cardinalities to the optimiser. Two of these methods are unsupported (as of 11g Release 1) and for this reason, their use in production code is discouraged. Of the two supported methods, the DYNAMIC_SAMPLING hint is a new feature of 11.1.0.7 and has some limitations and performance implications. The Extensible Optimiser feature is the most flexible method to use at this stage and is usable in all versions of 10g. Using this, we have devised a good alternative to the CARDINALITY hint for pipelined functions and also created a generic wrapper for small collections that are typically used in variable in-list queries.
further reading
For more information on the Extensible Optimiser and how it can be used to generate a range of statistics for functions, read pl/sql functions and cbo costing on this website and the further reading references provided therein. For discussions of variable in-lists as a technique, see this article by William Robertson and binding in-lists in 10g on oracle-developer.net. For more information on dynamic sampling, read the Performance Tuning Guide in the online Oracle documentation.
Source: http://www.oracle-developer.net/display.php?id=427