Star transformation was introduced in Oracle 8i to process star queries efficiently. These queries are commonly used in data warehouse applications that follow the Star Schema data model. The Star Schema is so called because the data model diagram resembles a star. The center of the star consists of one or more fact tables and the points of the star are the dimension tables.
Consider the following star query Q1. The query is to find the total sales amount in all cities in California for quarters Q1 and Q2 of year 1999 through the Internet.
Q1:
SELECT c.cust_city, t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount
FROM sales s, times t, customers c, channels ch
WHERE s.time_id = t.time_id
AND s.cust_id = c.cust_id
AND s.channel_id = ch.channel_id
AND c.cust_state_province = 'CA'
AND ch.channel_desc = 'Internet'
AND t.calendar_quarter_desc IN ('1999-01','1999-02')
GROUP BY c.cust_city, t.calendar_quarter_desc;
Sales is the fact table while the other tables are considered as dimension tables. The Sales table contains one row for every sale of a product and thus it may contain billions of sales records. However only a few of them are sold to customers in California through the Internet for the specified quarters. The query is transformed into Q2.
Q2:
FROM sales s, times t, customers c
WHERE s.time_id = t.time_id
AND s.cust_id = c.cust_id
AND c.cust_state_province = 'CA'
AND t.calendar_quarter_desc IN ('1999-01','1999-02')
AND s.time_id IN (SELECT time_id
FROM times
WHERE calendar_quarter_desc IN('1999-01','1999-02'))
AND s.cust_id IN (SELECT cust_id
FROM customers
WHERE cust_state_province='CA')
AND s.channel_id IN (SELECT channel_id
FROM channels
WHERE channel_desc = 'Internet')
GROUP BY c.cust_city, t.calendar_quarter_desc;
Star transformation is essentially about adding subquery predicates corresponding to the constraint dimensions. These subquery predicates are referred to as bitmap semi-join predicates. The transformation is performed when there are indexes on the fact join columns (s.timeid, s.custid...). By driving bitmap AND and OR operations (bitmaps can be from bitmap indexes or generated from regular B-Tree indexes) of the key values supplied by the subqueries, only the relevant rows from the fact table need to be retrieved. If the filters on the dimension tables filter out a lot of data, this can be much more efficient than a full table scan on the fact table. After the relevant rows have been retrieved from the fact table, they may need to be joined back to the dimension tables, using the original predicates. In some cases, the join back can be eliminated. We will discuss this situation later.
Table 1 shows the query plan for the transformed query. Note that the sales table has a bitmap access path instead of a full table scan. For each key value coming from the subqueries (lines 11, 16, 21), the bitmaps are retrieved from the fact table indexes (lines 12, 17, 22). Each bit in the bitmap corresponds to a row in fact table. The bit is set if the key value from the subquery is same as the value in the row of fact table. For example, the bitmap [1][0][1][0][0][0]...(all 0s for remaining rows) indicate that rows 1 and 3 of fact table has matching key value from subquery. Lets say the above bitmap is for a key value from customers table subquery.
The operations in lines 9, 14, 19 iterates over the keys from the subqueries and get the corresponding bitmaps. Lets say the customers subquery produces one more key value with the bitmap [0][1][0][0][0][0]...
The bitmaps for each subquery are merged (ORed) (lines 8, 13 and 18). In the above example, it will produce a single bitmap [1][1][1][0][0][0]... for customers subquery after merging the two bitmaps.
The merged bitmaps are ANDed (line 7). Lets say the bitmap from channels is [1][0][0][0][0][0]... If you AND this bitmap with the bitmap from customers subquery it will produce [1][0][0][0][0]...
The corresponding rowids of the final bitmap are generated (line 6). The fact table rows are retrieved using the rowids (line 5). In the above example, it will generate only 1 rowid corresponding to the first row and fetches only a single row instead of scanning the entire fact table.
The representation of bitmaps in the above example is for illustration purpose only. In oracle, they are represented and stored in a compressed form.
Table 1: The plan of the transformed query
Id |
Operation |
Name |
0 |
SELECT STATEMENT |
|
1 |
HASH GROUP BY |
|
2 |
HASH JOIN |
|
3 |
HASH JOIN |
|
4 |
PARTITION RANGE SUBQUERY |
|
5 |
TABLE ACCESS BY LOCAL INDEX ROWID |
SALES |
6 |
BITMAP CONVERSION TO ROWIDS |
|
7 |
BITMAP AND |
|
8 |
BITMAP MERGE |
|
9 |
BITMAP KEY ITERATION |
|
10 |
BUFFER SORT |
|
11 |
TABLE ACCESS FULL |
CHANNELS |
12 |
BITMAP INDEX RANGE SCAN |
SALES_CHANNEL_BIX |
13 |
BITMAP MERGE |
|
14 |
BITMAP KEY ITERATION |
|
15 |
BUFFER SORT |
|
16 |
TABLE ACCESS FULL |
TIMES |
17 |
BITMAP INDEX RANGE SCAN |
SALES_TIME_BIX |
18 |
BITMAP MERGE |
|
19 |
BITMAP KEY ITERATION |
|
20 |
BUFFER SORT |
|
21 |
TABLE ACCESS FULL |
CUSTOMERS |
22 |
BITMAP INDEX RANGE SCAN |
SALES_CUST_BIX |
23 |
TABLE ACCESS FULL |
CUSTOMERS |
24 |
TABLE ACCESS FULL |
TIMES |
Join back elimination
The subqueries and their bitmap tree only filter the fact table based on the dimension filters, so it may still be necessary to join to the dimension table. The join back of the dimension table is eliminated when all the predicates on dimension tables are part of the semijoin subquery predicate, the column(s) selected from the subquery are unique and the dimension columns are not in select list, group by etc. In the above example, the table channels is not joined back to the sales table since it is not referenced outside and channel_id is unique.
Temporary table transformation
If the join back is not eliminated, Oracle stores the results of the subquery in a temporary table to avoid re-scanning the dimension table (for bitmap key generation and join back). In addition to this, the results are materialized if the query is run in parallel, so that each slave can select the results from the temporary tables instead of executing the subquery again.
For example, if Oracle materializes the results of the subquery on customers into a temporary table, the transformed query Q3 will be as follows.
Q3:SELECT t1.c1 cust_city, t.calendar_quarter_desc calendar_quarter_desc,
sum(s.amount_sold) sales_amount
FROM sales s, sh.times t, sys_temp_0fd9d6621_e7e24 t1
WHERE s.time_id=t.time_id
AND s.cust_id=t1.c0
AND (t.calendar_quarter_desc='1999-q1' OR t.calendar_quarter_desc='1999-q2')
AND s.cust_id IN (SELECT t1.c0 FROM sys_temp_0fd9d6621_e7e24 t1)
AND s.channel_id IN (SELECT ch.channel_id
FROM channels ch
WHERE ch.channel_desc='internet')
AND s.time_id IN (SELECT t.time_id
FROM times t
WHERE t.calendar_quarter_desc='1999-q1'
OR t.calendar_quarter_desc='1999-q2')
GROUP BY t1.c1, t.calendar_quarter_desc
Note that customers is replaced by the temporary table sys_temp_0fd9d6621_e7e24 and references to columns cust_idand cust_city are replaced by the corresponding columns of the temporary table. The temporary table will be created with 2 columns - (c0 number, c1 varchar2(30)). These columns corresponds to cust_id and cust_city of customers table. The table will be populated using the following query Q4 at the beginning of the execution of the statement Q3.
Q4:
SELECT c.cust_id, c.cust_city FROM customers WHERE c.cust_state_province = 'CA'
Table 2: Plan with temporary table transformation
0
|
SELECT STATEMENT
|
|||
1 |
TEMP TABLE TRANSFORMATION |
|||
2 |
LOAD AS SELECT |
sys_temp_0fd9d6621_e7e24 | ||
3 |
TABLE ACCESS FULL |
CUSTOMERS |
||
4 |
HASH GROUP BY |
|||
5 |
HASH JOIN |
|||
6 |
HASH JOIN |
|||
7 |
PARTITION RANGE SUBQUERY |
|||
8 |
TABLE ACCESS BY LOCAL INDEX ROWID |
SALES |
||
9 |
BITMAP CONVERSION TO ROWIDS |
|||
10 |
BITMAP AND |
|||
11 |
BITMAP MERGE |
|||
12 |
BITMAP KEY ITERATION |
|||
13 |
BUFFER SORT |
|||
14 |
TABLE ACCESS FULL |
CHANNELS |
||
15 |
BITMAP INDEX RANGE SCAN |
SALESCHANNELBIX |
||
16 |
BITMAP MERGE |
|||
17 |
BITMAP KEY ITERATION |
|||
18 |
BUFFER SORT |
|||
19 |
TABLE ACCESS FULL |
TIMES |
||
20 |
BITMAP INDEX RANGE SCAN |
SALESTIMEBIX |
||
21 |
BITMAP MERGE |
|||
22 |
BITMAP KEY ITERATION |
|||
23 |
BUFFER SORT |
|||
24 |
TABLE ACCESS FULL |
sys_temp_0fd9d6621_e7e24 |
||
25 |
BITMAP INDEX RANGE SCAN |
SALESCUSTBIX |
||
26 |
TABLE ACCESS FULL |
sys_temp_0fd9d6621_e7e24 |
||
27 |
TABLE ACCESS FULL |
TIMES |
The lines 1,2 and 3 of the plan materialize the customers subquery into the temporary table. In line 24, it scans the temporary table (instead of the subquery) to build the bitmap from the fact table. Line 26 is for scanning the temporary table for joining back instead of scanning customers. Note that the filter on customers is not needed to be applied on the temporary table since the filter is already applied while materializing the temporary table.
Enabling the transformation
Star transformation is controlled by the star_transformation_enabled parameter. The parameter takes 3 values.
- TRUE - The Oracle optimizer performs transformation by identifying fact and constraint dimension tables automatically. This is done in a cost-based manner, i.e. the transformation is performed only if the cost of the transformed plan is lower than the non-transformed plan. Also the optimizer will attempt temporary table transformation automatically whenever materialization improves performance.
- FALSE - The transformation is not tried.
- TEMP_DISABLE - This value has similar behavior as TRUE except that temporary table transformation is not tried.
The default value of the parameter is FALSE. You have to change the parameter value and create indexes on the joining columns of the fact table to take advantage of this transformation.
Summary
Star transformation improves the performance of queries with a very big fact table joined to multiple dimension tables when the dimension tables have very selective predicates. The transformation avoids the full scan of the fact table. It fetches only relevant rows from the fact table that will eventually join to the constraint dimension rows. The transformation is performed based on cost - only when the cost of the transformed plan is lower than that of the non-transformed plan. If the dimension filters do not significantly reduce the amount of data to be retrieved from the fact table, then a full table scan is more efficient.
In this post we have tried to illustrate the basic ideas behind star transformation by showing simple example queries and plans. Oracle can do star transformation in more complex cases. For example, a query with multiple fact tables, snowflakes (dimension is a join of several normalized tables instead of denormalized single table), etc.
参考至:https://blogs.oracle.com/optimizer/entry/star_transformation
如有错误,欢迎指正
邮箱:czmcj@163.com
相关推荐
TypeError: Unexpected keyword argument passed to optimizer: learning_rate 3.问题定位: 先看报错代码:大概意思是, 传给优化器的learning_rate参数错误。 模型训练是在服务器Linux环境下进行的,之后在...
[[pdf]](https://arxiv.org/pdf/1606.04474) **(Neural Optimizer,Amazing Work)** :star::star::star::star::star: **[25]** Han, Song, Huizi Mao, and William J. Dally. "**Deep compression: Compressing ...
algorithm called Multi-Verse Optimizer (MVO). The main inspirations of this algorithm are based on three concepts in cosmology: white hole, black hole, and wormhole. The mathematical models of these ...
《智能优化算法:Aquila Optimizer——金雕优化器》 智能优化算法是现代计算机科学领域中的一个重要分支,尤其在解决复杂优化问题时展现出了强大的潜力。Aquila Optimizer,又称为金雕优化器,是一种新颖的群智能...
如今,优化算法的设计非常流行来解决各个科学领域的问题。 优化算法通常受代理的自然行为启发,代理可以是人类、动物、植物或物理或化学代理。 过去十年中提出的大多数算法都是受动物行为启发的。...
Al-qaness,MA,Gandomi, AH,Aquila Optimizer:一种新颖的元启发式优化算法,计算机与工业工程(2021年),doi: https : //doi.org/10.1016/j.cie.2021.107250 可在Researchgate上找到的代码: https : //...
法式长棍面包优化器这是官方的Baguette-Optimizer源代码! 〜这是什么? 这是最好的计算机优化器之一,是付费程序,但现在免费。 〜安全吗?是的,因为您创建了还原点并进行了优化,所以一切都会很好。 包括病毒/...
名称:Image Optimizer 版本:5.0 适用系统:Windows 9x/ME/NT4/2000/XP 简介: 本软件是一个图像优化(减肥)工具,支持 bmp、jpg、gif、pnp、tif 等多种图像格式,支持批量处理。
这是一个简单的工具箱,具有友好的图形界面,非常适合那些没有很高编程技能的人。 GWO 算法的参数可以在工具箱中轻松定义。 目标函数的默认名称是 CostFunction。 如果您查看 CostFunction.m 文件,您可能会注意到...
《蒲公英优化算法Dandelion-Optimizer的深度解析与应用》 蒲公英优化算法(Dandelion-Optimizer)是一种新兴的群智能优化算法,它借鉴了自然界中蒲公英种子随风飘散的随机扩散特性,以解决复杂优化问题。在机器学习...
GBO(Global Best Optimization)是一种全局优化算法,它在解决多模态函数优化问题时表现得尤为出色。这种算法借鉴了生物进化中的优胜劣汰原则,通过模拟种群的进化过程来寻找最佳解决方案。在标准的GBO算法中,种群...
用于CS3223数据库实现的查询优化器 设置 运行source queryenv 运行source build.sh 。 请注意,每次您都需要运行build.sh来重新编译代码,而不是通过诸如IntelliJ之类的任何IDE来生成项目。 建立表格 ...
pip3 install onnxoptimizer 请注意,如果遇到问题,您可能需要先升级点子:pip3 install -U pip 如果要从源代码构建:git clone --recursive https://github.com/onnx/optimizer onnxoptimizercd onnxoptim
火炬优化器 torch-optimizer -- 与模块兼容的优化器集合。简单的例子 import torch_optimizer as optim# model = ...optimizer = optim . DiffGrad ( model . parameters (), lr = 0.001 )optimizer . step ()安装...
feather-optimizer-htmlmin通过自动化这个过程,使得开发者能够更专注于代码逻辑和功能实现,而不是手动处理这些琐碎的优化工作。这个工具通常集成在构建流程中,如Webpack或Gulp,当项目构建时自动运行,确保每一次...
Maven坐标:org.apache.flink:flink-optimizer:1.14.3; 标签:apache、flink、optimizer、中文文档、jar包、java; 使用方法:解压翻译后的API文档,用浏览器打开“index.html”文件,即可纵览文档内容。 人性化...
**标题:“ZendOptimizer-3.3.3下载”** **描述:“ZendOptimizer-3.3.3组件”** 本文将详细介绍 ZendOptimizer-3.3.3,这是一个在IT行业中广泛使用的组件,尤其对于PHP开发人员而言,它具有重要的意义。Zend...
图像优化器图像优化器可优化给定...用法有关命令行的用法,请参见image-optimizer命令的帮助。 有关用作库的信息,请参见image-optimizer命令的源代码。要求必须安装以下软件才能获得完整功能: (必填), (必填),
requirejs_optimizer:executable requirejs_optimizer:run一些 requirejs-optimizer 特定选项。 requirejs_optimizer_config<Hash> base_url基本网址require_config_path需要配置的路径main_config_root_path...
《Vitis AI Optimizer 用户指南》(UG1333 v3.0) 是一份针对人工智能领域的技术文档,由Xilinx公司发布。Xilinx致力于创建一个包容性的工作环境,不仅对内部员工,也包括其客户和合作伙伴。为实现这一目标,公司正在...