The Hybrid Response
When requirements call for changed data to result in both type 1 and type 2 behavior, the solution is disarmingly simple: provide for both. This cannot be achieved with a single attribute, but it can be achieved by providing a pair of attributes. Each attribute has the same source, but one will react to changes as a type 1 response and the other as type 2. This solution is known as a type 1/2 hybrid. It’s a good idea to name these attributes clearly, since each serves a different function. Use the suffix _current to identify the type 1 attribute, and _historic to identify the type 2 attribute. When the source for the pair of attributes changes, the ETL process must do two things:
- Update the _current value for all records that share the same natural key.
- Insert a new row, initializing both _current and _historic values to the new value.
When reporting on the data, developers choose which column to use based on the requirements for the report.
- Use the _current column to group all facts under the current values.
- Use the _historic column to group all facts under the historic values.
A Type 1/2 Hybrid Response in Action
A hybrid design for handling changes in company names is illustrated in Below figure. The customer dimension in the figure records company names in two columns:
- company_name_current will capture the current name of the company. This attribute will exhibit type 1 behavior. When a company name changes, it will be updated.
- company_name_historic will capture the detailed history of company names. This attribute will exhibit type 2 behavior. When a company name changes, a new row will be created to capture the new value. The new row will be associated with any subsequent transactions added to a fact table.
Processing a Change
Suppose one of the companies in the customer table is Apple Computer. You may recall that in 2007 the company changed its name from “Apple Computer, Inc.” to “Apple Inc.” Imagine that products have been ordered by Apple, both before and after the name change. Below figure illustrates the two-step process that occurs when the name of the company changes.
The top of the figure shows the row for Apple that was in place prior to the name change. The natural key for Apple is BB770. It has one row in the customer table, with a surrogate key value of 1011. The column company_name_current shows the name of the company as “Apple Computer, Inc.” There have been no changes as of yet, so company_name_historic contains the same value. The lower half of Figure shows what happens when the name of the company changes:
- The first row shows the previously existing record, with surrogate key value 1011. The company_name_current column is updated with the new company name “Apple Inc.” This is the type 1 response. The company_name_historic value is untouched, since it is treated as type 2.
- To process the type 2 change, a new row is added. It contains the new name “Apple Inc.” in both the company_name_current and company_name_historic columns.
This new row is assigned a new surrogate key, 2822, which will be used for new rows in associated fact tables.
This may appear convoluted, but it is no different from processing any table that contains both type 1 and type 2 attributes. The only difference is that both attributes happen to have the same source.
Using the Company Name Columns
Facts can now be studied in two ways. When you want to be able to capture all orders under the new name (Apple Inc.), you can use the column company_name_current. This might be done to filter a query or to group aggregated facts. All activity with Apple is associated with the same name: Apple Inc. This is particularly useful for producing this year versus last year comparisons; all orders will have the same name, even if it recently changed. When you want to group orders under historically accurate company names, you can use the company_name_historic. Transactions that took place before Apple’s name change are grouped under “Apple Computer, Inc.” Transactions that took place after the name
change are grouped under “Apple Inc.”Each time a company changes,the same two-step process is followed. First, all existing rows with the same natural key have their _current value updated. Second, a new row is added, with _current and _historic initialized to the new value.
Suppose that Apple decides to change its name again. The iPod and iPhone have been
so successful they might rename the company “iApple Inc.” Below figure shows how this
change would be processed.
First, the existing records in the dimension table for Apple, which is company BB770, are updated with the new company_name_current. This time around, there are two records to be updated: those identified by surrogate keys 1011 and 2822. That takes care of the type 1 processing.
Second, a new row is added to the table for iApple. This row records the new name, iApple Inc.,in both the company_name_current and company_name_historic columns.
That takes care of the type 2 processing. This new row has surrogate key value 3100. Any new orders from iApple will use this key value in the fact table.
This process can be repeated as many times as the name of a given company changes.
It will always be possible to use the current name of the company to study all facts by using company_name_current. It will always be possible to study all facts with historically accurate company names by using company_name_historic.
Evaluating and Extending the Hybrid Approach
The hybrid approach addresses conflicting slow change requirements at the expense of
understandability. Hybrid solutions generate no end of confusion among end users and
analysts, who are often uncertain about which column to use in a given report. When the
wrong column is used for a situation, the results can be perplexing.
It is, therefore, important to use hybrid solutions judiciously. Evaluate the requirements carefully. In the case of the company name changes, for example, why do some people want access to the historically accurate name? If it is so they can reproduce a historic invoice, that purpose is better served by the operational system. On the other hand, if it is needed for certain forms of statutory reporting, the requirement may be valid.
When a hybrid approach is implemented, most analysis tends to focus on one of the columns. In the company example, the column of primary interest contains the current company name. To simplify the model exposed to end users, you can create a view of the customer dimension that hides the historic version. Developers trained in the complexities of the star can be provided with the full version of the table, for use in creating canned reports.
In extremely rare situations, you may find requirements that call for a type 1/2/3 hybrid. In this case, the attribute in question will be represented by three columns: the current value, the previous value, and the historic value. This situation will compound the confusion of users trying to choose which column to show in a report. Again, you will want to try to insulate end users from this complexity, exposing them only to the version required most often.
At this point, the reader may be thinking about other ways to combine these techniques. There is no need to describe a type 1/3 hybrid, because a type 3 change already includes a column that exhibits type 1 behavior. For the same reason, a type 2/3 hybrid is no different from the 1/2/3 hybrid described in the previous paragraph. It is possible to incorporate time stamps with any hybrid technique that involves a type 2 component. Remember that complexity increases the ETL workload and reduces usability; be sure that any solution is truly warranted by the requirements.
参考至:《Star Schema The Complete Reference》
本文原创,转载请注明出处、作者
如有错误,欢迎指正
邮箱:czmcj@163.com
相关推荐
The hybrid system composed of a Photovoltaic (PV) array and a Proton exchange membrane fuel cell (PEMFC) is considered. The PV array normally uses a maximum power point tracking (MPPT) technique to ...
The book takes us on a journey of architecting, building, and operating a hybrid cloud while taking a very pragmatic approach towards it. The book starts by defining the different demographics of the ...
Face recognition on large-scale video in the wild with hybrid
### H3C交换机的Hybrid接口详解 #### 一、引言 在现代网络环境中,灵活而安全的网络架构对于提高工作效率至关重要。H3C作为国内知名的网络设备制造商之一,其产品广泛应用于各种规模的企业网络建设中。其中,Hybrid...
随着信息技术的发展与图书馆行业的不断变革,“混合图书馆”(Hybrid Library)的概念逐渐成为图书馆领域的重要议题之一。所谓混合图书馆,是指结合了传统纸质资源与数字化资源的图书馆模式。在这样的背景下,客户...
在本文中,我们将深入探讨H3C Hybrid端口的基础配置案例,这主要针对H3CV7版本的网络设备,如交换机和路由器。这个案例旨在帮助刚入门的网络工程师理解和应用Hybrid端口配置,以实现特定的网络访问策略。 首先,...
By the end of the book, you will not only have built a professional, hybrid mobile application, but will also have ensured that your app is secure and performance driven. Style and approach A step-...
In this study, we propose a hybrid variable selection strategy based on the continuous shrinkage of variable space which is the core idea of variable combination population analysis (VCPA). The VCPA-...
在IT行业中,"Hybrid"一词具有广泛的含义,它通常指的是不同技术或系统的结合,以创造出具有更多功能和灵活性的新解决方案。在这个特定的上下文中,我们似乎关注的是与“字体”相关的“Hybrid”,这可能是指一种混合...
This study is aimed at eliminating the influence of the higher-order modes on the frequency response functions (FRFs) of non-proportionally viscously damped systems. Based on the Neumann expansion ...
### Qt Hybrid 应用开发详解 #### 一、Qt Hybrid 的定义与意义 Qt Hybrid 是指结合了原生 Qt C++ 和 Web 技术(如 HTML、CSS 和 JavaScript)的应用程序开发方式。这种方式使得开发者能够充分利用 Qt 强大的功能与...
HOMER (Hybrid Optimization of Multiple Electric Renewables), the micropower optimization model, simplifies the task of evaluating designs of both off-grid and grid-connected power systems for a ...
Ionic has evolved as the most popular choice for Hybrid Mobile App development as it tends to match the native experience and provides robust components/tools to build apps. The Ionic Complete ...
Barney walks you through the entire hybrid iPhone development process, from creating great user interfaces to compiling, deploying, and executing applications. Along the way, he introduces techniques...
Abstract—The decoupled architecture and the fine-grained ...of the hybrid hierarchical approach. Finally, we verify the effectiveness of Orion both from the theoretical and experimental aspects
This study presents a hybrid AI (artificial intelligence) approach to the implementation of trading strategies in the S&P 500 stock index futures market. The hybrid AI approach integrates the rule-...
华为交换机Hybrid接口特性 华为交换机的Hybrid特性是华为交换机的专有特性,该特性为局域网的搭建提供了更多的灵活性和安全性。Hybrid特性可以承载多个VLAN的流量,可以用在与PC或交换机相连的接口,与Trunk接口的...
交换机 Hybrid 接口详解 在计算机网络中,交换机的接口类型是非常重要的,它可以改变数据包的处理方式。今天,我们将详细介绍交换机的 Hybrid 接口类型,并解释它与 Access 和 Trunk 接口类型的区别。 交换机的...