Slowly Changing Dimensions
The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a nutshell, this applies to cases where the attribute for a record varies over time. We give an example below:
Christina is a customer with ABC Inc. She first lived in Chicago, Illinois. So, the original entry in the customer lookup table has the following record:
Customer Key | Customer_ID | Name | Birthday | State |
1001 | 999 | Christina | 1990-01-01 | Illinois |
At a later date, she moved to Los Angeles, California on January, 2003. How should ABC Inc. now modify its customer table to reflect this change? This is the "Slowly Changing Dimension" problem.
There are in general three ways to solve this type of problem, and they are categorized as follows:
Type 1: The new record replaces the original record. No trace of the old record exists.
Type 2: A new record is added into the customer dimension table. Therefore, the customer is treated essentially as two people.
Type 3: The original record is modified to reflect the change.
Type 1 Slowly Changing Dimension
When the source of a dimension value changes, and it is not necessary to preserve its history in the star schema, a type 1 response is employed. The dimension is simply overwritten with the new value. This technique is commonly employed in situations where a source data element is being changed to correct an error.
By overwriting the corresponding dimension in the star schema, the type 1 change obliterates the history of the data element. The star carries no hint that the column ever contained a different value. While this is generally the desired effect, it can also lead to confusion. If there were any associated facts before the change occurred, their historic context is retroactively altered.
When a record is updated in a dimension table, the context for existing facts is restated. This effect
can give rise to confusion. Steps can be taken to minimize the confusion caused by type 1 changes. Systems analysts responsible for supporting the data warehouse users must be aware of this phenomenon so they are prepared to address confusion. Developers of reports can place the query execution date within the report footer or cover page, signaling to readers the date as of which the report was current. Any reports that are pre-run and stored for users can be automatically updated on a regular basis so users do not inintentionally access “stale” data.
In our example, recall we originally have the following table:
Customer Key | Customer_ID | Name | Birthday | State |
1001 | 999 | Christina | 1990-01-01 | Illinois |
Developers found Operating system record Christina's birthday incorrectly at the begining, and they correct the birtday value to 1990-02-01, the new information replaces the new record, and we have the following table:
Customer Key | Customer_ID | Name | Birthday | State |
1001 | 999 | Christina | 1990-02-01 | Illinois |
Advantages:
- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.
Disadvantages:
- All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina lived in Illinois before.
Preexisting Facts Have a New Context
When a record is updated in a dimension table, the context for existing facts is restated. This effect
can give rise to confusion.
Usage:
About 50% of the time.
When to use Type 1:
Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.
Type 2 Slowly Changing Dimension
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own surrogate key.
In our example, recall we have corrected the birthday value, and Christina moved from Illinois to California this time, the following table:
Customer Key | Customer_ID | Name | Birthday | State |
1001 | 999 | Christina | 1990-02-01 | Illinois |
The Order fact table shows as below
Customer Key | day_key | product_key | quantity_ordered |
1001 | 2322 | 10119 | 5 |
After Christina moved from Illinois to California, we add the new information as a new row into the table:
Dimension table
Customer Key | Customer_ID | Name | Birthday | State |
1001 | 999 | Christina | 1990-02-01 | Illinois |
1005 | 999 | Christina | 1990-02-01 | California |
Fact table
Customer Key | day_key | product_key | quantity_ordered |
1001 | 2322 | 10119 | 5 |
1005 | 4377 | 20111 | 1 |
Historic Context of Facts Is Preserved
By creating multiple versions of the dimension, a type 2 response avoids restating the context of previously existing facts. Old facts can remain associated with the old row; new facts can be associated with the new row. This has the desired effect of preserving past history, while allowing new activity to be associated with the new value.
History of Dimension Is Partially Maintained
A type 2 change results in multiple dimension rows for a given natural key. While this serves to preserve the historic context of facts, it can trigger new forms of confusion. Users may be confused by the presence of duplicate values in the dimension tables. Designers may be lulled by a false sense that they are preserving dimensional history.
Type 2 changes can confuse end users because they cause duplicate values to appear in dimension tables. For example, after Christina's change of address, there are two rows in the dimension table for her customer key. If someone were to query the dimension table to get the name associated with Christina, both rows would be returned. This side effect can be avoided by issuing browse queries that select distinct values. A flag may also be added to indicate the current row for a given natural key value.
Although the type 2 change preserves the historic context of facts, it does not preserve history in the dimension. It is easy to see that a given natural key has taken on multiple representations in the dimension, but we do not know when each of these representations was correct. This information is only provided by way of a fact.
For example, after the change to Christina’s address has occurred, the dimension table in Figure 3-8 shows that there have been two versions of Christina, but it cannot tell us what Christina looked like on any given date. Where was she living on January 1, 2008? The dimension table does not carry this information. If there is an order for January 1, 2008, we are in luck, because the orders fact table will refer to the version of Sue that was correct at the time of the order. If there is not an order on that date, we are unable to determine what Christina looked like at that point in time.
It may be clear to you that this problem is easily rectified by adding a date stamp to each version of Christina. This technique allows the dimension to preserve both the history of facts and the history of dimensions. Another possibility is to build an additional fact table that associates versions of Sue with various dates.
Advantages:
- This allows us to accurately keep all historical information.
Disadvantages:
- This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
- This necessarily complicates the ETL process.
- It's not easy to confirm which version is correct in dimension table without timestamp information.
Usage:
About 50% of the time.
When to use Type 2:
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.
In addition to the type 1 and type 2 techniques introduced in this chapter, additional responses to source data changes are possible. Options include the type 3 response, hybrid responses, and time-stamped variations. Though less common, these techniques meet additional analytic challenges that I plan to elobrate it by other post.
参考至:《Star Schema The Complete Reference》
http://www.1keydata.com/datawarehousing/slowly-changing-dimensions.html
http://www.1keydata.com/datawarehousing/slowly-changing-dimensions-type-1.html
http://www.1keydata.com/datawarehousing/slowly-changing-dimensions-type-2.html
本文原创,转载请注明出处,作者
如有错误,欢迎指正
邮箱:czmcj@163.com
相关推荐
远程debug流程,方便debug
基于麻雀生物特性的搜索算法(SSA)的Matlab实现:原理、代码与实战应用,基于圈养麻雀特性的搜索算法(SSA)matlab实现:原理、代码与警觉机制解析,麻雀搜索算法(SSA)的matlab实现 原创代码,注释清晰,可直接运行 研究表明,圈养的麻雀存在两种不同类型:发现者和加入者。 发现者在种群中负责寻找食物并为整个麻雀种群提供觅食区域和方向,而加入者则是利用发现者来获取食物。 在生活中我们仔细观察会发现,当群体中有麻雀发现周围有捕食者时,此时群体中一个或多个个体会发出啁啾声,一旦发出这样的声音整个种群就会立即躲避危险,进而飞到其它的安全区域进行觅食。 这样的麻雀被称为警觉者。 麻雀搜索算法就是利用麻雀的这种生物特性进行迭代寻优的优化算法。 本资源包含以下三部分内容: 1.麻雀搜索算法的基本原理(两篇参考文献),非常适合用来学习。 2.麻雀搜索算法的matlab代码,注释详细,结构清晰。 3.五个群智能优化算法常用的测试函数。 ,麻雀搜索算法(SSA); MATLAB实现; 原创代码; 注释清晰; 可直接运行; 生物特性迭代寻优; 发现者与加入者; 警觉者; 参考两篇文献。
基于java的五子棋游戏设计源码+论文
deepseek-r1使用指南
DeepSeek+DeepResearch——让科研像聊天一样简单 (1)DeepSeek如何做数据分析? (2)DeepSeek如何分析文件内容? (3)DeepSeek如何进行数据挖掘? (4)DeepSeek如何进行科学研究? (5)DeepSeek如何写综述? (6)DeepSeek如何进行数据可视化? (7)DeepSeek如何写作润色? (8)DeepSeek如何中英文互译? (9)DeepSeek如何做降重? (10)DeepSeek论文参考文献指令 (11)DeepSeek基础知识。
项目工程资源经过严格测试运行并且功能上ok,可实现复现复刻,拿到资料包后可实现复现出一样的项目,本人系统开发经验充足(全栈全领域),有任何使用问题欢迎随时与我联系,我会抽时间努力为您解惑,提供帮助 【资源内容】:包含源码+工程文件+说明等。答辩评审平均分达到96分,放心下载使用!可实现复现;设计报告也可借鉴此项目;该资源内项目代码都经过测试运行,功能ok 【项目价值】:可用在相关项目设计中,皆可应用在项目、毕业设计、课程设计、期末/期中/大作业、工程实训、大创等学科竞赛比赛、初期项目立项、学习/练手等方面,可借鉴此优质项目实现复刻,设计报告也可借鉴此项目,也可基于此项目来扩展开发出更多功能 【提供帮助】:有任何使用上的问题欢迎随时与我联系,抽时间努力解答解惑,提供帮助 【附带帮助】:若还需要相关开发工具、学习资料等,我会提供帮助,提供资料,鼓励学习进步 下载后请首先打开说明文件(如有);整理时不同项目所包含资源内容不同;项目工程可实现复现复刻,如果基础还行,也可在此程序基础上进行修改,以实现其它功能。供开源学习/技术交流/学习参考,勿用于商业用途。质量优质,放心下载使用
项目工程资源经过严格测试运行并且功能上ok,可实现复现复刻,拿到资料包后可实现复现出一样的项目,本人系统开发经验充足(全栈全领域),有任何使用问题欢迎随时与我联系,我会抽时间努力为您解惑,提供帮助 【资源内容】:包含源码+工程文件+说明等。答辩评审平均分达到96分,放心下载使用!可实现复现;设计报告也可借鉴此项目;该资源内项目代码都经过测试运行;功能ok 【项目价值】:可用在相关项目设计中,皆可应用在项目、毕业设计、课程设计、期末/期中/大作业、工程实训、大创等学科竞赛比赛、初期项目立项、学习/练手等方面,可借鉴此优质项目实现复刻,设计报告也可借鉴此项目,也可基于此项目来扩展开发出更多功能 【提供帮助】:有任何使用上的问题欢迎随时与我联系,抽时间努力解答解惑,提供帮助 【附带帮助】:若还需要相关开发工具、学习资料等,我会提供帮助,提供资料,鼓励学习进步 下载后请首先打开说明文件(如有);整理时不同项目所包含资源内容不同;项目工程可实现复现复刻,如果基础还行,也可在此程序基础上进行修改,以实现其它功能。供开源学习/技术交流/学习参考,勿用于商业用途。质量优质,放心下载使用
DSP28335通过SPI与AD7606八路信号采集与通信的实践:实时数值与波形展示在上位机界面上,DSP28335与AD7606 SPI通信:采集八路信号并通过SCI上送至上位机实现数据及波形显示,Dsp28335利用spi与ad7606通信,采集八路信号,通过sci发送到到上位机显示数值和波形 ,DSP28335; SPI; AD7606; 八路信号采集; SCI; 上位机显示; 数值和波形,DSP28335 SPI通讯 AD7606 八路信号采集 SCI发送上位机显示
项目工程资源经过严格测试运行并且功能上ok,可实现复现复刻,拿到资料包后可实现复现出一样的项目,本人系统开发经验充足(全栈全领域),有任何使用问题欢迎随时与我联系,我会抽时间努力为您解惑,提供帮助 【资源内容】:包含源码+工程文件+说明等。答辩评审平均分达到96分,放心下载使用!可实现复现;设计报告也可借鉴此项目;该资源内项目代码都经过测试运行,功能ok 【项目价值】:可用在相关项目设计中,皆可应用在项目、毕业设计、课程设计、期末/期中/大作业、工程实训、大创等学科竞赛比赛、初期项目立项、学习/练手等方面,可借鉴此优质项目实现复刻,设计报告也可借鉴此项目,也可基于此项目来扩展开发出更多功能 【提供帮助】:有任何使用上的问题欢迎随时与我联系,抽时间努力解答解惑,提供帮助 【附带帮助】:若还需要相关开发工具、学习资料等,我会提供帮助,提供资料,鼓励学习进步 下载后请首先打开说明文件(如有);整理时不同项目所包含资源内容不同;项目工程可实现复现复刻,如果基础还行,也可在此程序基础上进行修改,以实现其它功能。供开源学习/技术交流/学习参考,勿用于商业用途。质量优质,放心下载使用
1、文件内容:marisa-ruby-0.2.4-4.el7.rpm以及相关依赖 2、文件形式:tar.gz压缩包 3、安装指令: #Step1、解压 tar -zxvf /mnt/data/output/marisa-ruby-0.2.4-4.el7.tar.gz #Step2、进入解压后的目录,执行安装 sudo rpm -ivh *.rpm 4、更多资源/技术支持:公众号禅静编程坊
基于Tent混沌映射的麻雀搜索算法优化:提高全局搜索能力与初始解质量,基于Tent混沌映射的麻雀搜索算法优化:提高全局搜索能力与初始解质量,基于Tent混沌映射的麻雀搜索算法matlab代码: 针对麻雀搜索算法(SSA)在接近全局最优时,种群多样性减少,易陷入局部最优解等问题,提出了一种混沌麻雀搜索优化算法(CSSA)。 通过改进 Tent 混沌序列初始化种群,提高初始解的质量,增强算法的全局搜索能力; ,基于Tent混沌映射的麻雀搜索算法; CSSA(混沌麻雀搜索优化算法); Tent混沌序列初始化种群; 全局搜索能力。,基于Tent混沌映射的CSSA算法:提高麻雀搜索全局搜索能力
项目工程资源经过严格测试运行并且功能上ok,可实现复现复刻,拿到资料包后可实现复现出一样的项目,本人系统开发经验充足(全栈全领域),有任何使用问题欢迎随时与我联系,我会抽时间努力为您解惑,提供帮助 【资源内容】:包含源码+工程文件+说明等。答辩评审平均分达到96分,放心下载使用!可实现复现;设计报告也可借鉴此项目;该资源内项目代码都经过测试运行,功能ok 【项目价值】:可用在相关项目设计中,皆可应用在项目、毕业设计、课程设计、期末/期中/大作业、工程实训、大创等学科竞赛比赛、初期项目立项、学习/练手等方面,可借鉴此优质项目实现复刻,设计报告也可借鉴此项目,也可基于此项目来扩展开发出更多功能 【提供帮助】:有任何使用上的问题欢迎随时与我联系,抽时间努力解答解惑,提供帮助 【附带帮助】:若还需要相关开发工具、学习资料等,我会提供帮助,提供资料,鼓励学习进步 下载后请首先打开说明文件(如有);整理时不同项目所包含资源内容不同;项目工程可实现复现复刻,如果基础还行,也可在此程序基础上进行修改,以实现其它功能。供开源学习/技术交流/学习参考,勿用于商业用途。质量优质,放心下载使用
上接战略下接绩效的培训规划.pptx
基于S7-300 PLC和Wincc Flexible触摸屏的温室大棚智能控制解决方案:梯形图程序详解、接线与原理图图谱及组态设计,基于S7-300 PLC与Wincc Flexible触摸屏的温室大棚智能控制解决方案:梯形图程序、接线图与组态画面全解析,No.943 基于S7-300 PLC和Wincc Flexible触摸屏温室大棚控制 带解释的梯形图程序,接线图原理图图纸,io分配,组态画面 ,943; S7-300 PLC; Wincc Flexible触摸屏; 温室大棚控制; 梯形图程序; 接线图原理图; IO分配; 组态画面,S7-300 PLC与Wincc Flexible触摸屏联合打造:No.943温室大棚控制系统的设计与实现
基于ADMM算法的GAMS程序:发电商竞标策略模型及其应用解析,GAMS程序解析:基于ADMM算法的发电商竞标策略优化模型与代码实现,GAMS程序:ADMM算法-基于ADMM法的发电商竞标策略 本程序主要介绍ADMM算法在GAMS中的编写方式,模型基于发电商竞标策略进行编写,基本包含了文章中的模型,但并非完全复现,可作为参考程序自学使用,也可在程序的基础上进行修改使用。 需要的同学可根据以下图片研究是否为自己需要的程序进行。 也可提供ADMM部分程序。 程序包括两个,分别为解决战略投资问题的广义MILP制定的GAMS代码、基于提出的共识- admm算法解决战略投资问题的GAMS代码 ,GAMS程序; ADMM算法; 发电商竞标策略; 模型编写; 广义MILP; 共识-ADMM算法; 战略投资问题; 程序修改。,GAMS程序:ADMM算法在发电商竞标策略中的应用示例
项目工程资源经过严格测试运行并且功能上ok,可实现复现复刻,拿到资料包后可实现复现出一样的项目,本人系统开发经验充足(全栈全领域),有任何使用问题欢迎随时与我联系,我会抽时间努力为您解惑,提供帮助 【资源内容】:包含源码+工程文件+说明等。答辩评审平均分达到96分,放心下载使用!可实现复现;设计报告也可借鉴此项目;该资源内项目代码都经过测试运行,功能ok 【项目价值】:可用在相关项目设计中,皆可应用在项目、毕业设计、课程设计、期末/期中/大作业、工程实训、大创等学科竞赛比赛、初期项目立项、学习/练手等方面,可借鉴此优质项目实现复刻,设计报告也可借鉴此项目,也可基于此项目来扩展开发出更多功能 【提供帮助】:有任何使用上的问题欢迎随时与我联系,抽时间努力解答解惑,提供帮助 【附带帮助】:若还需要相关开发工具、学习资料等,我会提供帮助,提供资料,鼓励学习进步 下载后请首先打开说明文件(如有);整理时不同项目所包含资源内容不同;项目工程可实现复现复刻,如果基础还行,也可在此程序基础上进行修改,以实现其它功能。供开源学习/技术交流/学习参考,勿用于商业用途。质量优质,放心下载使用
重庆市农村信用合作社 农商行数字银行系统建设方案.ppt
三菱FX5U定位模块与昆仑通态触摸屏包装机配置程序集成:五轴控制及双轴插补技术,三菱FX5U定位模块与伺服系统控制:包装机配置清单及功能分配手册,三菱 FX5U定位模块5轴 2轴插补伺服 包括三菱FX5U伺服5轴程序2轴插补,昆仑通态触摸屏程序。 包装机程序,有详细配置清单 IO表 功能分配等清单 扩展FX5-16ET-ES-H定位,有定位设置说明 ,三菱FX5U;定位模块;5轴;2轴插补伺服;昆仑通态触摸屏程序;包装机程序;配置清单;IO表;功能分配;扩展FX5-16ET-ES-H定位设置。,三菱FX5U定位模块:5轴伺服控制与2轴插补程序包
项目工程资源经过严格测试运行并且功能上ok,可实现复现复刻,拿到资料包后可实现复现出一样的项目,本人系统开发经验充足(全栈全领域),有任何使用问题欢迎随时与我联系,我会抽时间努力为您解惑,提供帮助 【资源内容】:包含源码+工程文件+说明等。答辩评审平均分达到96分,放心下载使用!可实现复现;设计报告也可借鉴此项目;该资源内项目代码都经过测试运行;功能ok 【项目价值】:可用在相关项目设计中,皆可应用在项目、毕业设计、课程设计、期末/期中/大作业、工程实训、大创等学科竞赛比赛、初期项目立项、学习/练手等方面,可借鉴此优质项目实现复刻,设计报告也可借鉴此项目,也可基于此项目来扩展开发出更多功能 【提供帮助】:有任何使用上的问题欢迎随时与我联系,抽时间努力解答解惑,提供帮助 【附带帮助】:若还需要相关开发工具、学习资料等,我会提供帮助,提供资料,鼓励学习进步 下载后请首先打开说明文件(如有);整理时不同项目所包含资源内容不同;项目工程可实现复现复刻,如果基础还行,也可在此程序基础上进行修改,以实现其它功能。供开源学习/技术交流/学习参考,勿用于商业用途。质量优质,放心下载使用
恶性肿瘤骨转移临床诊疗专家共识总论要点解读.pptx