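Rule 1: Load detailed atomic data into dimensional structures.
Load all of the detailed data into the dimensional structures. Some of it may not look useful at the time, and users may only care about aggregated data for now, but no one can guarantee that their requirements will not grow or change.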
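Rule 2: Structure dimensional models around business processes.
Normally one fact table corresponds to one business process. Billing a customer, for example, may involve several dimensions such as time, geography, and product type, plus one fact table that holds the billed amounts. Beyond these single-process fact tables, a consolidated fact table can combine the metrics of several business processes at a common level of detail. The consolidated fact table complements the single-process fact tables; it does not replace them.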
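Rule 3: Ensure that every fact table has an associated date dimension table.
Make sure every fact table has a date dimension table associated with it. Sometimes a single fact table carries several date foreign keys. Characteristics of the date itself, such as the fiscal month or a corporate holiday indicator, are stored as attributes of the date dimension.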
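Rule 4: Ensure that all facts in a single fact table are at the same grain or level of detail.
Every fact in a single fact table must have the same grain, or level of detail.
For example, putting monthly aggregated data and daily data in the same fact table is wrong.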
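Rule 5: Resolve many-to-many relationships in fact tables.
The foreign keys of a fact table are inherently in a many-to-many relationship with one another. When a dimension takes several values for a single measurement event, do not record those values directly in the fact table, because doing so breaks the table's grain. Express the relationship through a bridge table used together with the fact table.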
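Rule 6: Resolve many-to-one relationships in dimension tables.
Many-to-one relationships between attributes are handled inside the dimension table. For example, a student table has a homeroom teacher attribute, and students relate to homeroom teachers many-to-one. When building the student dimension, keep the homeroom teacher as an attribute of the flattened student dimension instead of snowflaking it into a smaller subdimension.
Occasionally, though, a many-to-one relationship is resolved in the fact table, for instance when the dimension table has a very large number of rows and its roll-up attributes change frequently. Do this sparingly.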
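Rule 7: Store report labels and filter domain values in dimension tables.
Avoid bulky descriptive fields in the fact table, such as the label a field shows in a report. If the description is needed, record it as an attribute in a dimension table.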
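Rule 8: Make certain that dimension tables use a surrogate key.
Make sure every dimension table (except the date dimension) uses a surrogate primary key, that is, a key with no business meaning, such as an auto-incrementing sequence.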
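Rule 9: Create conformed dimensions to integrate data across the enterprise.
Start by building standard dimensions that integrate enterprise data. A manufacturer's product dimension, for example, is needed in almost every data mart; building such dimensions first reduces redundant design work later.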
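Rule 10: Continuously balance requirements and realities to deliver a DW/BI solution that's accepted by business users and that supports their decision-making.
Throughout dimensional modeling, keep balancing the requirements against what can actually be implemented, so that the design keeps up with users' changing needs.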
The original text follows:
Rule #1: Load detailed atomic data into dimensional structures.
Dimensional models should be populated with bedrock atomic details to support the unpredictable filtering and grouping required by business user queries. Users typically don't need to see a single record at a time, but you can't predict the somewhat arbitrary ways they'll want to screen and roll up the details. If only summarized data is available, then you've already made assumptions about data usage patterns that will cause users to run into a brick wall when they want to dig deeper into the details. Of course, atomic details can be complemented by summary dimensional models that provide performance advantages for common queries of aggregated data, but business users cannot live on summary data alone; they need the gory details to answer their ever-changing questions.
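A minimal sketch of why atomic detail matters, using made-up sales rows (the field names and values are assumptions for illustration, not part of the original article). Because every atomic row is kept, the same data can be rolled up by month or by product on demand; a table that stored only monthly totals could not answer the product question.

```python
from collections import defaultdict

# Hypothetical atomic sales facts: one row per product sold per store per day.
atomic_sales = [
    {"date": "2024-01-05", "store": "S1", "product": "P1", "amount": 10.0},
    {"date": "2024-01-05", "store": "S2", "product": "P2", "amount": 20.0},
    {"date": "2024-02-11", "store": "S1", "product": "P2", "amount": 5.0},
]


def roll_up(rows, key_fn):
    """Sum the amount fact over whatever grouping the caller chooses."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row["amount"]
    return dict(totals)


# Two different questions answered from the same atomic rows.
by_month = roll_up(atomic_sales, lambda r: r["date"][:7])
by_product = roll_up(atomic_sales, lambda r: r["product"])
```

A summary table holding only `by_month` would still serve the common monthly query quickly, which is the complementary role the rule gives to summary models.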
Rule #2: Structure dimensional models around business processes.
Business processes are the activities performed by your organization; they represent measurement events, like taking an order or billing a customer. Business processes typically capture or generate unique performance metrics associated with each event. These metrics translate into facts, with each business process represented by a single atomic fact table. In addition to single process fact tables, consolidated fact tables are sometimes created that combine metrics from multiple processes into one fact table at a common level of detail. Again, consolidated fact tables are a complement to the detailed single-process fact tables, not a substitute for them.
Rule #3: Ensure that every fact table has an associated date dimension table.
The measurement events described in Rule #2 always have a date stamp of some variety associated with them, whether it's a monthly balance snapshot or a monetary transfer captured to the hundredth of a second. Every fact table should have at least one foreign key to an associated date dimension table, whose grain is a single day, with calendar attributes and nonstandard characteristics about the measurement event date, such as the fiscal month and corporate holiday indicator. Sometimes multiple date foreign keys are represented in a fact table.
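As an illustration, a date dimension at a grain of one day could be generated along these lines. The column names, the July fiscal-year start, and the holiday set are all assumptions made for the sketch.

```python
from datetime import date, timedelta

HOLIDAYS = {date(2024, 1, 1)}   # hypothetical corporate holiday calendar
FISCAL_YEAR_START_MONTH = 7     # assumption: the fiscal year starts in July


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            # Chronologically assigned key in YYYYMMDD form.
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            "full_date": day.isoformat(),
            "iso_weekday": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "calendar_month": day.month,
            "fiscal_month": (day.month - FISCAL_YEAR_START_MONTH) % 12 + 1,
            "is_corporate_holiday": day in HOLIDAYS,
        })
        day += timedelta(days=1)
    return rows


date_dim = build_date_dimension(date(2023, 12, 31), date(2024, 1, 2))
```

A fact table would then store `date_key` as a foreign key, possibly more than once when an event has several relevant dates.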
Rule #4: Ensure that all facts in a single fact table are at the same grain or level of detail.
There are three fundamental grains to categorize all fact tables: transactional, periodic snapshot, or accumulating snapshot. Regardless of its grain type, every measurement within a fact table must be at the exact same level of detail. When you mix facts representing multiple levels of granularity in the same fact table, you are setting yourself up for business user confusion and making the BI applications vulnerable to overstated or otherwise erroneous results.
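The overstatement risk can be shown with a small, hypothetical example: two daily rows and a monthly total row stored side by side. The row layout is an assumption for illustration only.

```python
# Daily-grain facts for two days in January (made-up values).
daily_facts = [
    {"period": "2024-01-01", "amount": 100.0},
    {"period": "2024-01-02", "amount": 150.0},
]

# A monthly-grain row that repeats the same money at a coarser grain.
monthly_total_row = {"period": "2024-01", "amount": 250.0}

# Mixing both grains in one table, which Rule #4 warns against.
mixed_grain_table = daily_facts + [monthly_total_row]


def total_amount(rows):
    """Naive SUM over the fact, as a BI tool would issue it."""
    return sum(row["amount"] for row in rows)


correct_total = total_amount(daily_facts)
overstated_total = total_amount(mixed_grain_table)
```

The naive sum over the mixed table counts every amount twice, which is the kind of erroneous result the rule describes.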
Rule #5: Resolve many-to-many relationships in fact tables.
Since a fact table stores the results of a business process event, there's inherently a many-to-many (M:M) relationship between its foreign keys, such as multiple products being sold in multiple stores on multiple days. These foreign key fields should never be null. Sometimes dimensions can take on multiple values for a single measurement event, such as the multiple diagnoses associated with a health care encounter or multiple customers with a bank account. In these cases, it's unreasonable to resolve the many-valued dimensions directly in the fact table, as this would violate the natural grain of the measurement event. Thus, we use a many-to-many, dual-keyed bridge table in conjunction with the fact table.
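One possible shape for such a bridge table, sketched with Python's built-in `sqlite3` module. The table and column names (`diagnosis_group_bridge`, `encounter_fact`, and so on) and the sample rows are assumptions for illustration. The fact table stays at one row per encounter and points at a diagnosis group; the dual-keyed bridge lists the diagnoses in each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE diagnosis_dim (
    diagnosis_key  INTEGER PRIMARY KEY,
    diagnosis_name TEXT NOT NULL
);
-- Bridge table keyed by (group, diagnosis): one row per diagnosis in a group.
CREATE TABLE diagnosis_group_bridge (
    diagnosis_group_key INTEGER NOT NULL,
    diagnosis_key       INTEGER NOT NULL REFERENCES diagnosis_dim (diagnosis_key),
    PRIMARY KEY (diagnosis_group_key, diagnosis_key)
);
-- Fact table at its natural grain: one row per health care encounter.
CREATE TABLE encounter_fact (
    encounter_id        INTEGER PRIMARY KEY,
    diagnosis_group_key INTEGER NOT NULL,
    billed_amount       REAL NOT NULL
);
INSERT INTO diagnosis_dim VALUES (1, 'Diabetes'), (2, 'Hypertension');
INSERT INTO diagnosis_group_bridge VALUES (10, 1), (10, 2), (11, 2);
INSERT INTO encounter_fact VALUES (100, 10, 500.0), (101, 11, 200.0);
""")


def billed_for_diagnosis(name):
    """Total billed amount of encounters that include the named diagnosis."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(f.billed_amount), 0.0)
        FROM encounter_fact AS f
        JOIN diagnosis_group_bridge AS b
          ON b.diagnosis_group_key = f.diagnosis_group_key
        JOIN diagnosis_dim AS d
          ON d.diagnosis_key = b.diagnosis_key
        WHERE d.diagnosis_name = ?
        """,
        (name,),
    ).fetchone()
    return row[0]


encounter_count = conn.execute("SELECT COUNT(*) FROM encounter_fact").fetchone()[0]
```

Filtering on a single diagnosis is safe here. Summing across all diagnoses through the bridge without a filter would count encounter 100 twice, so that kind of query needs extra care.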
Rule #6: Resolve many-to-one relationships in dimension tables.
Hierarchical, fixed-depth many-to-one (M:1) relationships between attributes are typically denormalized or collapsed into a flattened dimension table. If you've spent most of your career designing entity-relationship models for transaction processing systems, you'll need to resist your instinctive tendency to normalize or snowflake a M:1 relationship into smaller subdimensions; dimension denormalization is the name of the game in dimensional modeling.
It is relatively common to have multiple M:1 relationships represented in a single dimension table. One-to-one relationships, like a unique product description associated with a product code, are also handled in a dimension table. Occasionally many-to-one relationships are resolved in the fact table, such as the case when the detailed dimension table has millions of rows and its roll-up attributes are frequently changing. However, using the fact table to resolve M:1 relationships should be done sparingly.
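To make the flattening concrete, here is a sketch that collapses a normalized product, brand, and category hierarchy into one dimension row per product. The source structures and attribute names are hypothetical.

```python
# Hypothetical normalized source data: product -> brand -> category (each M:1).
categories = {1: "Beverages"}
brands = {10: {"brand_name": "Acme Cola", "category_id": 1}}
products = [
    {"product_code": "P1", "description": "Cola 330ml", "brand_id": 10},
    {"product_code": "P2", "description": "Cola 1L", "brand_id": 10},
]


def flatten_product_dimension(products, brands, categories):
    """Denormalize the M:1 hierarchy into one flat row per product."""
    rows = []
    for surrogate_key, product in enumerate(products, start=1):
        brand = brands[product["brand_id"]]
        rows.append({
            "product_key": surrogate_key,
            "product_code": product["product_code"],
            "product_description": product["description"],  # 1:1 with the code
            "brand_name": brand["brand_name"],               # M:1 roll-up
            "category_name": categories[brand["category_id"]],  # M:1 roll-up
        })
    return rows


product_dim = flatten_product_dimension(products, brands, categories)
```

The brand and category names repeat on every product row. That redundancy is deliberate: queries can group by any level of the hierarchy without extra joins.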
Rule #7: Store report labels and filter domain values in dimension tables.
The codes and, more importantly, associated decodes and descriptors used for labeling and query filtering should be captured in dimension tables. Avoid storing cryptic code fields or bulky descriptive fields in the fact table itself; likewise, don't just store the code in the dimension table and assume that users don't need descriptive decodes or that they'll be handled in the BI application. If it's a row/column label or pull-down menu filter, then it should be handled as a dimension attribute.
Though we stated in Rule #5 that fact table foreign keys should never be null, it's also advisable to avoid nulls in the dimension tables' attribute fields by replacing the null value with "NA" (not applicable) or another default value, determined by the data steward, to reduce user confusion if possible.
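A small sketch of the null-replacement step, assuming dimension rows arrive as dictionaries and that the data steward has chosen "NA" as the default. The helper name `fill_missing_attributes` is hypothetical.

```python
DEFAULT_LABEL = "NA"  # default chosen by the data steward (assumed here)


def fill_missing_attributes(row, default=DEFAULT_LABEL):
    """Replace None or blank string attributes with the default label."""
    cleaned = {}
    for name, value in row.items():
        is_blank = value is None or (isinstance(value, str) and not value.strip())
        cleaned[name] = default if is_blank else value
    return cleaned


raw_row = {"product_key": 7, "color": None, "size": " ", "brand_name": "Acme"}
clean_row = fill_missing_attributes(raw_row)
```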
Rule #8: Make certain that dimension tables use a surrogate key.
Meaningless, sequentially assigned surrogate keys (except for the date dimension, where chronologically assigned and even more meaningful keys are acceptable) deliver a number of operational benefits, including smaller keys which mean smaller fact tables, smaller indexes, and improved performance. Surrogate keys are absolutely required if you're tracking dimension attribute changes with a new dimension record for each profile change. Even if your business users don't initially visualize the value of tracking attribute changes, using surrogates will make a downstream policy change less onerous. The surrogates also allow you to map multiple operational keys to a common profile, plus buffer you from unexpected operational activities, like the recycling of an obsolete product number or acquisition of another company with its own coding schemes.
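The attribute-tracking case can be sketched as follows. This is one simple way to issue a new surrogate key for each profile change, not a prescribed implementation; the class `CustomerDimension` and its fields are hypothetical.

```python
from itertools import count


class CustomerDimension:
    """In-memory dimension that adds a new row for every profile change."""

    def __init__(self):
        self._next_key = count(1)  # meaningless, sequentially assigned keys
        self.rows = []
        self._current = {}         # operational (natural) key -> current row

    def upsert(self, customer_id, attributes):
        """Return the surrogate key for this profile, adding a row if it changed."""
        current = self._current.get(customer_id)
        if current is not None and current["attributes"] == attributes:
            return current["customer_key"]
        if current is not None:
            current["is_current"] = False  # expire the old profile row
        row = {
            "customer_key": next(self._next_key),
            "customer_id": customer_id,
            "attributes": dict(attributes),
            "is_current": True,
        }
        self.rows.append(row)
        self._current[customer_id] = row
        return row["customer_key"]


dim = CustomerDimension()
k1 = dim.upsert("C-001", {"city": "Boston"})
k2 = dim.upsert("C-001", {"city": "Boston"})  # unchanged profile, same key
k3 = dim.upsert("C-001", {"city": "Denver"})  # changed profile, new key
```

Facts loaded before the move keep pointing at the first key and facts loaded after it point at the second, which the operational key alone could not express.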
Rule #9: Create conformed dimensions to integrate data across the enterprise.
Conformed dimensions (otherwise known as common, master, standard or reference dimensions) are essential for enterprise data warehousing. Managed once in the ETL system and then reused across multiple fact tables, conformed dimensions deliver consistent descriptive attributes across dimensional models and support the ability to drill across and integrate data from multiple business processes. The Enterprise Data Warehouse Bus Matrix is the key architecture blueprint for representing the organization's core business processes and associated dimensionality. Reusing conformed dimensions ultimately shortens the time-to-market by eliminating redundant design and development efforts; however, conformed dimensions require a commitment and investment in data stewardship and governance, even if you don't need everyone to agree on every dimension attribute to leverage conformity.
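Drilling across can be sketched as two separate aggregations over fact tables that share one conformed dimension, merged on the common attribute. The fact tables, measures, and the `category` attribute below are made up for illustration.

```python
from collections import defaultdict

# One conformed product dimension, reused by both fact tables (hypothetical data).
product_dim = {1: {"category": "Beverages"}, 2: {"category": "Snacks"}}

sales_fact = [
    {"product_key": 1, "sales_amount": 100.0},
    {"product_key": 2, "sales_amount": 40.0},
    {"product_key": 1, "sales_amount": 60.0},
]
inventory_fact = [
    {"product_key": 1, "units_on_hand": 30},
    {"product_key": 2, "units_on_hand": 12},
]


def summarize(fact_rows, measure):
    """Aggregate one fact table by the conformed category attribute."""
    totals = defaultdict(float)
    for row in fact_rows:
        category = product_dim[row["product_key"]]["category"]
        totals[category] += row[measure]
    return totals


def drill_across():
    """Query each process separately, then merge on the shared attribute."""
    sales = summarize(sales_fact, "sales_amount")
    stock = summarize(inventory_fact, "units_on_hand")
    return {
        category: {
            "sales_amount": sales.get(category, 0.0),
            "units_on_hand": stock.get(category, 0.0),
        }
        for category in sorted(set(sales) | set(stock))
    }


report = drill_across()
```

The merge only works because both fact tables use the same product keys and the same category values, which is what conformity provides.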
Rule #10: Continuously balance requirements and realities to deliver a DW/BI solution that's accepted by business users and that supports their decision-making.
Dimensional modelers must constantly straddle business user requirements along with the underlying realities of the associated source data to deliver a design that can be implemented and that, more importantly, stands a reasonable chance of business adoption. The requirements-versus-realities balancing act is a fact of life for DW/BI practitioners, whether you're focused on the dimensional model, project strategy, technical/ETL/BI architectures or deployment/maintenance plan.
If you've read our Intelligent Enterprise articles, Toolkit books or monthly Design Tips regularly, these rules shouldn't be news to you, but here we've consolidated our rules into a single rulebook that you can refer to when you are gathered to design or review your models.
Good luck!