规则1:Load detail atomic data into dimensional structures.
必须把所有的细节数据都装载到维结构中来,也许有些数据在当时看来并没有太大用处,也许用户可能只关心聚集了之后的数据,但是谁也不能保证用户的需求不会增长或者变化。
规则2:Structure dimensianal model around business processes
通常情况下,一个事实表对于一个业务流程,比如说,给用户开账单,可能涉及到时间,地理,产品类型多个维度,涉及到一个事实表:金额表。另外,除了简单事实表以外,可能在细节数据水平上,有多个业务流程使用统一的一个事实表,统一的事实表是简单事实表的一个补充,而不是它的替代品。
规则3:Ensure that every fact tables has an associated date dimensional table
确保每一个事实表都有一个与之关联的时间维表,有时候还可能有多个时间维对应到一个事实表,比如说重要事件的时间表,重大的节日等等。
规则4:Ensure that all facts in a single fact table are at the same grain or level of detail
确保在一个单一事实表中的所有事实都具有相同的粒度或细节水平
比如说,你把按照月份统一以后的数据和按天统计的数据放在一个事实表中,这是不对的。
规则5:reslove the many-to-many relationship in fact tables;
避免多对多关系出现在事实表中,这样损坏事实表本身的粒度。由于事实表和维表之间本身就是多对多关系,如果在事实表中有多对多关系要记录,请把这些关系用事实表和维表的维度关系来表示。
规则6:reslove the many-to-one relationship in demension tables;
避免多对一的关系出现在维表中。比如说:在学生表中,有班主任属性,学生和班主任属性是多对一的关系,当在建立学生维时,这样的关系就应该分解,产生雪花模型,将班主任属性提取到比较小的子维去。
但是有时候,多对一关系可以出现在事实表中,比如说维表里的数据很多,而且维表的ROLL-UP属性经常发生变化的时候,可以考虑把这种多对一关系到事实表中处理。
规则7:store report labels and filter domain values in dimention tables;
尽量避免大块的描述信息出现在事实表中,比如说某个字段在report中显示的label,如果是必不可少的描述信息,请在维表中添加某个属性来记录。
规则8:Make certain that dimension tables use a surrogate key.
确保每一个维表(除了时间维)是使用一个代理的主键,也就是没有业务意义的主键,比如sequence自增。
规则9:create conformed dimensions to integrate data across the enterprise
首先建立一些标准的维来集成企业数据,比如说:制造企业的产品维,这个维几乎在各个数据集市中都需要用到,先把这些维建立起来,可以减少以后的冗余设计。
规则10:Continuousely balance requirements and realities to deliver a dw/bi solusion that'accept by business users and that supports their decision-support
在维度建模的过程中,不断的做到需求和实现之间的平衡,使得你的设计能能够不断的适应用户不断变化的需求。
原文如下:
Rule #1: Load detailed atomic data into dimensional structures.
Dimensional models should be populated with bedrock atomic details to support the unpredictable filtering and grouping required by business user queries. Users typically don't need to see a single record at a time, but you can't predict the somewhat arbitrary ways they'll want to screen and roll up the details. If only summarized data is available, then you've already made assumptions about data usage patterns that will cause users to run into a brick wall when they want to dig deeper into the details. Of course, atomic details can be complemented by summary dimensional models that provide performance advantages for common queries of aggregated data, but business users cannot live on summary data alone; they need the gory details to answer their ever-changing questions.
Rule #2: Structure dimensional models around business processes.
Business processes are the activities performed by your organization; they represent measurement events, like taking an order or billing a customer. Business processes typically capture or generate unique performance metrics associated with each event. These metrics translate into facts, with each business process represented by a single atomic fact table. In addition to single process fact tables, consolidated fact tables are sometimes created that combine metrics from multiple processes into one fact table at a common level of detail. Again, consolidated fact tables are a complement to the detailed single-process fact tables, not a substitute for them.
Rule #3: Ensure that every fact table has an associated date dimension table.
The measurement events described in Rule #2 always have a date stamp of some variety associated with them, whether it's a monthly balance snapshot or a monetary transfer captured to the hundredth of a second. Every fact table should have at least one foreign key to an associated date dimension table, whose grain is a single day, with calendar attributes and nonstandard characteristics about the measurement event date, such as the fiscal month and corporate holiday indicator. Sometimes multiple date foreign keys are represented in a fact table.
<!--new pagination - -->
Rule #4: Ensure that all facts in a single fact table are at the same grain or level of detail.
There are three fundamental grains to categorize all fact tables: transactional, periodic snapshot, or accumulating snapshot. Regardless of its grain type, every measurement within a fact table must be at the exact same level of detail. When you mix facts representing multiple levels of granularity in the same fact table, you are setting yourself up for business user confusion and making the BI applications vulnerable to overstated or otherwise erroneous results.
Rule #5: Resove many-to-many relationships in fact tables.
Since a fact table stores the results of a business process event, there's inherently a many-to-many (M:M) relationship between its foreign keys, such as multiple products being sold in multiple stores on multiple days. These foreign key fields should never be null. Sometimes dimensions can take on multiple values for a single measurement event, such as the multiple diagnoses associated with a health care encounter or multiple customers with a bank account. In these cases, it's unreasonable to resolve the many-valued dimensions directly in the fact table, as this would violate the natural grain of the measurement event. Thus, we use a many-to-many, dual-keyed bridge table in conjunction with the fact table.
Rule #6: Resolve many-to-one relationships in dimension tables.
Hierarchical, fixed-depth many-to-one (M:1) relationships between attributes are typically denormalized or collapsed into a flattened dimension table. If you've spent most of your career designing entity-relationship models for transaction processing systems, you'll need to resist your instinctive tendency to normalize or snowflake a M:1 relationship into smaller subdimensions; dimension denormalization is the name of the game in dimensional modeling.
It is relatively common to have multiple M:1 relationships represented in a single dimension table. One-to-one relationships, like a unique product description associated with a product code, are also handled in a dimension table. Occasionally many-to-one relationships are resolved in the fact table, such as the case when the detailed dimension table has millions of rows and its roll-up attributes are frequently changing. However, using the fact table to resolve M:1 relationships should be done sparingly.
Rule #7: Store report labels and filter domain values in dimension tables.
The codes and, more importantly, associated decodes and descriptors used for labeling and query filtering should be captured in dimension tables. Avoid storing cryptic code fields or bulky descriptive fields in the fact table itself; likewise, don't just store the code in the dimension table and assume that users don't need descriptive decodes or that they'll be handled in the BI application. If it's a row/column label or pull-down menu filter, then it should be handled as a dimension attribute.
Though we stated in Rule #5 that fact table foreign keys should never be null, it's also advisable to avoid nulls in the dimension tables' attribute fields by replacing the null value with "NA" (not applicable) or another default value, determined by the data steward, to reduce user confusion if possible.
<!--new pagination - -->
Rule #8: Make certain that dimension tables use a surrogate key.
Meaningless, sequentially assigned surrogate keys (except for the date dimension, where chronologically assigned and even more meaningful keys are acceptable) deliver a number of operational benefits, including smaller keys which mean smaller fact tables, smaller indexes, and improved performance. Surrogate keys are absolutely required if you're tracking dimension attribute changes with a new dimension record for each profile change. Even if your business users don't initially visualize the value of tracking attribute changes, using surrogates will make a downstream policy change less onerous. The surrogates also allow you to map multiple operational keys to a common profile, plus buffer you from unexpected operational activities, like the recycling of an obsolete product number or acquisition of another company with its own coding schemes.
Rule #9: Create conformed dimensions to integrate data across the enterprise.
Conformed dimensions (otherwise known as common, master, standard or reference dimensions) are essential for enterprise data warehousing. Managed once in the ETL system and then reused across multiple fact tables, conformed dimensions deliver consistent descriptive attributes across dimensional models and support the ability to drill across and integrate data from multiple business processes. The Enterprise Data Warehouse Bus Matrix is the key architecture blueprint for representing the organization's core business processes and associated dimensionality. Reusing conformed dimensions ultimately shortens the time-to-market by eliminating redundant design and development efforts; however, conformed dimensions require a commitment and investment in data stewardship and governance, even if you don't need everyone to agree on every dimension attribute to leverage conformity.
Rule #10: Continuously balance requirements and realities to deliver a DW/BI solution that's accepted by business users and that supports their decision-making.
Dimensional modelers must constantly straddle business user requirements along with the underlying realities of the associated source data to deliver a design that can be implemented and that, more importantly, stands a reasonable chance of business adoption. The requirements-versus-realities balancing act is a fact of life for DW/BI practitioners, whether you're focused on the dimensional model, project strategy, technical/ETL/BI architectures or deployment/maintenance plan.
If you've read our Intelligent Enterprise articles, Toolkit books or monthly Design Tips regularly, these rules shouldn't be news to you, but here we've consolidated our rules into a single rulebook that you can refer to when you are gathered to design or review your models.
Good luck!
<!--new pagination - -->
分享到:
相关推荐
- **数据仓库建模的十条戒律**:Inmon 提出了十条基本原则,指导数据仓库的建模过程,确保数据质量、一致性及易于理解。 #### 五、数据仓库开发过程 1. **数据模型的内容**:定义数据仓库中实体及其关系的细节。 2...
- **维度建模**(Dimensional Modeling): 使用事实表和维度表来组织数据,便于查询和分析。 - **规范化的建模**(Normalized Modeling): 适用于需要高度一致性和完整性的情况。 2. **数据仓库建模的十条戒律**:...
vue3 访问通义千问聊天代码例子
基于Python的Flask-vue基于Hadoop的智慧校园数据共享平台实现源码-演示视频 项目关键技术 开发工具:Pycharm 编程语言: python 数据库: MySQL5.7+ 后端技术:Flask 前端技术:HTML 关键技术:HTML、MYSQL、Python 数据库工具:Navicat、SQLyog
【实验1】:读取一次AI0通道数值 【实验2】:一次读取AI0通道多个数值 【实验3】:单次模拟量输出 【实验4】:连续模拟量输出(输出一个正弦曲线)
无人船的Smith-PID跟踪控制方法研究及实现:融合传统与最优PID策略的LOS曲线跟踪资料,基于无人船Smith-PID改进跟踪控制技术及其LOS曲线跟踪方法研究资料,基于无人船的smith-pid跟踪控制资料。 首先,针对pid进行了改进,有传统pid,最优pid和基于smith的pid三种控制方式。 然后还在smithpid基础上设计了LOS的曲线跟踪方法。 (有对应参考文献)。 有意者可直接联系,参考学习资料。 python语言。 ,基于无人船的Smith-PID跟踪控制; PID改进(传统PID、最优PID、基于Smith的PID); Smith-PID曲线跟踪方法; 参考学习资料; Python语言。,基于无人船的Smith-PID优化跟踪控制资料
自研船舶电力推进系统MATLAB仿真报告:从柴油机+同步发电机到异步电机直接转矩控制的全面模拟与实践,《船舶电力推进系统自搭MATLAB仿真报告:从柴油机同步发电机到异步电机直接转矩控制的完整过程与参数配置详解》,自己搭建的船舶电力推进系统(船舶电力推进自动控制)完全自搭MATLAB仿真,可适度,含对应27页正文的中文报告,稀缺资源,仿真包括船舶电站,变流系统和异步电机直接转矩控制,放心用吧。 三个文件逐层递进 柴油机+同步发电机(船舶电站) 柴油机+同步发电机+不控整流全桥逆变 柴油机+同步发电机+变流模块+异步电机直接转矩控制 所有参数都是配好的,最大负载参考变流系统所带负载两倍,再大柴油机和同步发电机参数就不匹配了,有能力可以自己调 ,核心关键词:船舶电力推进系统; MATLAB仿真; 船舶电站; 变流系统; 异步电机直接转矩控制; 柴油机; 同步发电机; 不控整流全桥逆变; 参数配比。,《船舶电力推进系统MATLAB仿真报告》
西门子博图WinCC V15自动化系统项目实战:多服务器客户端下的PID DCS闭环控制及参数调整实战指南,西门子博图WinCC V15自动化系统项目实战:多服务器客户端下的PID DCS闭环控制及参数调整实战指南,西门子博图WinCC V 15大型自动化系统项目,包含多台服务器客户端项目,系统采用安全1516F -3PN DP 外挂多台精智面板,1200PLC ET200SP 变频器 对整个工艺过程PID DCS 闭环过程控制,如何调整温度压力流量液位等参数,实用工程项目案例 ,西门子博图WinCC V 15; 大型自动化系统; 多台服务器客户端; 安全外挂; 精智面板; 1200PLC ET200SP; 变频器; PID DCS; 闭环过程控制; 温度压力流量液位调整; 工程项目案例,西门子博图WinCC V15大型项目:多服务器客户端的PID DCS闭环控制与实用参数调整
内容概要:本文详尽介绍了计算机网络相关资源及其各方面构成要素,首先阐述了硬件层面的各种传输媒介和设备如双绞线、同轴电缆、光纤以及台式电脑、笔记本、大型计算机等设备,还包括网络互联所需的各类组件如网卡、交换机、路由器等。其次探讨了多种操作系统的特性和主要功能,以及各类通讯和支持应用程序的概述,涵盖浏览器、图像和视频编辑等常用软件。再深入讨论了多种常见网络协议如TCP、UDP、HTTP等的功能特性。最后还提到了确保网络安全运行的重要措施和工具如MIB、SNMP以及防火墙、入侵检测系统等。并且简要提到计算机网络在不同的应用环境,从局域网到移动网络。 适合人群:所有对计算机网络技术感兴趣的初学者和希望深入了解各个组成成分的技术人员. 使用场景及目标:为用户提供计算机网络资源全面而系统的认识,帮助他们建立对于该领域的理论和技术的扎实认知基础,提高在实际环境中识别配置及维护计算机网络系统的能力.
海神之光上传的视频是由对应的完整代码运行得来的,完整代码皆可运行,亲测可用,适合小白; 1、从视频里可见完整代码的内容 主函数:main.m; 调用函数:其他m文件;无需运行 运行结果效果图; 2、代码运行版本 Matlab 2019b;若运行有误,根据提示修改;若不会,私信博主; 3、运行操作步骤 步骤一:将所有文件放到Matlab的当前文件夹中; 步骤二:双击打开main.m文件; 步骤三:点击运行,等程序运行完得到结果; 4、仿真咨询 如需其他服务,可私信博主; 4.1 博客或资源的完整代码提供 4.2 期刊或参考文献复现 4.3 Matlab程序定制 4.4 科研合作
ABAQUS中隧道结构模型的无限元应用:超声激励源的施加方法、3D无限元吸收边界的添加技巧、模型结果精确性校核流程及教学视频与CAE、INP文件解析,ABAQUS隧道模型中3D无限元吸收边界的应用:超声激励源的施加与模型结果精确性校核的实践教程,ABAQUS无限元吸收边界,abaqus隧道无限元,1.超声激励源施加;2.3D无限元吸收边界添加方法;3.模型结果精确性校核;4.提供教学视频,cae、inp文件。 ,ABAQUS无限元吸收边界;ABAQUS隧道无限元;超声激励源施加;3D无限元吸收边界添加;模型结果精确性校核;CAE和INP文件。,ABAQUS中超声激励下无限元吸收边界设置及模型精度验证教程
海神之光上传的视频是由对应的完整代码运行得来的,完整代码皆可运行,亲测可用,适合小白; 1、从视频里可见完整代码的内容 主函数:main.m; 调用函数:其他m文件;无需运行 运行结果效果图; 2、代码运行版本 Matlab 2019b;若运行有误,根据提示修改;若不会,私信博主; 3、运行操作步骤 步骤一:将所有文件放到Matlab的当前文件夹中; 步骤二:双击打开main.m文件; 步骤三:点击运行,等程序运行完得到结果; 4、仿真咨询 如需其他服务,可私信博主; 4.1 博客或资源的完整代码提供 4.2 期刊或参考文献复现 4.3 Matlab程序定制 4.4 科研合作
git自用lllllllllllllllllll
本资源与文章【Django小白项目】为一体,此为已成功项目,供给给Django初学者做参考,有不会的问题可以私信我噢~
使用一维数据表示向量和二维矩阵,支持常用运算。
1、以上文章可用于参考,请勿直接抄袭,学习、当作参考文献可以,主张借鉴学习 2、资源本身不含 对应项目代码,如需完整项目源码,请私信博主获取
基于多目标粒子群优化算法(MOPSO)的微电网多目标经济运行分析与优化策略考虑响应侧响应的协同调度策略,基于多目标粒子群优化算法(MOPSO)的微电网经济调度优化:含风光储荷一体化模型与需求侧响应策略,考虑需求侧响应的微电网多目标经济运行 建立了含风光储荷的微电网模型,以发电侧成本(包括风光储以及电网的购电成本)和负荷侧成本最小为目标,考虑功率平衡以及储能SOC约束,建立了多目标优化模型,通过分时电价引导负荷需求侧响应,得到可削减负荷量,同时求解模型,得到风光储以及电网的运行计划。 这段代码是一个使用多目标粒子群优化算法(MOPSO)解决问题的程序。下面我将对程序进行详细的分析和解释。 首先,程序的目标是通过优化算法来解决一个多目标优化问题。程序中使用的优化算法是多目标粒子群优化算法(MOPSO),该算法通过迭代更新粒子的位置和速度来搜索最优解。 程序的主要功能是对能源系统进行优化调度,包括光伏发电、风力发电、储能和电网供电。程序的目标是最小化能源系统的成本,并满足负荷需求。 程序的主要思路是使用粒子群优化算法来搜索最优解。程序中定义了一个粒子类(Particle),每个粒子代
data.gov.sg geojson部分项目整理
基于MATLAB Simulink的避障功能欠驱动无人船航迹跟踪控制仿真实验研究,基于MATLAB Simulink的欠驱动无人船避障功能路径跟踪控制仿真实验研究,包含避障功能的欠驱动无人船航迹(路径)跟踪控制仿真实验,基于MATLAB Simulink制作 ,避障功能; 欠驱动无人船; 航迹(路径)跟踪控制; MATLAB Simulink 仿真实验; 避障算法。,基于MATLAB Simulink的避障无人船航迹跟踪控制仿真实验