Keys and history
In a star schema, each dimension table is given a surrogate key. This column is a unique identifier, created exclusively for the data warehouse. Surrogate keys are assigned and maintained as part of the process that loads the star schema. The surrogate key has no intrinsic meaning; it is typically an integer. Surrogate keys are sometimes referred to as warehouse keys. The surrogate key is the primary key of the dimension table.
Figure 1-5 A simple star schema for the orders process
In this article, surrogate keys will be easily identifiable by the suffix “_key” in the column name. In Figure 1-5, for example, the surrogate key for the customer dimension is called customer_key, the surrogate key for the salesperson dimension is called salesperson_key, and so forth. Illustrations in this book will always list the surrogate key for a dimension table as its first attribute.
Dimension tables also contain key columns that uniquely identify something in an operational system. Examples in Figure 1-5 include customer_id, sku, and salesperson_id. In the operational systems, these columns identify specific customers, products, and salespeople, respectively. These key columns are referred to as natural keys.
The separation of surrogate keys and natural keys allows the data warehouse to track changes, even if the originating operational system does not. For example, suppose that customer ABC Wholesalers is identified by customer_id 10711 in an operational system.
If the customer changes its headquarters location, the operational system may simply overwrite the address for customer_id 10711. For analytic purposes, however, it may be useful to track the history of ABC Wholesalers. Since the star schema does not rely on customer_id to identify a unique row in the customer dimension, it is possible to store multiple versions of ABC Wholesalers, even though both have the same customer_id—10711. The two versions can be distinguished by different surrogate key values. While it would also be possible to support change tracking by supplementing a natural key with a sequence number, the surrogate key allows fact and dimension tables to be joined based on a single column.
Surrogate Keys and Natural Keys
In the star schema in Figure, surrogate keys for each dimension table are labeled “SK.” Separate and distinct from surrogate keys, one or more natural keys will also be present in most dimension tables.
The natural keys are identifiers carried over from source systems. They may not uniquely identify a row in the data warehouse, but they do identify a corresponding entity in the source system. In above Figure, natural key columns are designated “NK.”
Unlike surrogate key values, the values in natural key columns may have meaning to users of the data warehouse. Even when they do not carry significant meaning, their presence is necessary for the ETL routines that load fact tables.
Sometimes, the natural key for a dimension table consists of more than one column.This occurs when the source system uses a multi-part key to identify the entity. For example,a purchasing contract may be identified by a type code and sequence number. When more than one system can be the source for a dimension, the natural key may be composed of the identifier from the source system and an additional identifier that indicates which source it came from. For example, a bank may have more than one system for deposit accounts after acquiring another bank. The natural key for a customer dimension might,therefore, consist of an identifier used in a source system, in conjunction with a column that indicates the system from which the identifier came.
The use of surrogate keys as unique identifiers allows the data warehouse to respond to changes in source data in whatever manner best fits analytic requirements. Because the dimension table does not rely on the natural key to identify unique rows, it can maintain history even if the source system does not. For example, an order entry system might contain a record for customer_id 404777, which includes the customer’s address. If the system overwrites the address when a customer moves, it is not tracking history. Were the
customer dimension table to use the same customer_id to identify unique rows, it would be able to store only one row for customer_id 404777. It would be unable to maintain the history of the address. By using a surrogate key, it becomes possible to maintain two versions of customer_id 404777. This technique is known as a type 2 slow change.
A possible alternative to the use of a surrogate key is to supplement the natural key with a sequence number. For example, the primary key of the customer dimension table might consist of the customer_id and a version_number column that contains a sequence number. Like the use of surrogate keys, this technique permits the data warehouse to track history independently of the source system, allowing the table to store multiple versions of a customer. This approach provides no value in simplifying the schema design or load
process, however, which must still identify and maintain version history. More importantly,this technique requires multi-part foreign keys to be maintained in the fact table. If customers are identified by customer_id and sequence_no, this pair of columns must be present in order_facts as well. This multi-column foreign key complicates the join process, makes SQL more difficult to read, and in some cases may befuddle efforts to optimize the RDBMS for star join query execution.
Another theoretical alternative to the use of a surrogate key is to supplement a natural key with time stamps. While time stamps may be useful, a multi-part foreign key would still be required in fact tables, potentially leading to the same difficulties as the sequence number. In addition, a designer may be tempted to eliminate multi-column joins by storing only the natural key in the fact table; however, this severely complicates queries and risks error. For example, assume that customer_id 404777 has moved, and therefore has two
rows in the dimension table. Each fact table row contains only the natural key 404777. To identify which version of the customer corresponds to each fact table row, it is necessary to compare order date with the time stamps in the dimension table. This process can be onerous, particularly if one is constructing a report that aggregates a large number of orders. It also becomes difficult for database administrators to tune the system, preventing them, for example, from declaring a foreign key in the fact table and potentially leading to
poor query optimization. Worse, if the date qualification is omitted, facts associated with customer_id 404777 will be double-counted.
It is not common practice to use version numbers or time stamps as part of a unique identifier. Surrogate keys simplify the schema design and allow for clean, single-column joins. Time stamps are frequently included in dimension tables, but not as part of the unique identifier.
参考至:《Star Schema The Complete Reference》
如有错误,欢迎指正
邮箱:czmcj@163.com
相关推荐
IncompatibleClassChangeError(解决方案).md
智慧工地,作为现代建筑施工管理的创新模式,以“智慧工地云平台”为核心,整合施工现场的“人机料法环”关键要素,实现了业务系统的协同共享,为施工企业提供了标准化、精益化的工程管理方案,同时也为政府监管提供了数据分析及决策支持。这一解决方案依托云网一体化产品及物联网资源,通过集成公司业务优势,面向政府监管部门和建筑施工企业,自主研发并整合加载了多种工地行业应用。这些应用不仅全面连接了施工现场的人员、机械、车辆和物料,实现了数据的智能采集、定位、监测、控制、分析及管理,还打造了物联网终端、网络层、平台层、应用层等全方位的安全能力,确保了整个系统的可靠、可用、可控和保密。 在整体解决方案中,智慧工地提供了政府监管级、建筑企业级和施工现场级三类解决方案。政府监管级解决方案以一体化监管平台为核心,通过GIS地图展示辖区内工程项目、人员、设备信息,实现了施工现场安全状况和参建各方行为的实时监控和事前预防。建筑企业级解决方案则通过综合管理平台,提供项目管理、进度管控、劳务实名制等一站式服务,帮助企业实现工程管理的标准化和精益化。施工现场级解决方案则以可视化平台为基础,集成多个业务应用子系统,借助物联网应用终端,实现了施工信息化、管理智能化、监测自动化和决策可视化。这些解决方案的应用,不仅提高了施工效率和工程质量,还降低了安全风险,为建筑行业的可持续发展提供了有力支持。 值得一提的是,智慧工地的应用系统还围绕着工地“人、机、材、环”四个重要因素,提供了各类信息化应用系统。这些系统通过配置同步用户的组织结构、智能权限,结合各类子系统应用,实现了信息的有效触达、问题的及时跟进和工地的有序管理。此外,智慧工地还结合了虚拟现实(VR)和建筑信息模型(BIM)等先进技术,为施工人员提供了更为直观、生动的培训和管理工具。这些创新技术的应用,不仅提升了施工人员的技能水平和安全意识,还为建筑行业的数字化转型和智能化升级注入了新的活力。总的来说,智慧工地解决方案以其创新性、实用性和高效性,正在逐步改变建筑施工行业的传统管理模式,引领着建筑行业向更加智能化、高效化和可持续化的方向发展。
123
asdjhfjsnlkdmv
该代码实现了基于机器学习的车辆价格预测模型,利用不同回归算法(如线性回归、随机森林回归和 KNN 回归)对车辆的当前价格(current price)进行预测。代码首先进行数据加载与预处理,包括删除无关特征、归一化处理等;然后使用不同的机器学习模型进行训练,并评估它们的表现(通过 R²、MAE、MSE 等指标);最后通过可视化工具对模型预测效果进行分析。目的是为车辆价格预测任务找到最合适的回归模型。 适用人群: 数据科学家和机器学习工程师:对于需要进行回归建模和模型选择的从业者,尤其是对车辆数据或类似领域有兴趣的。 企业数据分析师:在汽车行业或二手车市场中,需要对车辆价格进行预测和分析的专业人员。 机器学习学习者:希望学习如何使用 Python 实现机器学习模型、数据预处理和评估的初学者或中级学习者。 使用场景及目标: 汽车定价与估值:用于为汽车或二手车定价,尤其是当需要预测车辆的当前市场价格时。 汽车行业市场分析:通过数据分析和回归预测,帮助汽车销售商、经销商或市场分析师预测未来的市场价格趋势。 二手车市场:为二手车买卖双方提供价格参考,帮助制定合理的交易价格。
基于模型预测控制(mpc)的车辆道,车辆轨迹跟踪,道轨迹为五次多项式,matlab与carsim联防控制
StoreError解决办法.md
白色精致风格的个人简历模板下载.zip
白色宽屏风格的房产介绍服务网站模板下载.zip
基于Python实现的医疗知识图谱的知识问答系统源码毕业设计(高分项目),本资源中的源码都是经过本地编译过可运行的,评审分达到98分,资源项目的难度比较适中,内容都是经过助教老师审定过的能够满足学习、毕业设计、期末大作业和课程设计使用需求,如果有需要的话可以放心下载使用。 基于Python实现的医疗知识图谱的知识问答系统源码毕业设计(高分项目)基于Python实现的医疗知识图谱的知识问答系统源码毕业设计(高分项目)基于Python实现的医疗知识图谱的知识问答系统源码毕业设计(高分项目)基于Python实现的医疗知识图谱的知识问答系统源码毕业设计(高分项目)基于Python实现的医疗知识图谱的知识问答系统源码毕业设计(高分项目)基于Python实现的医疗知识图谱的知识问答系统源码毕业设计(高分项目)基于Python实现的医疗知识图谱的知识问答系统源码毕业设计(高分项目)基于Python实现的医疗知识图谱的知识问答系统源码毕业设计(高分项目)基于Python实现的医疗知识图谱的知识问答系统源码毕业设计(高分项目)基于Python实现的医疗知识图谱的知识问答系统源码毕业设计(高分项目)基于
白色宽屏风格的生物医疗实验室企业网站模板.rar
C# 操作Access数据库
NSFileSystemError如何解决.md
白色简洁风格的商户销售统计图源码下载.zip
白色简洁风格的室内设计整站网站源码下载.zip
侧吸式油烟机sw16可编辑全套技术资料100%好用.zip
在 MATLAB 中进行人脸识别可以通过使用内置的工具箱和函数来实现。MATLAB 提供了计算机视觉工具箱(Computer Vision Toolbox),其中包含了用于图像处理、特征提取以及机器学习的函数,可以用来构建一个人脸识别系统。下面是一个简化的教程,介绍如何使用 MATLAB 进行人脸识别。 ### 准备工作 1. **安装必要的工具箱**:确保你已经安装了“计算机视觉工具箱”和“深度学习工具箱”。如果没有,可以通过 MATLAB 的附加功能管理器安装它们。 2. **获取数据集**:准备一个包含不同个体的人脸图像的数据集。你可以自己收集图片,或者使用公开的数据集如 AT&T Faces Database 或 LFW (Labeled Faces in the Wild) 数据集。 3. **安装预训练模型(可选)**:如果你打算使用深度学习方法,MATLAB 提供了一些预训练的卷积神经网络(CNN)模型,比如 AlexNet, GoogLeNet 等,可以直接加载并用于特征提取或分类。 ### 步骤指南 #### 1. 加载人脸检测器 ```matlab face
白色宽屏风格的建筑设计公司企业网站源码下载.zip
智慧工地,作为现代建筑施工管理的创新模式,以“智慧工地云平台”为核心,整合施工现场的“人机料法环”关键要素,实现了业务系统的协同共享,为施工企业提供了标准化、精益化的工程管理方案,同时也为政府监管提供了数据分析及决策支持。这一解决方案依托云网一体化产品及物联网资源,通过集成公司业务优势,面向政府监管部门和建筑施工企业,自主研发并整合加载了多种工地行业应用。这些应用不仅全面连接了施工现场的人员、机械、车辆和物料,实现了数据的智能采集、定位、监测、控制、分析及管理,还打造了物联网终端、网络层、平台层、应用层等全方位的安全能力,确保了整个系统的可靠、可用、可控和保密。 在整体解决方案中,智慧工地提供了政府监管级、建筑企业级和施工现场级三类解决方案。政府监管级解决方案以一体化监管平台为核心,通过GIS地图展示辖区内工程项目、人员、设备信息,实现了施工现场安全状况和参建各方行为的实时监控和事前预防。建筑企业级解决方案则通过综合管理平台,提供项目管理、进度管控、劳务实名制等一站式服务,帮助企业实现工程管理的标准化和精益化。施工现场级解决方案则以可视化平台为基础,集成多个业务应用子系统,借助物联网应用终端,实现了施工信息化、管理智能化、监测自动化和决策可视化。这些解决方案的应用,不仅提高了施工效率和工程质量,还降低了安全风险,为建筑行业的可持续发展提供了有力支持。 值得一提的是,智慧工地的应用系统还围绕着工地“人、机、材、环”四个重要因素,提供了各类信息化应用系统。这些系统通过配置同步用户的组织结构、智能权限,结合各类子系统应用,实现了信息的有效触达、问题的及时跟进和工地的有序管理。此外,智慧工地还结合了虚拟现实(VR)和建筑信息模型(BIM)等先进技术,为施工人员提供了更为直观、生动的培训和管理工具。这些创新技术的应用,不仅提升了施工人员的技能水平和安全意识,还为建筑行业的数字化转型和智能化升级注入了新的活力。总的来说,智慧工地解决方案以其创新性、实用性和高效性,正在逐步改变建筑施工行业的传统管理模式,引领着建筑行业向更加智能化、高效化和可持续化的方向发展。
履带车底盘sw16全套技术资料100%好用.zip