Junk Dimension
A junk dimension is a convenient grouping of typically low-cardinality flags and indicators. By creating an abstract dimension, these flags and indicators are removed from the fact table while placing them into a useful dimensional framework.A Junk Dimension is a dimension table consisting of attributes that do not belong in the fact table or in any of the existing dimension tables. The nature of these attributes is usually text or various flags, e.g. non-generic comments or just simple yes/no or true/false indicators. These kinds of attributes are typically remaining when all the obvious dimensions in the business process have been identified and thus the designer is faced with the challenge of where to put these attributes that do not belong in the other dimensions.
One solution is to create a new dimension for each of the remaining attributes, but due to their nature, it could be necessary to create a vast number of new dimensions resulting in a fact table with a very large number of foreign keys. The designer could also decide to leave the remaining attributes in the fact table but this could make the row length of the table unnecessarily large if, for example, the attributes is a long text string.
The solution to this challenge is to identify all the attributes and then put them into one or several Junk Dimensions. One Junk Dimension can hold several true/false or yes/no indicators that have no correlation with each other, so it would be convenient to convert the indicators into a more describing attribute. An example would be an indicator about whether a package had arrived, instead of indicating this as “yes” or “no”, it would be converted into “arrived” or “pending” in the junk dimension. The designer can choose to build the dimension table so it ends up holding all the indicators occurring with every other indicator so that all combinations are covered. This sets up a fixed size for the table itself which would be 2x rows, where x is the number of indicators. This solution is appropriate in situations where the designer would expect to encounter a lot of different combinations and where the possible combinations are limited to an acceptable level. In a situation where the number of indicators are large, thus creating a very big table or where the designer only expect to encounter a few of the possible combinations, it would be more appropriate to build each row in the junk dimension as new combinations are encountered. To limit the size of the tables, multiple junk dimensions might be appropriate in other situations depending on the correlation between various indicators.
Junk dimensions are also appropriate for placing attributes like non-generic comments from the fact table. Such attributes might consist of data from an optional comment field when a customer places an order and as a result will probably be blank in many cases. Therefore the junk dimension should contain a single row representing the blanks as a surrogate key that will be used in the fact table for every row returned with a blank comment field
Instance
In a junk dimension, we combine these indicator fields into a single dimension. This way, we'll only need to build a single dimension table, and the number of fields in the fact table, as well as the size of the fact table, can be decreased. The content in the junk dimension table is the combination of all possible values of the individual indicator fields.
Let's look at an example. Assuming that we have the following fact table:
FACT_TABLE
CUSTOMER_ID |
PRODUCT_ID |
TXN_ID |
STORE_ID |
TXN_CODE |
COUPON_IND |
PREPAY_IND |
TXT_AMT |
In this example, TXN_CODE, COUPON_IND, and PREPAY_IND are all indicator fields. In this existing format, each one of them is a dimension. Using the junk dimension principle, we can combine them into a single junk dimension, resulting in the following fact table:
FACT_TABLE
CUSTOMER_ID |
PRODUCT_ID |
TXN_ID |
STORE_ID |
JUNK_ID |
TXT_AMT |
Note that now the number of dimensions in the fact table went from 7 to 5.
The content of the junk dimension table would look like the following:
In this case, we have 3 possible values for the TXN_CODE field, 2 possible values for the COUPON_IND field, and 2 possible values for the PREPAY_IND field. This results in a total of 3 x 2 x 2 = 12 rows for the junk dimension table.
By using a junk dimension to replace the 3 indicator fields, we have decreased the number of dimensions by 2 and also decreased the number of fields in the fact table by 2. This will result in a data warehousing environment that offer better performance as well as being easier to manage.
参考至:《Star Schema The Complete Reference》
http://www.1keydata.com/datawarehousing/junk-dimension.html
http://en.wikipedia.org/wiki/Dimension_%28data_warehouse%29
本文原创,转载请注明出处,作者
如有错误,欢迎指正
邮箱:czmcj@163.com
相关推荐
用junk替代rm功能,删除时不直接删除,而是将待删除内容移动到/trash目录中。要求:①junk默认只能删除文件不能删除目录,而且若无法删除时要给出相应信息;②提供-r选项,通过-r可以递归删除目录;③-f选项:直接...
"junk"命令的设计与实现就是这样一个示例,它提供了一种更安全的方式来替代传统的"rm"命令,避免了误删重要文件的风险。"rm"命令在执行时会直接永久性地删除指定的文件或目录,而"junk"则将待删除内容移动到"/trash...
精品ppt模板PPT素材junkfood_for_thought029
Junk culture
junkfood_for_thought
C++ Junk Code(花码) 生成器,方便Coder快捷生成Junk Code,本源码为E语言源码
垃圾过滤掉例如.DS_Store和Thumbs.db安装 $ npm install junk用法 const { promises : fs } = require ( 'fs' ) ;const junk = require ( 'junk' ) ;( async ( ) => {const files = await fs . readdir ( 'some/path...
标题 "junk2_labview_" 暗示我们关注的是一个与LabVIEW相关的项目,可能是一个实验、程序或工具,用于处理图像数据。LabVIEW,全称Laboratory Virtual Instrument Engineering Workbench,是一款由美国国家仪器(NI...
标题中的“rEFInd-minimal-master.zip_Junk_Say_Say_Say_refind_manjaro”提到了两个关键概念:rEFInd和Manjaro。rEFInd是一款启动管理器,它允许用户在多操作系统环境下选择要启动的操作系统。Manjaro则是一个基于...
this is a junk to activate my account
结合"junk-file"和"Shell"的标签,我们可以讨论如何使用Shell命令来管理和清理垃圾文件。例如,`find`命令可以用来查找特定类型的垃圾文件,如查找所有的.log文件: ```bash find /path/to/search -name "*.log" ``...
这个"Delphi Junk (1).zip"压缩包似乎包含了一些与Delphi编程相关的源代码文件和可能的库。让我们逐一分析这些文件来探讨其中蕴含的知识点。 1. **Threading.AsyncEx.pas**: 这个文件很可能是一个实现异步编程的...
在当今社会,垃圾食品(Junk Food)充斥在我们的周围,尤其是对于正在成长中的青少年来说,这种高糖、高盐、高脂肪的食品诱惑几乎无处不在。但是,这些食品虽然口感诱人,却隐藏着对健康的巨大威胁。本文将从个人...
... ... ...1. 文档类型声明(<!...3. 头部元素():包含元信息,如字符集设置()、页面标题()等。...4. 主体元素():网页的实际内容,如文本、图像、表格等。...- `<h1>`到`<h6>`:定义不同级别的标题,`<h1>`是最重要的标题...
Cyberjunk2020 README.md Cyberjunk2020的主要仓库,即将到来的2D自上而下的Bullet Hell游戏从文件快速启动从GITHUB下载压缩的Cyberjunk2020文件。 解压缩文件文件夹以访问原始文件并设置游戏组件。 ...
"Junk Bait" 是一个开源项目,设计用于本地化垃圾邮件来源的Web应用程序。它的核心功能是通过创建虚假电子邮件地址来引诱垃圾邮件发送者,即所谓的“邮件收割机”。这些工具通常会扫描互联网,收集公开的电子邮件...
noob-junk.github.io
报告_JUNK Roundcube Webmail 插件 这个插件是致力于帮助全球打击垃圾邮件的结果。 该插件在 Roundcube Webmailer 中提供了一个新按钮,它将向等服务发布垃圾邮件的匿名标头信息 您还可以将其用作向本地脚本报告...