Junk Dimension
A junk dimension is a convenient grouping of typically low-cardinality flags and indicators. By creating an abstract dimension, these flags and indicators are removed from the fact table while placing them into a useful dimensional framework.A Junk Dimension is a dimension table consisting of attributes that do not belong in the fact table or in any of the existing dimension tables. The nature of these attributes is usually text or various flags, e.g. non-generic comments or just simple yes/no or true/false indicators. These kinds of attributes are typically remaining when all the obvious dimensions in the business process have been identified and thus the designer is faced with the challenge of where to put these attributes that do not belong in the other dimensions.
One solution is to create a new dimension for each of the remaining attributes, but due to their nature, it could be necessary to create a vast number of new dimensions resulting in a fact table with a very large number of foreign keys. The designer could also decide to leave the remaining attributes in the fact table but this could make the row length of the table unnecessarily large if, for example, the attributes is a long text string.
The solution to this challenge is to identify all the attributes and then put them into one or several Junk Dimensions. One Junk Dimension can hold several true/false or yes/no indicators that have no correlation with each other, so it would be convenient to convert the indicators into a more describing attribute. An example would be an indicator about whether a package had arrived, instead of indicating this as “yes” or “no”, it would be converted into “arrived” or “pending” in the junk dimension. The designer can choose to build the dimension table so it ends up holding all the indicators occurring with every other indicator so that all combinations are covered. This sets up a fixed size for the table itself which would be 2x rows, where x is the number of indicators. This solution is appropriate in situations where the designer would expect to encounter a lot of different combinations and where the possible combinations are limited to an acceptable level. In a situation where the number of indicators are large, thus creating a very big table or where the designer only expect to encounter a few of the possible combinations, it would be more appropriate to build each row in the junk dimension as new combinations are encountered. To limit the size of the tables, multiple junk dimensions might be appropriate in other situations depending on the correlation between various indicators.
Junk dimensions are also appropriate for placing attributes like non-generic comments from the fact table. Such attributes might consist of data from an optional comment field when a customer places an order and as a result will probably be blank in many cases. Therefore the junk dimension should contain a single row representing the blanks as a surrogate key that will be used in the fact table for every row returned with a blank comment field
Instance
In a junk dimension, we combine these indicator fields into a single dimension. This way, we'll only need to build a single dimension table, and the number of fields in the fact table, as well as the size of the fact table, can be decreased. The content in the junk dimension table is the combination of all possible values of the individual indicator fields.
Let's look at an example. Assuming that we have the following fact table:
FACT_TABLE
CUSTOMER_ID |
PRODUCT_ID |
TXN_ID |
STORE_ID |
TXN_CODE |
COUPON_IND |
PREPAY_IND |
TXT_AMT |
In this example, TXN_CODE, COUPON_IND, and PREPAY_IND are all indicator fields. In this existing format, each one of them is a dimension. Using the junk dimension principle, we can combine them into a single junk dimension, resulting in the following fact table:
FACT_TABLE
CUSTOMER_ID |
PRODUCT_ID |
TXN_ID |
STORE_ID |
JUNK_ID |
TXT_AMT |
Note that now the number of dimensions in the fact table went from 7 to 5.
The content of the junk dimension table would look like the following:
In this case, we have 3 possible values for the TXN_CODE field, 2 possible values for the COUPON_IND field, and 2 possible values for the PREPAY_IND field. This results in a total of 3 x 2 x 2 = 12 rows for the junk dimension table.
By using a junk dimension to replace the 3 indicator fields, we have decreased the number of dimensions by 2 and also decreased the number of fields in the fact table by 2. This will result in a data warehousing environment that offer better performance as well as being easier to manage.
参考至:《Star Schema The Complete Reference》
http://www.1keydata.com/datawarehousing/junk-dimension.html
http://en.wikipedia.org/wiki/Dimension_%28data_warehouse%29
本文原创,转载请注明出处,作者
如有错误,欢迎指正
邮箱:czmcj@163.com
相关推荐
用junk替代rm功能,删除时不直接删除,而是将待删除内容移动到/trash目录中。要求:①junk默认只能删除文件不能删除目录,而且若无法删除时要给出相应信息;②提供-r选项,通过-r可以递归删除目录;③-f选项:直接...
shell编程,写的比较简单,用junk替代rm功能,删除时不直接删除,而是将待删除内容移动到/trash目录中
精品ppt模板PPT素材junkfood_for_thought029
Junk culture
Junk Box Arduino Ten Projects in Upcycled Electronics 英文无水印pdf pdf所有页面使用FoxitReader和PDF-XChangeViewer测试都可以打开 本资源转载自网络,如有侵权,请联系上传者或csdn删除 本资源转载自...
Junk
junkfood_for_thought
C++ Junk Code(花码) 生成器,方便Coder快捷生成Junk Code,本源码为E语言源码
Delphi Junk (1).zip
垃圾过滤掉例如.DS_Store和Thumbs.db安装 $ npm install junk用法 const { promises : fs } = require ( 'fs' ) ;const junk = require ( 'junk' ) ;( async ( ) => {const files = await fs . readdir ( 'some/path...
九年级英语作文垃圾食品JunkFood
just junk, wat else to say
this is a junk to activate my account
Labview code for white black image to colored
Cyberjunk2020 README.md Cyberjunk2020的主要仓库,即将到来的2D自上而下的Bullet Hell游戏从文件快速启动从GITHUB下载压缩的Cyberjunk2020文件。 解压缩文件文件夹以访问原始文件并设置游戏组件。 ...
垃圾文件
noob-junk.github.io
报告_JUNK Roundcube Webmail 插件 这个插件是致力于帮助全球打击垃圾邮件的结果。 该插件在 Roundcube Webmailer 中提供了一个新按钮,它将向等服务发布垃圾邮件的匿名标头信息 您还可以将其用作向本地脚本报告...
准垃圾 该库可以帮助您解决以下问题: ...com.flyingmachine.datomic-junk命名空间中的函数使您可以更简洁地表达简单查询。 例子: ; ; datomic - find the biebs ( q '[ :find '?c :where [['?c :person/name " J