The Hybrid Response
When requirements call for changed data to result in both type 1 and type 2 behavior, the solution is disarmingly simple: provide for both. This cannot be achieved with a single attribute, but it can be achieved by providing a pair of attributes. Each attribute has the same source, but one will react to changes as a type 1 response and the other as type 2. This solution is known as a type 1/2 hybrid. It’s a good idea to name these attributes clearly, since each serves a different function. Use the suffix _current to identify the type 1 attribute, and _historic to identify the type 2 attribute. When the source for the pair of attributes changes, the ETL process must do two things:
- Update the _current value for all records that share the same natural key.
- Insert a new row, initializing both _current and _historic values to the new value.
When reporting on the data, developers choose which column to use based on the requirements for the report.
- Use the _current column to group all facts under the current values.
- Use the _historic column to group all facts under the historic values.
A Type 1/2 Hybrid Response in Action
A hybrid design for handling changes in company names is illustrated in Below figure. The customer dimension in the figure records company names in two columns:
- company_name_current will capture the current name of the company. This attribute will exhibit type 1 behavior. When a company name changes, it will be updated.
- company_name_historic will capture the detailed history of company names. This attribute will exhibit type 2 behavior. When a company name changes, a new row will be created to capture the new value. The new row will be associated with any subsequent transactions added to a fact table.
Processing a Change
Suppose one of the companies in the customer table is Apple Computer. You may recall that in 2007 the company changed its name from “Apple Computer, Inc.” to “Apple Inc.” Imagine that products have been ordered by Apple, both before and after the name change. Below figure illustrates the two-step process that occurs when the name of the company changes.
The top of the figure shows the row for Apple that was in place prior to the name change. The natural key for Apple is BB770. It has one row in the customer table, with a surrogate key value of 1011. The column company_name_current shows the name of the company as “Apple Computer, Inc.” There have been no changes as of yet, so company_name_historic contains the same value. The lower half of Figure shows what happens when the name of the company changes:
- The first row shows the previously existing record, with surrogate key value 1011. The company_name_current column is updated with the new company name “Apple Inc.” This is the type 1 response. The company_name_historic value is untouched, since it is treated as type 2.
- To process the type 2 change, a new row is added. It contains the new name “Apple Inc.” in both the company_name_current and company_name_historic columns.
This new row is assigned a new surrogate key, 2822, which will be used for new rows in associated fact tables.
This may appear convoluted, but it is no different from processing any table that contains both type 1 and type 2 attributes. The only difference is that both attributes happen to have the same source.
Using the Company Name Columns
Facts can now be studied in two ways. When you want to be able to capture all orders under the new name (Apple Inc.), you can use the column company_name_current. This might be done to filter a query or to group aggregated facts. All activity with Apple is associated with the same name: Apple Inc. This is particularly useful for producing this year versus last year comparisons; all orders will have the same name, even if it recently changed. When you want to group orders under historically accurate company names, you can use the company_name_historic. Transactions that took place before Apple’s name change are grouped under “Apple Computer, Inc.” Transactions that took place after the name
change are grouped under “Apple Inc.”Each time a company changes,the same two-step process is followed. First, all existing rows with the same natural key have their _current value updated. Second, a new row is added, with _current and _historic initialized to the new value.
Suppose that Apple decides to change its name again. The iPod and iPhone have been
so successful they might rename the company “iApple Inc.” Below figure shows how this
change would be processed.
First, the existing records in the dimension table for Apple, which is company BB770, are updated with the new company_name_current. This time around, there are two records to be updated: those identified by surrogate keys 1011 and 2822. That takes care of the type 1 processing.
Second, a new row is added to the table for iApple. This row records the new name, iApple Inc.,in both the company_name_current and company_name_historic columns.
That takes care of the type 2 processing. This new row has surrogate key value 3100. Any new orders from iApple will use this key value in the fact table.
This process can be repeated as many times as the name of a given company changes.
It will always be possible to use the current name of the company to study all facts by using company_name_current. It will always be possible to study all facts with historically accurate company names by using company_name_historic.
Evaluating and Extending the Hybrid Approach
The hybrid approach addresses conflicting slow change requirements at the expense of
understandability. Hybrid solutions generate no end of confusion among end users and
analysts, who are often uncertain about which column to use in a given report. When the
wrong column is used for a situation, the results can be perplexing.
It is, therefore, important to use hybrid solutions judiciously. Evaluate the requirements carefully. In the case of the company name changes, for example, why do some people want access to the historically accurate name? If it is so they can reproduce a historic invoice, that purpose is better served by the operational system. On the other hand, if it is needed for certain forms of statutory reporting, the requirement may be valid.
When a hybrid approach is implemented, most analysis tends to focus on one of the columns. In the company example, the column of primary interest contains the current company name. To simplify the model exposed to end users, you can create a view of the customer dimension that hides the historic version. Developers trained in the complexities of the star can be provided with the full version of the table, for use in creating canned reports.
In extremely rare situations, you may find requirements that call for a type 1/2/3 hybrid. In this case, the attribute in question will be represented by three columns: the current value, the previous value, and the historic value. This situation will compound the confusion of users trying to choose which column to show in a report. Again, you will want to try to insulate end users from this complexity, exposing them only to the version required most often.
At this point, the reader may be thinking about other ways to combine these techniques. There is no need to describe a type 1/3 hybrid, because a type 3 change already includes a column that exhibits type 1 behavior. For the same reason, a type 2/3 hybrid is no different from the 1/2/3 hybrid described in the previous paragraph. It is possible to incorporate time stamps with any hybrid technique that involves a type 2 component. Remember that complexity increases the ETL workload and reduces usability; be sure that any solution is truly warranted by the requirements.
参考至:《Star Schema The Complete Reference》
本文原创,转载请注明出处、作者
如有错误,欢迎指正
邮箱:czmcj@163.com
相关推荐
Customer self service in the hybrid library 国外原版电子书,仅供学习使用!
The hybrid system composed of a Photovoltaic (PV) array and a Proton exchange membrane fuel cell (PEMFC) is considered. The PV array normally uses a maximum power point tracking (MPPT) technique to ...
The Theory of Hybrid Automata
The book takes us on a journey of architecting, building, and operating a hybrid cloud while taking a very pragmatic approach towards it. The book starts by defining the different demographics of the ...
In this paper, a mixed-model assembly line (MMAL) ... The computational results show that the proposed hybrid algorithm outperforms the existing genetic algorithms, significantly in large-sized problems.
Hybrid
Face recognition on large-scale video in the wild with hybrid
This paper proposes the hybrid NM-PSO algorithm based on the Nelder–Mead (NM) simplex search method and particle swarm optimization (PSO) for unconstrained optimization. NM-PSO is very easy to ...
The design and implementation of the hybrid control method for a three-pole active magnetic bearing (AMB) is proposed in this paper. The system is inherently nonlinear and conventional nonlinear con ...
Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads论文中文翻译
By the end of the book, you will not only have built a professional, hybrid mobile application, but will also have ensured that your app is secure and performance driven. Style and approach A step-...
This study is aimed at eliminating the influence of the higher-order modes on the frequency response functions (FRFs) of non-proportionally viscously damped systems. Based on the Neumann expansion ...
ORNL最新针对2010款第三代普锐斯混合动力系统的拆解分析报告
Response Functions (FRFs) of the system. A novel strategy for the control of the Particle Swarm Optimization (PSO) parameters based on the Nelder–Mead algorithm (Simplex method) is presented; ...
The hybrid methods were first illustrated through four test functions from the RSM literature and were compared with original NM, GA and PSO algorithms. In each test scheme, the effectiveness, ...
HOMER (Hybrid Optimization of Multiple Electric Renewables), the micropower optimization model, simplifies the task of evaluating designs of both off-grid and grid-connected power systems for a ...
Ionic has evolved as the most popular choice for Hybrid Mobile App development as it tends to match the native experience and provides robust components/tools to build apps. The Ionic Complete ...
Barney walks you through the entire hybrid iPhone development process, from creating great user interfaces to compiling, deploying, and executing applications. Along the way, he introduces techniques...
In this study, we propose a hybrid variable selection strategy based on the continuous shrinkage of variable space which is the core idea of variable combination population analysis (VCPA). The VCPA-...
This study presents a hybrid AI (artificial intelligence) approach to the implementation of trading strategies in the S&P 500 stock index futures market. The hybrid AI approach integrates the rule-...