The Hybrid Response(原创)

简介:

The Hybrid Response
When requirements call for changed data to result in both type 1 and type 2 behavior, the solution is disarmingly simple: provide for both. This cannot be achieved with a single attribute, but it can be achieved by providing a pair of attributes. Each attribute has the same source, but one will react to changes as a type 1 response and the other as type 2. This solution is known as a type 1/2 hybrid. It’s a good idea to name these attributes clearly, since each serves a different function. Use the suffix _current to identify the type 1 attribute, and _historic to identify the type 2 attribute. When the source for the pair of attributes changes, the ETL process must do two things:

  • Update the _current value for all records that share the same natural key.
  • Insert a new row, initializing both _current and _historic values to the new value.

When reporting on the data, developers choose which column to use based on the requirements for the report.

  • Use the _current column to group all facts under the current values.
  • Use the _historic column to group all facts under the historic values.

A Type 1/2 Hybrid Response in Action
A hybrid design for handling changes in company names is illustrated in Below figure. The customer dimension in the figure records company names in two columns:

  • company_name_current will capture the current name of the company. This attribute will exhibit type 1 behavior. When a company name changes, it will be updated.
  • company_name_historic will capture the detailed history of company names. This attribute will exhibit type 2 behavior. When a company name changes, a new row will be created to capture the new value. The new row will be associated with any subsequent transactions added to a fact table.

Processing a Change 
Suppose one of the companies in the customer table is Apple Computer. You may recall that in 2007 the company changed its name from “Apple Computer, Inc.” to “Apple Inc.” Imagine that products have been ordered by Apple, both before and after the name change. Below figure illustrates the two-step process that occurs when the name of the company changes.

The top of the figure shows the row for Apple that was in place prior to the name change. The natural key for Apple is BB770. It has one row in the customer table, with a surrogate key value of 1011. The column company_name_current shows the name of the company as “Apple Computer, Inc.” There have been no changes as of yet, so company_name_historic contains the same value. The lower half of Figure shows what happens when the name of the company changes:

  1. The first row shows the previously existing record, with surrogate key value 1011. The company_name_current column is updated with the new company name “Apple Inc.” This is the type 1 response. The company_name_historic value is untouched, since it is treated as type 2.
  2. To process the type 2 change, a new row is added. It contains the new name “Apple Inc.” in both the company_name_current and company_name_historic columns.

This new row is assigned a new surrogate key, 2822, which will be used for new rows in associated fact tables.

This may appear convoluted, but it is no different from processing any table that contains both type 1 and type 2 attributes. The only difference is that both attributes happen to have the same source.

Using the Company Name Columns 
Facts can now be studied in two ways. When you want to be able to capture all orders under the new name (Apple Inc.), you can use the column company_name_current. This might be done to filter a query or to group aggregated facts. All activity with Apple is associated with the same name: Apple Inc. This is particularly useful for producing this year versus last year comparisons; all orders will have the same name, even if it recently changed. When you want to group orders under historically accurate company names, you can use the company_name_historic. Transactions that took place before Apple’s name change are grouped under “Apple Computer, Inc.” Transactions that took place after the name
change are grouped under “Apple Inc.”Each time a company changes,the same two-step process is followed. First, all existing rows with the same natural key have their _current value updated. Second, a new row is added, with _current and _historic initialized to the new value.
Suppose that Apple decides to change its name again. The iPod and iPhone have been
so successful they might rename the company “iApple Inc.” Below figure shows how this
change would be processed. 

First, the existing records in the dimension table for Apple, which is company BB770, are updated with the new company_name_current. This time around, there are two records to be updated: those identified by surrogate keys 1011 and 2822. That takes care of the type 1 processing. 
Second, a new row is added to the table for iApple. This row records the new name, iApple Inc.,in both the company_name_current and company_name_historic columns.
That takes care of the type 2 processing. This new row has surrogate key value 3100. Any new orders from iApple will use this key value in the fact table.
This process can be repeated as many times as the name of a given company changes.
It will always be possible to use the current name of the company to study all facts by using company_name_current. It will always be possible to study all facts with historically accurate company names by using company_name_historic. 
Evaluating and Extending the Hybrid Approach
The hybrid approach addresses conflicting slow change requirements at the expense of
understandability. Hybrid solutions generate no end of confusion among end users and
analysts, who are often uncertain about which column to use in a given report. When the
wrong column is used for a situation, the results can be perplexing.
It is, therefore, important to use hybrid solutions judiciously. Evaluate the requirements carefully. In the case of the company name changes, for example, why do some people want access to the historically accurate name? If it is so they can reproduce a historic invoice, that purpose is better served by the operational system. On the other hand, if it is needed for certain forms of statutory reporting, the requirement may be valid.
When a hybrid approach is implemented, most analysis tends to focus on one of the columns. In the company example, the column of primary interest contains the current company name. To simplify the model exposed to end users, you can create a view of the customer dimension that hides the historic version. Developers trained in the complexities of the star can be provided with the full version of the table, for use in creating canned reports. 
In extremely rare situations, you may find requirements that call for a type 1/2/3 hybrid. In this case, the attribute in question will be represented by three columns: the current value, the previous value, and the historic value. This situation will compound the confusion of users trying to choose which column to show in a report. Again, you will want to try to insulate end users from this complexity, exposing them only to the version required most often. 
At this point, the reader may be thinking about other ways to combine these techniques. There is no need to describe a type 1/3 hybrid, because a type 3 change already includes a column that exhibits type 1 behavior. For the same reason, a type 2/3 hybrid is no different from the 1/2/3 hybrid described in the previous paragraph. It is possible to incorporate time stamps with any hybrid technique that involves a type 2 component. Remember that complexity increases the ETL workload and reduces usability; be sure that any solution is truly warranted by the requirements.

 

参考至:《Star Schema The Complete Reference》

本文原创,转载请注明出处、作者

如有错误,欢迎指正

邮箱:czmcj@163.com

作者:czmmiao  文章出处:http://czmmiao.iteye.com/blog/2214552
相关文章
艾伟:[原创]谈谈WCF中的Data Contract(4):WCF Data Contract Versioning
软件工程是一门独特的工程艺术,需要解决的是不断改变的需求变化。而对于WCF,对于SOA,由于涉及的是对多个系统之间的交互问题,如何有效地解决不断改变的需求所带来的问题就显得更为重要:Service端版本的变化能否保持现有Consumer的正常调用,Consumer端的改变不至于影响对Service 的正常调用。
865 0
|
.NET
艾伟:[原创]谈谈WCF中的Data Contract(2):WCF Data Contract对Generic的支持
通过第一部分的介绍,我们可以体会到,WCF 的Data Contract在CLR Type和Neutral Contract之间搭建了一座桥梁,弥合了.NET世界和厂商中立世界的差异。通过WCF Data Contract我们将CLR Data Type暴露成一个厂商中立的数据结构的描述,同样通过WCF Data Contract我们将一个现有的CLR Data Type和既定的Neutral contract进行适配。
701 0
艾伟:[原创]谈谈WCF中的Data Contract(3):WCF Data Contract对Collection & Dictionary的支持
在本篇文章上一部分Order Processing的例子中,我们看到原本已Collection形式定义的DetailList属性(public IList DetailList),在Data Contract中却以Array的方式体现(public OrderDetail[] DetailList)。
837 0
|
SQL 关系型数据库 网络架构
kbmmw 5.03 发布
版本一小数,功能一大步 We are happy to announce v5.03 of our popular middleware for Delphi and C++Builder. If you like kbmMW, please let others know! Share th...
1181 0