The Hybrid Response(原创)

简介:

The Hybrid Response
When requirements call for changed data to result in both type 1 and type 2 behavior, the solution is disarmingly simple: provide for both. This cannot be achieved with a single attribute, but it can be achieved by providing a pair of attributes. Each attribute has the same source, but one will react to changes as a type 1 response and the other as type 2. This solution is known as a type 1/2 hybrid. It’s a good idea to name these attributes clearly, since each serves a different function. Use the suffix _current to identify the type 1 attribute, and _historic to identify the type 2 attribute. When the source for the pair of attributes changes, the ETL process must do two things:

  • Update the _current value for all records that share the same natural key.
  • Insert a new row, initializing both _current and _historic values to the new value.

When reporting on the data, developers choose which column to use based on the requirements for the report.

  • Use the _current column to group all facts under the current values.
  • Use the _historic column to group all facts under the historic values.

A Type 1/2 Hybrid Response in Action
A hybrid design for handling changes in company names is illustrated in Below figure. The customer dimension in the figure records company names in two columns:

  • company_name_current will capture the current name of the company. This attribute will exhibit type 1 behavior. When a company name changes, it will be updated.
  • company_name_historic will capture the detailed history of company names. This attribute will exhibit type 2 behavior. When a company name changes, a new row will be created to capture the new value. The new row will be associated with any subsequent transactions added to a fact table.

Processing a Change 
Suppose one of the companies in the customer table is Apple Computer. You may recall that in 2007 the company changed its name from “Apple Computer, Inc.” to “Apple Inc.” Imagine that products have been ordered by Apple, both before and after the name change. Below figure illustrates the two-step process that occurs when the name of the company changes.

The top of the figure shows the row for Apple that was in place prior to the name change. The natural key for Apple is BB770. It has one row in the customer table, with a surrogate key value of 1011. The column company_name_current shows the name of the company as “Apple Computer, Inc.” There have been no changes as of yet, so company_name_historic contains the same value. The lower half of Figure shows what happens when the name of the company changes:

  1. The first row shows the previously existing record, with surrogate key value 1011. The company_name_current column is updated with the new company name “Apple Inc.” This is the type 1 response. The company_name_historic value is untouched, since it is treated as type 2.
  2. To process the type 2 change, a new row is added. It contains the new name “Apple Inc.” in both the company_name_current and company_name_historic columns.

This new row is assigned a new surrogate key, 2822, which will be used for new rows in associated fact tables.

This may appear convoluted, but it is no different from processing any table that contains both type 1 and type 2 attributes. The only difference is that both attributes happen to have the same source.

Using the Company Name Columns 
Facts can now be studied in two ways. When you want to be able to capture all orders under the new name (Apple Inc.), you can use the column company_name_current. This might be done to filter a query or to group aggregated facts. All activity with Apple is associated with the same name: Apple Inc. This is particularly useful for producing this year versus last year comparisons; all orders will have the same name, even if it recently changed. When you want to group orders under historically accurate company names, you can use the company_name_historic. Transactions that took place before Apple’s name change are grouped under “Apple Computer, Inc.” Transactions that took place after the name
change are grouped under “Apple Inc.”Each time a company changes,the same two-step process is followed. First, all existing rows with the same natural key have their _current value updated. Second, a new row is added, with _current and _historic initialized to the new value.
Suppose that Apple decides to change its name again. The iPod and iPhone have been
so successful they might rename the company “iApple Inc.” Below figure shows how this
change would be processed. 

First, the existing records in the dimension table for Apple, which is company BB770, are updated with the new company_name_current. This time around, there are two records to be updated: those identified by surrogate keys 1011 and 2822. That takes care of the type 1 processing. 
Second, a new row is added to the table for iApple. This row records the new name, iApple Inc.,in both the company_name_current and company_name_historic columns.
That takes care of the type 2 processing. This new row has surrogate key value 3100. Any new orders from iApple will use this key value in the fact table.
This process can be repeated as many times as the name of a given company changes.
It will always be possible to use the current name of the company to study all facts by using company_name_current. It will always be possible to study all facts with historically accurate company names by using company_name_historic. 
Evaluating and Extending the Hybrid Approach
The hybrid approach addresses conflicting slow change requirements at the expense of
understandability. Hybrid solutions generate no end of confusion among end users and
analysts, who are often uncertain about which column to use in a given report. When the
wrong column is used for a situation, the results can be perplexing.
It is, therefore, important to use hybrid solutions judiciously. Evaluate the requirements carefully. In the case of the company name changes, for example, why do some people want access to the historically accurate name? If it is so they can reproduce a historic invoice, that purpose is better served by the operational system. On the other hand, if it is needed for certain forms of statutory reporting, the requirement may be valid.
When a hybrid approach is implemented, most analysis tends to focus on one of the columns. In the company example, the column of primary interest contains the current company name. To simplify the model exposed to end users, you can create a view of the customer dimension that hides the historic version. Developers trained in the complexities of the star can be provided with the full version of the table, for use in creating canned reports. 
In extremely rare situations, you may find requirements that call for a type 1/2/3 hybrid. In this case, the attribute in question will be represented by three columns: the current value, the previous value, and the historic value. This situation will compound the confusion of users trying to choose which column to show in a report. Again, you will want to try to insulate end users from this complexity, exposing them only to the version required most often. 
At this point, the reader may be thinking about other ways to combine these techniques. There is no need to describe a type 1/3 hybrid, because a type 3 change already includes a column that exhibits type 1 behavior. For the same reason, a type 2/3 hybrid is no different from the 1/2/3 hybrid described in the previous paragraph. It is possible to incorporate time stamps with any hybrid technique that involves a type 2 component. Remember that complexity increases the ETL workload and reduces usability; be sure that any solution is truly warranted by the requirements.

 

参考至:《Star Schema The Complete Reference》

本文原创,转载请注明出处、作者

如有错误,欢迎指正

邮箱:czmcj@163.com

作者:czmmiao  文章出处:http://czmmiao.iteye.com/blog/2214552
相关文章
|
15天前
|
存储 关系型数据库 分布式数据库
PostgreSQL 18 发布,快来 PolarDB 尝鲜!
PostgreSQL 18 发布,PolarDB for PostgreSQL 全面兼容。新版本支持异步I/O、UUIDv7、虚拟生成列、逻辑复制增强及OAuth认证,显著提升性能与安全。PolarDB-PG 18 支持存算分离架构,融合海量弹性存储与极致计算性能,搭配丰富插件生态,为企业提供高效、稳定、灵活的云数据库解决方案,助力企业数字化转型如虎添翼!
|
10天前
|
缓存 并行计算 PyTorch
144_推理时延优化:Profiling与瓶颈分析 - 使用PyTorch Profiler诊断推理延迟,优化矩阵运算的独特瓶颈
在2025年的大模型时代,推理时延优化已经成为部署LLM服务的关键挑战之一。随着模型规模的不断扩大(从数亿参数到数千亿甚至万亿参数),即使在最先进的硬件上,推理延迟也常常成为用户体验和系统吞吐量的主要瓶颈。
349 147
|
10天前
|
机器学习/深度学习 存储 缓存
92_自我反思提示:输出迭代优化
在大型语言模型(LLM)应用日益普及的今天,如何持续提升模型输出质量成为了业界关注的核心问题。传统的提示工程方法往往依赖一次性输入输出,难以应对复杂任务中的多轮优化需求。2025年,自我反思提示技术(Self-Reflection Prompting)作为提示工程的前沿方向,正在改变我们与LLM交互的方式。这项技术通过模拟人类的自我反思认知过程,让模型能够对自身输出进行评估、反馈和优化,从而实现输出质量的持续提升。
400 136
|
4天前
|
人工智能 移动开发 自然语言处理
阿里云百炼产品月刊【2025年9月】
本月通义千问模型大升级,新增多模态、语音、视频生成等高性能模型,支持图文理解、端到端视频生成。官网改版上线全新体验中心,推出高代码应用与智能体多模态知识融合,RAG能力增强,助力企业高效部署AI应用。
255 1
|
14天前
|
存储 人工智能 搜索推荐
终身学习型智能体
当前人工智能前沿研究的一个重要方向:构建能够自主学习、调用工具、积累经验的小型智能体(Agent)。 我们可以称这种系统为“终身学习型智能体”或“自适应认知代理”。它的设计理念就是: 不靠庞大的内置知识取胜,而是依靠高效的推理能力 + 动态获取知识的能力 + 经验积累机制。
405 135
|
14天前
|
存储 人工智能 Java
AI 超级智能体全栈项目阶段二:Prompt 优化技巧与学术分析 AI 应用开发实现上下文联系多轮对话
本文讲解 Prompt 基本概念与 10 个优化技巧,结合学术分析 AI 应用的需求分析、设计方案,介绍 Spring AI 中 ChatClient 及 Advisors 的使用。
531 133
AI 超级智能体全栈项目阶段二:Prompt 优化技巧与学术分析 AI 应用开发实现上下文联系多轮对话
|
14天前
|
人工智能 Java API
AI 超级智能体全栈项目阶段一:AI大模型概述、选型、项目初始化以及基于阿里云灵积模型 Qwen-Plus实现模型接入四种方式(SDK/HTTP/SpringAI/langchain4j)
本文介绍AI大模型的核心概念、分类及开发者学习路径,重点讲解如何选择与接入大模型。项目基于Spring Boot,使用阿里云灵积模型(Qwen-Plus),对比SDK、HTTP、Spring AI和LangChain4j四种接入方式,助力开发者高效构建AI应用。
545 122
AI 超级智能体全栈项目阶段一:AI大模型概述、选型、项目初始化以及基于阿里云灵积模型 Qwen-Plus实现模型接入四种方式(SDK/HTTP/SpringAI/langchain4j)