离线数仓(八)【DWD 层开发】(2)https://developer.aliyun.com/article/1532415
1.5、交易域退单事务事实表
我们这里只考虑申请退单,不考虑申请退单后卖家怎么处理,退单状态怎么变化。
1.5.1、设计表结构
- 选择业务过程:退单
- 声明粒度:谁+什么时候+退了哪件商品
- 确认维度:时间、用户、商品、退单类型、退单原因类型
- 确认事实:退单件数、退单金额
1.5.2、建表语句
DROP TABLE IF EXISTS dwd_trade_order_refund_inc; CREATE EXTERNAL TABLE dwd_trade_order_refund_inc ( `id` STRING COMMENT '编号', `user_id` STRING COMMENT '用户ID', `order_id` STRING COMMENT '订单ID', `sku_id` STRING COMMENT '商品ID', `province_id` STRING COMMENT '地区ID', `date_id` STRING COMMENT '日期ID', `create_time` STRING COMMENT '退单时间', `refund_type_code` STRING COMMENT '退单类型编码', `refund_type_name` STRING COMMENT '退单类型名称', `refund_reason_type_code` STRING COMMENT '退单原因类型编码', `refund_reason_type_name` STRING COMMENT '退单原因类型名称', `refund_reason_txt` STRING COMMENT '退单原因描述', `refund_num` BIGINT COMMENT '退单件数', `refund_amount` DECIMAL(16, 2) COMMENT '退单金额' ) COMMENT '交易域退单事务事实表' PARTITIONED BY (`dt` STRING) STORED AS ORC LOCATION '/warehouse/gmall/dwd/dwd_trade_order_refund_inc/' TBLPROPERTIES ("orc.compress" = "snappy");
1.5.3、数据流向
同样需要分析退单这个业务过程会对哪些表产生影响:order_info 的 order_status 字段会发生变化(变为 1005 (退款中)),order_refund_info 表中会插入一条数据。
1.5.4、首日装载
insert overwrite table dwd_trade_order_refund_inc partition(dt) select ri.id, user_id, order_id, sku_id, province_id, date_format(create_time,'yyyy-MM-dd') date_id, create_time, refund_type, type_dic.dic_name, refund_reason_type, reason_dic.dic_name, refund_reason_txt, refund_num, refund_amount, date_format(create_time,'yyyy-MM-dd') from ( select data.id, data.user_id, data.order_id, data.sku_id, data.refund_type, data.refund_num, data.refund_amount, data.refund_reason_type, data.refund_reason_txt, data.create_time from ods_order_refund_info_inc where dt='2020-06-14' and type='bootstrap-insert' )ri left join ( select data.id, data.province_id from ods_order_info_inc --为了拿到 province_id where dt='2020-06-14' and type='bootstrap-insert' )oi on ri.order_id=oi.id left join ( select dic_code, dic_name from ods_base_dic_full where dt='2020-06-14' and parent_code = '15' )type_dic on ri.refund_type=type_dic.dic_code left join ( select dic_code, dic_name from ods_base_dic_full where dt='2020-06-14' and parent_code = '13' )reason_dic on ri.refund_reason_type=reason_dic.dic_code;
1.5.5、每日装载语句
insert overwrite table dwd_trade_order_refund_inc partition(dt='2020-06-15') select ri.id, user_id, order_id, sku_id, province_id, date_format(create_time,'yyyy-MM-dd') date_id, create_time, refund_type, type_dic.dic_name, refund_reason_type, reason_dic.dic_name, refund_reason_txt, refund_num, refund_amount from ( select data.id, data.user_id, data.order_id, data.sku_id, data.refund_type, data.refund_num, data.refund_amount, data.refund_reason_type, data.refund_reason_txt, data.create_time from ods_order_refund_info_inc where dt='2020-06-15' and type='insert' )ri left join ( select data.id, data.province_id from ods_order_info_inc where dt='2020-06-15' and type='update' and array_contains(map_keys(old),'old_status') and order_status='1005' )oi on ri.order_id=oi.id left join ( select dic_code, dic_name from ods_base_dic_full where dt='2020-06-15' and parent_code = '15' )type_dic on ri.refund_type=type_dic.dic_code left join ( select dic_code, dic_name from ods_base_dic_full where dt='2020-06-15' and parent_code = '13' )reason_dic on ri.refund_reason_type=reason_dic.dic_code;
1.6、交易域退款成功事务事实表
1.6.1、设计表结构
- 选择业务过程:退款成功
- 声明粒度:谁+什么时候+哪件商品退款成功
- 确认维度:用户、地区、时间、商品、退款方式
- 确认事实:退款件数、退款金额
1.6.2、建表语句
DROP TABLE IF EXISTS dwd_trade_refund_pay_suc_inc; CREATE EXTERNAL TABLE dwd_trade_refund_pay_suc_inc ( `id` STRING COMMENT '编号', `user_id` STRING COMMENT '用户ID', `order_id` STRING COMMENT '订单编号', `sku_id` STRING COMMENT 'SKU编号', `province_id` STRING COMMENT '地区ID', `payment_type_code` STRING COMMENT '支付类型编码', `payment_type_name` STRING COMMENT '支付类型名称', `date_id` STRING COMMENT '日期ID', `callback_time` STRING COMMENT '支付成功时间', `refund_num` DECIMAL(16, 2) COMMENT '退款件数', `refund_amount` DECIMAL(16, 2) COMMENT '退款金额' ) COMMENT '交易域提交退款成功事务事实表' PARTITIONED BY (`dt` STRING) STORED AS ORC LOCATION '/warehouse/gmall/dwd/dwd_trade_refund_pay_suc_inc/' TBLPROPERTIES ("orc.compress" = "snappy");
1.6.3、数据流向
用户退款会对哪些表产生影响:order_info 的 order_status 字段会发生变化(1006),refund_payment 的 redund_status 也会发生变化。
和上面的退单一样,我们都需要精确到商品,因为退单可能是退订单中的一件或多件商品。
1.6.4、首日装载
insert overwrite table dwd_trade_refund_pay_suc_inc partition(dt) select rp.id, user_id, rp.order_id, rp.sku_id, province_id, payment_type, dic_name, date_format(callback_time,'yyyy-MM-dd') date_id, callback_time, refund_num, total_amount, date_format(callback_time,'yyyy-MM-dd') from ( select data.id, data.order_id, data.sku_id, data.payment_type, data.callback_time, data.total_amount from ods_refund_payment_inc where dt='2020-06-14' and type = 'bootstrap-insert' and data.refund_status='1602' )rp left join ( select data.id, data.user_id, data.province_id from ods_order_info_inc where dt='2020-06-14' and type='bootstrap-insert' )oi on rp.order_id=oi.id left join ( select data.order_id, data.sku_id, data.refund_num from ods_order_refund_info_inc where dt='2020-06-14' and type='bootstrap-insert' )ri on rp.order_id=ri.order_id and rp.sku_id=ri.sku_id --必须保证同一订单同一商品 left join ( select dic_code, dic_name from ods_base_dic_full where dt='2020-06-14' and parent_code='11' )dic on rp.payment_type=dic.dic_code;
1.6.5、每日装载
insert overwrite table dwd_trade_refund_pay_suc_inc partition(dt='2020-06-15') select rp.id, user_id, rp.order_id, rp.sku_id, province_id, payment_type, dic_name, date_format(callback_time,'yyyy-MM-dd') date_id, callback_time, refund_num, total_amount from ( select data.id, data.order_id, data.sku_id, data.payment_type, data.callback_time, data.total_amount from ods_refund_payment_inc where dt='2020-06-15' and type = 'update' and array_contains(map_keys(old),'refund_status') and data.refund_status='1602' )rp left join ( select data.id, data.user_id, data.province_id from ods_order_info_inc where dt='2020-06-15' and type='update' and array_contains(map_keys(old),'order_status') and data.order_status='1006' )oi on rp.order_id=oi.id left join ( select data.order_id, data.sku_id, data.refund_num from ods_order_refund_info_inc where dt='2020-06-15' and type='update' and array_contains(map_keys(old),'refund_status') and data.refund_status='0705' )ri on rp.order_id=ri.order_id and rp.sku_id=ri.sku_id left join ( select dic_code, dic_name from ods_base_dic_full where dt='2020-06-15' and parent_code='11' )dic on rp.payment_type=dic.dic_code;
1.7、交易域购物车周期快照表
所谓周期快照表,区别于我们前面的事务事实表,周期快照表是全量表,一般用来解决存量型指标(比如库存、余额等,这是事务事实表的不足)。
1.7.1、设计表结构
对于事务型事实表,通常都是一张事实表对应一个业务过程;对于累积快照事实表,一张表对应多个业务过程。对于这里的周期快照表,并没有讨论的意义。因为无法确定它对应几个业务过程。
DROP TABLE IF EXISTS dwd_trade_cart_full; CREATE EXTERNAL TABLE dwd_trade_cart_full ( `id` STRING COMMENT '编号', `user_id` STRING COMMENT '用户id', `sku_id` STRING COMMENT '商品id', `sku_name` STRING COMMENT '商品名称', `sku_num` BIGINT COMMENT '加购物车件数' ) COMMENT '交易域购物车周期快照事实表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS ORC LOCATION '/warehouse/gmall/dwd/dwd_trade_cart_full/' TBLPROPERTIES ('orc.compress' = 'snappy');DROP TABLE IF EXISTS dwd_trade_cart_full; CREATE EXTERNAL TABLE dwd_trade_cart_full ( `id` STRING COMMENT '编号', `user_id` STRING COMMENT '用户id', `sku_id` STRING COMMENT '商品id', `sku_name` STRING COMMENT '商品名称', `sku_num` BIGINT COMMENT '加购物车件数' ) COMMENT '交易域购物车周期快照事实表' PARTITIONED BY (`dt` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS ORC LOCATION '/warehouse/gmall/dwd/dwd_trade_cart_full/' TBLPROPERTIES ('orc.compress' = 'snappy');
1.7.2、装载语句
对于购物车表,我们在 ODS 层建了两张表(增量和全量),这里我们选择全量表作为数据来源:
insert overwrite table dwd_trade_cart_full partition(dt='2020-06-14') select id, user_id, sku_id, sku_name, sku_num from ods_cart_info_full where dt='2020-06-14' and is_ordered='0';
这里的 is_ordered 是一个删除标记,当它的值为 1 的时候代表用户已下单,也就意味着从购物车删除了该商品。所以我们每日装载的时候需要注意筛选出 is_ordered = 0 的数据。
1.8、工具域优惠券领取事务事实表
有关优惠券的业务过程:领券、使用券下单、使用券支付。它们都会对表 coupon_use 产生影响:领券后表 coupon_use 会插入一条新的记录;使用券下单时会对字段 order_id、coupon_status、using_time 都会发生变化;使用券支付时会对字段 coupon_status、used_time 发生变化。
1.8.1、建表语句
需要注意,对于优惠券事实表,它并没有一个明显的度量值。虽然我们说事实表是由维度外键和度量值组成的,但规矩也不是死的,这里的度量值是隐含的,也就是一行代表一个优惠券的记录。
DROP TABLE IF EXISTS dwd_tool_coupon_get_inc; CREATE EXTERNAL TABLE dwd_tool_coupon_get_inc ( `id` STRING COMMENT '编号', `coupon_id` STRING COMMENT '优惠券ID', `user_id` STRING COMMENT 'userid', `date_id` STRING COMMENT '日期ID', `get_time` STRING COMMENT '领取时间' ) COMMENT '优惠券领取事务事实表' PARTITIONED BY (`dt` STRING) STORED AS ORC LOCATION '/warehouse/gmall/dwd/dwd_tool_coupon_get_inc/' TBLPROPERTIES ("orc.compress" = "snappy");
1.8.2、首日装载
insert overwrite table dwd_tool_coupon_get_inc partition(dt) select data.id, data.coupon_id, data.user_id, date_format(data.get_time,'yyyy-MM-dd') date_id, data.get_time, date_format(data.get_time,'yyyy-MM-dd') from ods_coupon_use_inc where dt='2020-06-14' and type='bootstrap-insert';
1.8.3、每日装载
只要是 insert 就一定是领券操作,只要是 update 就是使用券操作。
insert overwrite table dwd_tool_coupon_get_inc partition (dt='2020-06-15') select data.id, data.coupon_id, data.user_id, date_format(data.get_time,'yyyy-MM-dd') date_id, data.get_time from ods_coupon_use_inc where dt='2020-06-15' and type='insert';
1.9、工具域优惠券使用(下单)事务事实表
1.9.1、建表语句
DROP TABLE IF EXISTS dwd_tool_coupon_order_inc; CREATE EXTERNAL TABLE dwd_tool_coupon_order_inc ( `id` STRING COMMENT '编号', `coupon_id` STRING COMMENT '优惠券ID', `user_id` STRING COMMENT 'user_id', `order_id` STRING COMMENT 'order_id', `date_id` STRING COMMENT '日期ID', `order_time` STRING COMMENT '使用下单时间' ) COMMENT '优惠券使用下单事务事实表' PARTITIONED BY (`dt` STRING) STORED AS ORC LOCATION '/warehouse/gmall/dwd/dwd_tool_coupon_order_inc/' TBLPROPERTIES ("orc.compress" = "snappy");
1.9.2、首日装载
根据 used_time 是否为空或者 coupon_status 是否等于 1402 都是可以的。
insert overwrite table dwd_tool_coupon_order_inc partition(dt) select data.id, data.coupon_id, data.user_id, data.order_id, date_format(data.using_time,'yyyy-MM-dd') date_id, data.using_time, date_format(data.using_time,'yyyy-MM-dd') from ods_coupon_use_inc where dt='2020-06-14' and type='bootstrap-insert' and data.using_time is not null;
1.9.3、每日装载
insert overwrite table dwd_tool_coupon_order_inc partition(dt='2020-06-15') select data.id, data.coupon_id, data.user_id, data.order_id, date_format(data.using_time,'yyyy-MM-dd') date_id, data.using_time from ods_coupon_use_inc where dt='2020-06-15' and type='update' and array_contains(map_keys(old),'using_time');
1.10、工具域优惠券使用(支付)事务事实表
1.10.1、建表语句
DROP TABLE IF EXISTS dwd_tool_coupon_pay_inc; CREATE EXTERNAL TABLE dwd_tool_coupon_pay_inc ( `id` STRING COMMENT '编号', `coupon_id` STRING COMMENT '优惠券ID', `user_id` STRING COMMENT 'user_id', `order_id` STRING COMMENT 'order_id', `date_id` STRING COMMENT '日期ID', `payment_time` STRING COMMENT '使用下单时间' ) COMMENT '优惠券使用支付事务事实表' PARTITIONED BY (`dt` STRING) STORED AS ORC LOCATION '/warehouse/gmall/dwd/dwd_tool_coupon_pay_inc/' TBLPROPERTIES ("orc.compress" = "snappy");
1.10.2、首日装载
根据 used_time 是否为空或者 coupon_status 是否等于 1403 都是可以的。
insert overwrite table dwd_tool_coupon_pay_inc partition(dt) select data.id, data.coupon_id, data.user_id, data.order_id, date_format(data.used_time,'yyyy-MM-dd') date_id, data.used_time, date_format(data.used_time,'yyyy-MM-dd') from ods_coupon_use_inc where dt='2020-06-14' and type='bootstrap-insert' and data.used_time is not null;
1.10.3、每日装载
insert overwrite table dwd_tool_coupon_pay_inc partition(dt='2020-06-15') select data.id, data.coupon_id, data.user_id, data.order_id, date_format(data.used_time,'yyyy-MM-dd') date_id, data.used_time from ods_coupon_use_inc where dt='2020-06-15' and type='update' and array_contains(map_keys(old),'used_time');
离线数仓(八)【DWD 层开发】(4)https://developer.aliyun.com/article/1532419