离线数仓(九)【DWS 层开发】(3)https://developer.aliyun.com/article/1532437
1.1.7、交易域用户粒度退单最近1日汇总表
1)建表语句
DROP TABLE IF EXISTS dws_trade_user_order_refund_1d; CREATE EXTERNAL TABLE dws_trade_user_order_refund_1d ( `user_id` STRING COMMENT '用户id', `order_refund_count_1d` BIGINT COMMENT '最近1日退单次数', `order_refund_num_1d` BIGINT COMMENT '最近1日退单商品件数', `order_refund_amount_1d` DECIMAL(16, 2) COMMENT '最近1日退单金额' ) COMMENT '交易域用户粒度退单最近1日汇总事实表' PARTITIONED BY (`dt` STRING) STORED AS ORC LOCATION '/warehouse/gmall/dws/dws_trade_user_order_refund_1d' TBLPROPERTIES ('orc.compress' = 'snappy');
2)数据装载(首日/每日)
set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table dws_trade_user_order_refund_1d partition(dt) select user_id, count(*) order_refund_count, sum(refund_num) order_refund_num, sum(refund_amount) order_refund_amount, dt from dwd_trade_order_refund_inc group by user_id,dt;
insert overwrite table dws_trade_user_order_refund_1d partition(dt='2020-06-15') select user_id, count(*), sum(refund_num), sum(refund_amount) from dwd_trade_order_refund_inc where dt='2020-06-15' group by user_id;
1.1.8、流量域会话粒度页面浏览最近1日汇总表
1)建表语句
同样这里除了会话 id 还做了一些维度退化:
DROP TABLE IF EXISTS dws_traffic_session_page_view_1d; CREATE EXTERNAL TABLE dws_traffic_session_page_view_1d ( `session_id` STRING COMMENT '会话id', `mid_id` string comment '设备id', `brand` string comment '手机品牌', `model` string comment '手机型号', `operate_system` string comment '操作系统', `version_code` string comment 'app版本号', `channel` string comment '渠道', `during_time_1d` BIGINT COMMENT '最近1日访问时长', `page_count_1d` BIGINT COMMENT '最近1日访问页面数' ) COMMENT '流量域会话粒度页面浏览最近1日汇总表' PARTITIONED BY (`dt` STRING) STORED AS ORC LOCATION '/warehouse/gmall/dws/dws_traffic_session_page_view_1d' TBLPROPERTIES ('orc.compress' = 'snappy');
2)数据装载
没有首日每日的区分,因为流量域的数据都来自日志,而日志没有历史数据。
insert overwrite table dws_traffic_session_page_view_1d partition(dt='2020-06-14') select session_id, mid_id, brand, model, operate_system, version_code, channel, sum(during_time), count(*) from dwd_traffic_page_view_inc where dt='2020-06-14' group by session_id,mid_id,brand,model,operate_system,version_code,channel;
1.1.9、流量域访客页面粒度页面浏览最近1日汇总表
1)建表语句
DROP TABLE IF EXISTS dws_traffic_page_visitor_page_view_1d; CREATE EXTERNAL TABLE dws_traffic_page_visitor_page_view_1d ( `mid_id` STRING COMMENT '访客id', `brand` string comment '手机品牌', `model` string comment '手机型号', `operate_system` string comment '操作系统', `page_id` STRING COMMENT '页面id', `during_time_1d` BIGINT COMMENT '最近1日浏览时长', `view_count_1d` BIGINT COMMENT '最近1日访问次数' ) COMMENT '流量域访客页面粒度页面浏览最近1日汇总事实表' PARTITIONED BY (`dt` STRING) STORED AS ORC LOCATION '/warehouse/gmall/dws/dws_traffic_page_visitor_page_view_1d' TBLPROPERTIES ('orc.compress' = 'snappy');
2)数据装载
insert overwrite table dws_traffic_page_visitor_page_view_1d partition(dt='2020-06-14') select mid_id, brand, model, operate_system, page_id, sum(during_time), count(*) from dwd_traffic_page_view_inc where dt='2020-06-14' group by mid_id,brand,model,operate_system,page_id;
2.1、 最近n日汇总表
2.1.1、交易域用户商品粒度订单最近n日汇总表
1)建表语句
DROP TABLE IF EXISTS dws_trade_user_sku_order_nd; CREATE EXTERNAL TABLE dws_trade_user_sku_order_nd ( `user_id` STRING COMMENT '用户id', `sku_id` STRING COMMENT 'sku_id', `sku_name` STRING COMMENT 'sku名称', `category1_id` STRING COMMENT '一级分类id', `category1_name` STRING COMMENT '一级分类名称', `category2_id` STRING COMMENT '一级分类id', `category2_name` STRING COMMENT '一级分类名称', `category3_id` STRING COMMENT '一级分类id', `category3_name` STRING COMMENT '一级分类名称', `tm_id` STRING COMMENT '品牌id', `tm_name` STRING COMMENT '品牌名称', `order_count_7d` STRING COMMENT '最近7日下单次数', `order_num_7d` BIGINT COMMENT '最近7日下单件数', `order_original_amount_7d` DECIMAL(16, 2) COMMENT '最近7日下单原始金额', `activity_reduce_amount_7d` DECIMAL(16, 2) COMMENT '最近7日活动优惠金额', `coupon_reduce_amount_7d` DECIMAL(16, 2) COMMENT '最近7日优惠券优惠金额', `order_total_amount_7d` DECIMAL(16, 2) COMMENT '最近7日下单最终金额', `order_count_30d` BIGINT COMMENT '最近30日下单次数', `order_num_30d` BIGINT COMMENT '最近30日下单件数', `order_original_amount_30d` DECIMAL(16, 2) COMMENT '最近30日下单原始金额', `activity_reduce_amount_30d` DECIMAL(16, 2) COMMENT '最近30日活动优惠金额', `coupon_reduce_amount_30d` DECIMAL(16, 2) COMMENT '最近30日优惠券优惠金额', `order_total_amount_30d` DECIMAL(16, 2) COMMENT '最近30日下单最终金额' ) COMMENT '交易域用户商品粒度订单最近n日汇总事实表' PARTITIONED BY (`dt` STRING) STORED AS ORC LOCATION '/warehouse/gmall/dws/dws_trade_user_sku_order_nd' TBLPROPERTIES ('orc.compress' = 'snappy');
2)数据装载
这里并不需要区分首日和每日。
insert overwrite table dws_trade_user_sku_order_nd partition(dt='2020-06-14') select user_id, sku_id, sku_name, category1_id, category1_name, category2_id, category2_name, category3_id, category3_name, tm_id, tm_name, sum(if(dt>=date_add('2020-06-14',-6),order_count_1d,0)), sum(if(dt>=date_add('2020-06-14',-6),order_num_1d,0)), sum(if(dt>=date_add('2020-06-14',-6),order_original_amount_1d,0)), sum(if(dt>=date_add('2020-06-14',-6),activity_reduce_amount_1d,0)), sum(if(dt>=date_add('2020-06-14',-6),coupon_reduce_amount_1d,0)), sum(if(dt>=date_add('2020-06-14',-6),order_total_amount_1d,0)), sum(order_count_1d), sum(order_num_1d), sum(order_original_amount_1d), sum(activity_reduce_amount_1d), sum(coupon_reduce_amount_1d), sum(order_total_amount_1d) from dws_trade_user_sku_order_1d where dt>=date_add('2020-06-14',-29) group by user_id,sku_id,sku_name,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name,tm_id,tm_name;
这里我们 group by 的字段有很多,但其实并不影响。因为 group by 影响的是粒度,但是这里 group by 的字段都是商品的一些维度,所以不管 group by user_id,sku_id 还是 group by user_id,sku_id 和其它商品维度,最终的粒度依然是用户商品粒度。
2.1.2、交易域用户商品粒度退单最近n日汇总表
1)建表语句
每行代表:每个人每个最近7/30天某个商品退单的次数、件数。
列由维度和统计值组成。
DROP TABLE IF EXISTS dws_trade_user_sku_order_refund_nd; CREATE EXTERNAL TABLE dws_trade_user_sku_order_refund_nd ( `user_id` STRING COMMENT '用户id', `sku_id` STRING COMMENT 'sku_id', `sku_name` STRING COMMENT 'sku名称', `category1_id` STRING COMMENT '一级分类id', `category1_name` STRING COMMENT '一级分类名称', `category2_id` STRING COMMENT '一级分类id', `category2_name` STRING COMMENT '一级分类名称', `category3_id` STRING COMMENT '一级分类id', `category3_name` STRING COMMENT '一级分类名称', `tm_id` STRING COMMENT '品牌id', `tm_name` STRING COMMENT '品牌名称', `order_refund_count_7d` BIGINT COMMENT '最近7日退单次数', `order_refund_num_7d` BIGINT COMMENT '最近7日退单件数', `order_refund_amount_7d` DECIMAL(16, 2) COMMENT '最近7日退单金额', `order_refund_count_30d` BIGINT COMMENT '最近30日退单次数', `order_refund_num_30d` BIGINT COMMENT '最近30日退单件数', `order_refund_amount_30d` DECIMAL(16, 2) COMMENT '最近30日退单金额' ) COMMENT '交易域用户商品粒度退单最近n日汇总事实表' PARTITIONED BY (`dt` STRING) STORED AS ORC LOCATION '/warehouse/gmall/dws/dws_trade_user_sku_order_refund_nd' TBLPROPERTIES ('orc.compress' = 'snappy');
2)数据装载
找到与之相对的 1d 表,拿 30 哥分区过滤一下,按照用户和商品(所有商品属性,因为这些商品属性只对应一个商品,不用担心 group by 会破坏粒度)分组。
insert overwrite table dws_trade_user_sku_order_refund_nd partition(dt='2020-06-14') select user_id, sku_id, sku_name, category1_id, category1_name, category2_id, category2_name, category3_id, category3_name, tm_id, tm_name, sum(if(dt>=date_add('2020-06-14',-6),order_refund_count_1d,0)), sum(if(dt>=date_add('2020-06-14',-6),order_refund_num_1d,0)), sum(if(dt>=date_add('2020-06-14',-6),order_refund_amount_1d,0)), sum(order_refund_count_1d), sum(order_refund_num_1d), sum(order_refund_amount_1d) from dws_trade_user_sku_order_refund_1d where dt>=date_add('2020-06-14',-29) and dt<='2020-06-14' group by user_id,sku_id,sku_name,category1_id,category1_name,category2_id,category2_name,category3_id,category3_name,tm_id,tm_name;
2.1.3、交易域用户粒度订单最近n日汇总表
1)建表语句
行:每个用户最近7/30天的下单信息(次数、件数和总额等)
DROP TABLE IF EXISTS dws_trade_user_order_nd; CREATE EXTERNAL TABLE dws_trade_user_order_nd ( `user_id` STRING COMMENT '用户id', `order_count_7d` BIGINT COMMENT '最近7日下单次数', `order_num_7d` BIGINT COMMENT '最近7日下单商品件数', `order_original_amount_7d` DECIMAL(16, 2) COMMENT '最近7日下单原始金额', `activity_reduce_amount_7d` DECIMAL(16, 2) COMMENT '最近7日下单活动优惠金额', `coupon_reduce_amount_7d` DECIMAL(16, 2) COMMENT '最近7日下单优惠券优惠金额', `order_total_amount_7d` DECIMAL(16, 2) COMMENT '最近7日下单最终金额', `order_count_30d` BIGINT COMMENT '最近30日下单次数', `order_num_30d` BIGINT COMMENT '最近30日下单商品件数', `order_original_amount_30d` DECIMAL(16, 2) COMMENT '最近30日下单原始金额', `activity_reduce_amount_30d` DECIMAL(16, 2) COMMENT '最近30日下单活动优惠金额', `coupon_reduce_amount_30d` DECIMAL(16, 2) COMMENT '最近30日下单优惠券优惠金额', `order_total_amount_30d` DECIMAL(16, 2) COMMENT '最近30日下单最终金额' ) COMMENT '交易域用户粒度订单最近n日汇总事实表' PARTITIONED BY (`dt` STRING) STORED AS ORC LOCATION '/warehouse/gmall/dws/dws_trade_user_order_nd' TBLPROPERTIES ('orc.compress' = 'snappy');
2)数据装载
上面的 1d 表我们对 order_id 进行了去重,因为一个订单多个商品,会在 dwd_order_detail 重罚产生多条记录。但是这里的 nd 表并不需要去重,因为一个订单不可能多次下单。
insert overwrite table dws_trade_user_order_nd partition(dt='2020-06-14') select user_id, sum(if(dt>=date_add('2020-06-14',-6),order_count_1d,0)), sum(if(dt>=date_add('2020-06-14',-6),order_num_1d,0)), sum(if(dt>=date_add('2020-06-14',-6),order_original_amount_1d,0)), sum(if(dt>=date_add('2020-06-14',-6),activity_reduce_amount_1d,0)), sum(if(dt>=date_add('2020-06-14',-6),coupon_reduce_amount_1d,0)), sum(if(dt>=date_add('2020-06-14',-6),order_total_amount_1d,0)), sum(order_count_1d), sum(order_num_1d), sum(order_original_amount_1d), sum(activity_reduce_amount_1d), sum(coupon_reduce_amount_1d), sum(order_total_amount_1d) from dws_trade_user_order_1d where dt>=date_add('2020-06-14',-29) and dt<='2020-06-14' group by user_id;
离线数仓(九)【DWS 层开发】(5)https://developer.aliyun.com/article/1532440