HQL 55 题【已完结】(1)https://developer.aliyun.com/article/1532397
3.17
1、购买过商品1和商品2但是没有购买商品3的顾客
order_detail:
order_info:
select user_id from( select user_id, collect_set(sku_id) skus from order_detail od join order_info oi on od.order_id = oi.order_id group by user_id )t1 where array_contains(skus,'1') and array_contains(skus,'2') and !array_contains(skus,'3');
知识点:
- array_contains() 可以判断集合里是否包含某个元素,set list arr 都可以
2、统计每日商品1和商品2销量的差值
order_detail:
思路1
我的做法是分别查询商品1和商品2的销量,因为两件商品有时候不是每天都有卖出去的,所以选择用 full join,对 create_date 为 null 的字段进行处理,并对当天销量为 null 的字段补 0。
select '1' sku_1,'2' sku_2,coalesce(t1.create_date,t2.create_date),(sum(`if`(t1.sku_num is null,0,t1.sku_num))-sum(`if`(t2.sku_num is null,0,t2.sku_num))) diff from ( select sku_id,create_date,sku_num from order_detail where sku_id='1' )t1 full outer join( select sku_id,create_date,sku_num from order_detail where sku_id='2' )t2 on t1.create_date=t2.create_date group by t1.create_date,t2.create_date;
思路2
select create_date, sum(if(sku_id = '1', sku_num, 0)) - sum(if(sku_id = '2', sku_num, 0)) diff from order_detail where sku_id in ('1', '2') group by create_date;
这种思路比较简单,思路1是我想复杂了。
3、根据商品销售情况进行商品分类
需求:通过订单详情表(order_detail)的数据,根据销售件数对商品进行分类,销售件数0-5000为冷门商品,5001-19999为一般商品,20000以上为热门商品,统计不同类别商品的数量。
order_detail:
select category,count(category) from ( select case when 0<sku_sum and sku_sum<=5000 then '冷门商品' when 5000<sku_sum and sku_sum<=19999 then '一般商品' when 20000<sku_sum then '热门商品' end category from ( select sum(sku_num) sku_sum from order_detail group by sku_id )t1 )t2 group by category;
或者
SELECT category, count(sku_id) FROM ( SELECT sku_id, CASE WHEN sum(sku_num)>=0 AND sum(sku_num)<=5000 THEN '冷门商品' WHEN sum(sku_num)>=5001 AND sum(sku_num)<=19999 THEN '一般商品' WHEN sum(sku_num)>=20000 THEN '热门商品' END category FROM order_detail GROUP BY sku_id )t1 GROUP BY category;
4、查询有新增用户的日期的新增用户数和新增用户一日留存率
需求:
- 从用户登录明细表(user_login_detail)中统计有新增用户的日期的新增用户数(若某日未新增用户,则不出现在统计结果中),并统计这些新增用户的一日留存率。
- 用户首次登录为当天新增,次日也登录则为一日留存。一日留存用户占新增用户数的比率为一日留存率。
4.1、计算新用户注册日期
-- 查询出每天新用户的个数 select date_format(login_ts,'yyyy-MM-dd') register_date, count(user_id) cnt from( select user_id, login_ts, row_number() over (partition by user_id order by login_ts) rk from user_login_detail -- group by user_id,login_ts 不需要,over中已经分区过了 )t1 where t1.rk=1 group by date_format(login_ts,'yyyy-MM-dd');
注意:
- row_numer() 内我们已经分区过了,所以不需要再 group by
- 最外层的 group by 不能直接使用引用(register_date),要把它转为表达式
其实查询用户注册日期可以使用 min 来巧妙计算:
select user_id,date_format(min(login_ts),'yyyy-MM-dd') from user_login_detail group by user_id;
4.2、留存率实现
select t3.register_date, t3.register_count, cast(cast(t3.retention_1_count as decimal(16,2)) / t3.register_count as decimal(16,2)) retention_1_rate from( select t1.register_date, count(distinct t1.user_id) register_count, count(distinct t2.user_id) retention_1_count -- 必须去重,因为有人一天多次登录 from( select user_id, date_format(min(login_ts),'yyyy-MM-dd') register_date from user_login_detail group by user_id )t1 left join user_login_detail t2 on t1.user_id=t2.user_id and datediff(date_format(t2.login_ts,'yyyy-MM-dd'),t1.register_date)=1 group by t1.register_date )t3;
主要思路就是对用户注册时间表进行一个 left join ,因为 left join 可能会匹配到很多行,所以用 datediff 过滤出注册后第二天也登录的记录。因为有的用户可能一天多次登录,所以需要对结果去重。最后 昨天注册人数/第二天登录人数 得出留存率。
5、登录次数及交易次数统计
user_login_detail:
delivery_info:
delivery_id | order_id | user_id | order_date | costom_date |
1 | 1 | 101 | 2021-09-27 | 2021-09-29 |
2 | 2 | 101 | 2021-09-28 | 2021-09-28 |
3 | 3 | 101 | 2021-09-29 | 2021-09-30 |
4 | 4 | 101 | 2021-09-30 | 2021-10-01 |
需求:分别从登录明细表(user_login_detail)和配送信息表(delivery_info)中的用户登录时间和下单时间字段,统计登陆次数和交易次数
select t1.user_id,login_date,login_count,`if`(order_count is null,0,order_count) from ( select user_id, date_format(login_ts,'yyyy-MM-dd') login_date, count(*) login_count from user_login_detail group by user_id, date_format(login_ts,'yyyy-MM-dd') )t1 left join ( select user_id, order_date, count(*) order_count from delivery_info group by user_id, order_date )t2 on t1.user_id=t2.user_id and login_date=order_date order by user_id,login_date;
6、统计每个商品各年度销售总额
需求:从订单明细表(order_detail)中统计每个商品各年度的销售总额。
-- 统计每个商品各年度销售总额 select sku_id, year(create_date) year_date, cast(sum(price * sku_num) as decimal(16,2)) sku_sum from order_detail group by sku_id,year(create_date);
太简单了,不需要解释。
7、某周内每件商品每天销售情况
需求:从订单详情表(order_detail)中查询2021年9月27号-2021年10月3号这一周所有商品每天销售件数
select sku_id, sum(`if`(create_date='2021-09-27',sku_num,0)) Monday, sum(`if`(create_date='2021-09-28',sku_num,0)) Tuesday, sum(`if`(create_date='2021-09-29',sku_num,0)) Wednesday, sum(`if`(create_date='2021-09-30',sku_num,0)) Thursday, sum(`if`(create_date='2021-10-01',sku_num,0)) Friday, sum(`if`(create_date='2021-10-02',sku_num,0)) Saturday, sum(`if`(create_date='2021-10-03',sku_num,0)) Sunday from order_detail where create_date>='2021-09-27' and create_date<='2021-10-03' group by sku_id;
或者:
select sku_id, sum(case when `dayofweek`(create_date)='2' then sku_num else 0 end) Monday, sum(case when `dayofweek`(create_date)='3' then sku_num else 0 end) Tuesday, sum(case when `dayofweek`(create_date)='4' then sku_num else 0 end) Wednesday, sum(case when `dayofweek`(create_date)='5' then sku_num else 0 end) Thursday, sum(case when `dayofweek`(create_date)='6' then sku_num else 0 end) Friday, sum(case when `dayofweek`(create_date)='7' then sku_num else 0 end) Saturday, sum(case when `dayofweek`(create_date)='1' then sku_num else 0 end) Sunday from order_detail where create_date>='2021-09-27' and create_date<='2021-10-03' group by sku_id;
8、同期商品售卖分析表
需求:从订单明细表(order_detail)中,统计同一个商品在2021年和2022年中同一个月的销量对比。
思路1
分别求出2021年和 2020年每个商品每个月的总销量(注意:按照月份进行 group by),然后将两张表进行 full join:
select coalesce(t1.sku_id,t2.sku_id), coalesce(t1.month,t2.month), if(t1.2020_sku_sum is null ,0,t1.2020_sku_sum), if(t2.2021_sku_sum is null ,0,t2.2021_sku_sum) from( select sku_id, month(create_date) `month`, sum(sku_num) 2020_sku_sum from order_detail where year(create_date)='2020' group by sku_id,month(create_date) )t1 full outer join ( select sku_id, month(create_date) `month`, sum(sku_num) 2021_sku_sum from order_detail where year(create_date)='2021' group by sku_id,month(create_date) )t2 on t1.sku_id=t2.sku_id and t1.month=t2.month
思路2
select sku_id, month(create_date) `month`, sum(`if`(year(create_date)='2020',sku_num,0)) 2020_sku_sum, sum(`if`(year(create_date)='2021',sku_num,0)) 2021_sku_sum from order_detail where year(create_date) = '2021' or year(create_date) = '2020' group by sku_id, month(create_date);
这种也好理解,第一次写写错了。
注意点:这里还是要重复提醒一下,group by 后面不能跟函数表达式的别名字段,必须跟函数表达式,或者表格自身的字段。
9、国庆期间每个sku的收藏量和购买量
需求:从订单明细表(order_detail)和收藏信息表(favor_info)中统计2021年国庆节期间(10月1日-10月7日),每个商品的购买总数量和总收藏次数。
-- 国庆期间每个sku的收藏量和购买量 select t1.sku_id, t1.sku_sum, `if`(t2.favor_sum is null,0,t2.favor_sum) from( select sku_id, sum(sku_num) sku_sum from order_detail where create_date>='2021-10-01' and create_date<='2021-10-07' group by sku_id )t1 full join ( select sku_id, count(*) favor_sum from favor_info where create_date>='2021-10-01' and create_date<='2021-10-07' group by sku_id )t2 on t1.sku_id=t2.sku_id;
知识点:这里我们和之前一样对于 null 的单元格填充 0 ,除了用 if 函数用 nvl 更加专业。
10、国庆节期间各品类商品的7日动销率和滞销率
需求:
- 动销率的定义为某品类的商品中一段时间内有销量的商品种类数占当前已上架总商品种类数的比例(有销量的商品种类数/已上架总商品种类数)。
- 滞销率的定义为某分类商品中一段时间内没有销量的商品种类数占当前已上架总商品种类数的比例(没有销量的商品种类数/已上架总商品种类数)。
- 只要当天任一店铺有任何商品的销量就输出该天的统计结果。
- 从订单明细表(order_detail)和商品信息表(sku_info)表中统计国庆节期间(10月1日-10月7日)每天每个分类的商品的动销率和滞销率
select category_id, cast(first_sale_rate as decimal(16, 2)) first_sale_rate, cast((1 - first_sale_rate) as decimal(16, 2)) first_unsale_rate, cast(second_sale_rate as decimal(16, 2)) second_sale_rate, cast((1 - second_sale_rate) as decimal(16, 2)) second_unsale_rate, cast(third_sale_rate as decimal(16, 2)) third_sale_rate, cast((1 - third_sale_rate) as decimal(16, 2)) third_unsale_rate, cast(fourth_sale_rate as decimal(16, 2)) fourth_sale_rate, cast((1 - fourth_sale_rate) as decimal(16, 2)) fourth_unsale_rate, cast(fifth_sale_rate as decimal(16, 2)) fifth_sale_rate, cast((1 - fifth_sale_rate) as decimal(16, 2)) fifth_unsale_rate, cast(sixth_sale_rate as decimal(16, 2)) sixth_sale_rate, cast((1 - sixth_sale_rate) as decimal(16, 2)) sixth_unsale_rate, cast(seventh_sale_rate as decimal(16, 2)) seventh_sale_rate, cast((1 - seventh_sale_rate) as decimal(16, 2)) seventh_unsale_rate from (select category_id, cast(count(distinct if(t1.create_date = '2021-10-01', t1.sku_id, null)) as decimal(16, 2)) / count(distinct if(sku_info.from_date <= '2021-10-01', sku_info.sku_id, null)) first_sale_rate, cast(count(distinct if(t1.create_date = '2021-10-02', t1.sku_id, null)) as decimal(16, 2)) / count(distinct if(sku_info.from_date <= '2021-10-02', sku_info.sku_id, null)) second_sale_rate, cast(count(distinct if(t1.create_date = '2021-10-03', t1.sku_id, null)) as decimal(16, 2)) / count(distinct if(sku_info.from_date <= '2021-10-03', sku_info.sku_id, null)) third_sale_rate, cast(count(distinct if(t1.create_date = '2021-10-04', t1.sku_id, null)) as decimal(16, 2)) / count(distinct if(sku_info.from_date <= '2021-10-04', sku_info.sku_id, null)) fourth_sale_rate, cast(count(distinct if(t1.create_date = '2021-10-05', t1.sku_id, null)) as decimal(16, 2)) / count(distinct if(sku_info.from_date <= '2021-10-05', sku_info.sku_id, null)) fifth_sale_rate, cast(count(distinct if(t1.create_date = '2021-10-06', t1.sku_id, null)) as decimal(16, 2)) / count(distinct if(sku_info.from_date <= '2021-10-06', sku_info.sku_id, null)) sixth_sale_rate, cast(count(distinct if(t1.create_date = '2021-10-07', t1.sku_id, null)) as decimal(16, 2)) / count(distinct if(sku_info.from_date <= '2021-10-07', sku_info.sku_id, null)) seventh_sale_rate from (select sku_id, create_date from order_detail where create_date >= '2021-10-01' and create_date <= '2021-10-07') t1 right join sku_info on t1.sku_id = sku_info.sku_id group by category_id) t2;
HQL 55 题【已完结】(3)https://developer.aliyun.com/article/1532399