*2.22 查询相同时刻多地登陆的用户
2.22.1 题目需求
从登录明细表(user_login_detail)中查询在相同时刻,多地登陆(ip_address不同)的用户,期望结果如下:
user_id(用户id) |
101 |
102 |
104 |
107 |
2.22.2 代码实现
hive>
select distinct t2.user_id from ( select t1.user_id, if(t1.max_logout is null ,2,if(t1.max_logout<t1.login_ts,1,0)) flag from ( select user_id, login_ts, logout_ts, max(logout_ts)over(partition by user_id order by login_ts rows between unbounded preceding and 1 preceding) max_logout from user_login_detail )t1 )t2 where t2.flag=0
2.23 销售额完成任务指标的商品
2.23.1 题目需求
商家要求每个商品每个月需要售卖出一定的销售总额
假设1号商品销售总额大于21000,2号商品销售总额大于10000,其余商品没有要求
请写出SQL从订单详情表中(order_detail)查询连续两个月销售总额大于等于任务总额的商品
结果如下:
sku_id(商品id) |
1 |
2.23.2 代码实现及步骤
hive>
-- 求出1号商品 和 2号商品 每个月的购买总额 并过滤掉没有满足指标的商品
select sku_id, concat(substring(create_date,0,7),'-01') ymd, sum(price*sku_num) sku_sum from order_detail where sku_id=1 or sku_id=2 group by sku_id,substring(create_date,0,7) having (sku_id=1 and sku_sum>=21000) or (sku_id=2 and sku_sum>=10000)
-- 判断是否为连续两个月 select distinct t3.sku_id from ( select t2.sku_id, count(*)over(partition by t2.sku_id,t2.rymd) cn from ( select t1.sku_id, add_months(t1.ymd,-row_number()over(partition by t1.sku_id order by t1.ymd)) rymd from ( select sku_id, concat(substring(create_date,0,7),'-01') ymd, sum(price*sku_num) sku_sum from order_detail where sku_id=1 or sku_id=2 group by sku_id,substring(create_date,0,7) having (sku_id=1 and sku_sum>=21000) or (sku_id=2 and sku_sum>=10000) )t1 )t2 )t3 where t3.cn>=2
2.24 根据商品销售情况进行商品分类
2.24.1 题目需求
从订单详情表中(order_detail)对销售件数对商品进行分类,0-5000为冷门商品,5001-19999位一般商品,20000往上为热门商品,并求出不同类别商品的数量
结果如下:
Category(类型) |
Cn(数量) |
一般商品 |
1 |
冷门商品 |
10 |
热门商品 |
1 |
2.24.2 代码实现
hive>
select t2.category, count(*) cn from ( select t1.sku_id, case when t1.sku_sum >=0 and t1.sku_sum<=5000 then '冷门商品' when t1.sku_sum >=5001 and t1.sku_sum<=19999 then '一般商品' when t1.sku_sum >=20000 then '热门商品' end category from ( select sku_id, sum(sku_num) sku_sum from order_detail group by sku_id )t1 )t2 group by t2.category
2.25 各品类销量前三的所有商品
2.25.1 题目需求
从订单详情表中(order_detail)和商品(sku_info)中查询各个品类销售数量前三的商品。如果该品类小于三个商品,则输出所有的商品销量。
结果如下:
Sku_id(商品id) |
Category_id(品类id) |
2 |
1 |
4 |
1 |
1 |
1 |
8 |
2 |
7 |
2 |
5 |
2 |
12 |
3 |
11 |
3 |
10 |
3 |
2.25.2 代码实现
hive>
select t2.sku_id, t2.category_id from ( select t1.sku_id, si.category_id, rank()over(partition by category_id order by t1.sku_sum desc) rk from ( select sku_id, sum(sku_num) sku_sum from order_detail group by sku_id )t1 join sku_info si on t1.sku_id=si.sku_id )t2 where t2.rk<=3;
2.26 各品类中商品价格的中位数
2.26.1 题目需求
从商品(sku_info)中球中位数如果是偶数则输出中间两个值的平均值,如果是奇数,则输出中间数即可。
结果如下:
Category_id(品类id) |
Medprice(中位数) |
1 |
3500.0 |
2 |
1250.0 |
3 |
510.0 |
2.26.2 代码实现
hive>
--求个每个品类价格排序商品数量以及打上奇偶数的标签 select sku_id, category_id, price, row_number()over(partition by category_id order by price desc) rk, count(*)over(partition by category_id) cn, count(*)over(partition by category_id)%2 falg from sku_info t1
--求出偶数品类的中位数 select distinct t1.category_id, avg(t1.price)over(partition by t1.category_id) medprice from ( select sku_id, category_id, price, row_number()over(partition by category_id order by price desc) rk, count(*)over(partition by category_id) cn, count(*)over(partition by category_id)%2 falg from sku_info )t1 where t1.falg=0 and (t1.rk=cn/2 or t1.rk=cn/2+1)
--求出奇数品类的中位数 select t1.category_id, t1.price from ( select sku_id, category_id, price, row_number()over(partition by category_id order by price desc) rk, count(*)over(partition by category_id) cn, count(*)over(partition by category_id)%2 falg from sku_info )t1 where t1.falg=1 and t1.rk=round(cn/2)
-- 竖向拼接 select distinct t1.category_id, avg(t1.price)over(partition by t1.category_id) medprice from ( select sku_id, category_id, price, row_number()over(partition by category_id order by price desc) rk, count(*)over(partition by category_id) cn, count(*)over(partition by category_id)%2 falg from sku_info )t1 where t1.falg=0 and (t1.rk=cn/2 or t1.rk=cn/2+1) union select t1.category_id, t1.price/1 from ( select sku_id, category_id, price, row_number()over(partition by category_id order by price desc) rk, count(*)over(partition by category_id) cn, count(*)over(partition by category_id)%2 falg from sku_info )t1 where t1.falg=1 and t1.rk=round(cn/2)
2.27 找出销售额连续3天超过100的商品
2.27.1 题目需求
从订单详情表(order_detail)中找出销售额连续3天超过100的商品
结果如下:
Sku_id(商品id) |
1 |
10 |
11 |
12 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
2.27.2 代码实现
hive>
-- 每个商品每天的销售总额 select sku_id, create_date, sum(price*sku_num) sku_sum from order_detail group by sku_id,create_date having sku_sum>=100
-- 判断连续三天以上 select distinct t3.sku_id from ( select t2.sku_id, count(*)over(partition by t2.sku_id,t2.date_drk) cdrk from ( select t1.sku_id, t1.create_date, date_sub(t1.create_date,rank()over(partition by t1.sku_id order by t1.create_date)) date_drk from ( select sku_id, create_date, sum(price*sku_num) sku_sum from order_detail group by sku_id,create_date having sku_sum>=100 )t1 )t2 )t3 where t3.cdrk>=3
2.28 查询有新注册用户的当天的新用户数量、新用户的第一天留存率
2.28.1 题目需求
从用户登录明细表(user_login_detail)中首次登录算作当天新增,第二天也登录了算作一日留存
结果如下:
2.28.2 代码实现
hive>
-- 每个用户首次登录时间 和 第二天是否登录 并看每天新增和留存数量 select t1.first_login, count(t1.user_id) register, count(t2.user_id) remain_1 from ( select user_id, date_format(min(login_ts),'yyyy-MM-dd') first_login from user_login_detail group by user_id )t1 left join user_login_detail t2 on t1.user_id=t2.user_id and datediff(date_format(t2.login_ts,'yyyy-MM-dd'),t1.first_login)=1 group by t1.first_login
-- 新增数量和留存率 select t3.first_login, t3.register, t3.remain_1/t3.register retention from ( select t1.first_login, count(t1.user_id) register, count(t2.user_id) remain_1 from ( select user_id, date_format(min(login_ts),'yyyy-MM-dd') first_login from user_login_detail group by user_id )t1 left join user_login_detail t2 on t1.user_id=t2.user_id and datediff(date_format(t2.login_ts,'yyyy-MM-dd'),t1.first_login)=1 group by t1.first_login )t3
2.29 求出商品连续售卖的时间区间
2.29.1 题目需求
从订单详情表(order_detail)中,求出商品连续售卖的时间区间
结果如下(截取部分):
2.29.2 代码实现
hive>
-- 每个商品售卖的日期以及拿到按排序后日期的差值 select sku_id, create_date, date_sub(create_date,rank()over(partition by sku_id order by create_date)) ddrk from order_detail group by sku_id,create_date
-- 拿到每次售卖的区间 select distinct sku_id, first_value(t1.create_date)over(partition by t1.sku_id,t1.ddrk order by t1.create_date rows between unbounded preceding and unbounded following) start_date, last_value(t1.create_date)over(partition by t1.sku_id,t1.ddrk order by t1.create_date rows between unbounded preceding and unbounded following) end_date from ( select sku_id, create_date, date_sub(create_date,rank()over(partition by sku_id order by create_date)) ddrk from order_detail group by sku_id,create_date )t1
2.30 登录次数及交易次数统计
2.30.1 题目需求
分别从登陆明细表(user_login_detail)和配送信息表中用户登录时间和下单时间统计登陆次数和交易次数
结果如下(截取部分):
2.30.2 代码实现
hive>
-- 拿到每个用户每天的登录次数 select user_id, date_format(login_ts,'yyyy-MM-dd') login_date, count(*) login_count from user_login_detail group by user_id,date_format(login_ts,'yyyy-MM-dd')
-- 拿到每个用户每天的交易次数 select t1.user_id, t1.login_date, collect_set(t1.login_count)[0] login_count , count(di.user_id) order_count from ( select user_id, date_format(login_ts,'yyyy-MM-dd') login_date, count(*) login_count from user_login_detail group by user_id,date_format(login_ts,'yyyy-MM-dd') )t1 left join delivery_info di on t1.user_id=di.user_id and t1.login_date=di.order_date group by t1.user_id,t1.login_date
2.31 按年度列出每个商品销售总额
2.31.1 题目需求
从订单明细表(order_detail)中列出每个商品每个年度的购买总额
结果如下(截取部分):
2.31.2 代码实现
hive> select sku_id, year(create_date) year_date, sum(price*sku_num) sku_sum from order_detail group by sku_id,year(create_date)
2.32. 某周内每件商品每天销售情况
2.32.1 题目需求
从订单详情表(order_detail)中查询2021年9月27号-2021年10月3号这一周所有商品每天销售情况。
结果如下:
2.31.2 代码实现
hive> select sku_id, year(create_date) year_date, sum(price*sku_num) sku_sum from order_detail group by sku_id,year(create_date)
2.33 查看每件商品的售价涨幅情况
2.33.1 题目需求
从商品价格变更明细表(sku_price_modify_detail),得到最近一次价格的涨幅情况,并按照涨幅升序排序。
结果如下:
Sku_id(商品id) |
Price_change(涨幅) |
8 |
-200.00 |
9 |
-100.00 |
2 |
-70.00 |
11 |
-16.00 |
12 |
-15.00 |
3 |
1.00 |
5 |
10.00 |
10 |
10.00 |
7 |
12.00 |
6 |
12.00 |
1 |
100.00 |
4 |
400.00 |
2.33.2 代码实现
hive>
-- 对每个商品按照修改日期倒序排序 并求出差值 select sku_id, new_price-lead(new_price,1,0)over(partition by sku_id order by change_date desc) price_change, rank()over(partition by sku_id order by change_date desc) rk from sku_price_modify_detail t1
-- 最近一次修改的价格 select t1.sku_id, t1.price_change from ( select sku_id, new_price-lead(new_price,1,0)over(partition by sku_id order by change_date desc) price_change, rank()over(partition by sku_id order by change_date desc) rk from sku_price_modify_detail )t1 where rk=1 order by t1.price_change