同时在线问题在游戏、电商、直播、教育等互联网行业中很常见,该指标直接展示了各行业的竞争力,Boss也是很关注这个点,结果放到大屏展示也是在显眼的位置。这个问题如何进行分析呢,话不多说,直接上示例
案例:某游戏玩家玩游戏时间表如下:玩家ID、上线时间、下线时间
player_id |
on_time |
off_time |
10001 |
2021-08-19 08:08:08 |
2021-08-19 12:11:23 |
10002 |
2021-08-19 09:01:07 |
2021-08-19 13:15:12 |
10003 |
2021-08-19 10:12:14 |
2021-08-19 10:50:18 |
10004 |
2021-08-19 09:20:21 |
2021-08-19 12:46:25 |
10005 |
2021-08-19 10:14:22 |
2021-08-19 13:26:15 |
10006 |
2021-08-19 11:11:25 |
2021-08-19 16:24:18 |
10007 |
2021-08-19 12:00:00 |
2021-08-19 17:48:03 |
10008 |
2021-08-19 13:12:14 |
2021-08-19 18:18:14 |
10009 |
2021-08-19 15:21:21 |
2021-08-19 21:15:17 |
10010 |
2021-08-19 18:20:02 |
2021-08-19 23:45:12 |
... |
... |
... |
要求展示每天的最高在线人数:
date |
online_ct |
2021-08-01 |
254120 |
2021-08-02 |
198652 |
... |
... |
分析:在之前的文章中我们也使用过增加一个列用作标记,应用在这里可以对每个玩家上线标记1,下线标记-1,应用流式思想来一条数据就累加这个标记值,当然这个需要按时间先后顺序。累加的值就代表当前在线的人数问题。
- 首先生成示例数据
with tb1 as(select player_id, on_time, off_time, substr(on_time,1,10)as dt fromvalues('10001','2021-08-19 08:08:08','2021-08-19 12:11:23'),('10002','2021-08-19 09:01:07','2021-08-19 13:15:12'),('10003','2021-08-19 10:12:14','2021-08-19 10:50:18'),('10004','2021-08-19 09:20:21','2021-08-19 12:46:25'),('10005','2021-08-19 10:14:22','2021-08-19 13:26:15'),('10006','2021-08-19 11:11:25','2021-08-19 16:24:18'),('10007','2021-08-19 12:00:00','2021-08-19 17:48:03'),('10008','2021-08-19 13:12:14','2021-08-19 18:18:14'),('10009','2021-08-19 15:21:21','2021-08-19 21:15:17'),('10010','2021-08-19 18:20:02','2021-08-19 23:45:12') t(player_id,on_time,off_time))
- 利用前面分析的列转行方法,将每个玩家的上下线时间转成一列
tb2 as(select player_id, on_time as on_off_time, dt,1as flag from tb1 union all select player_id, off_time as on_off_time, dt,-1as flag from tb1 )
- 按时间先后顺序进行累加
tb3 as(select player_id, on_off_time, dt, flag, sum(flag) over(partition by dt orderby on_off_time) sums from tb2 )--结果如下player_id on_off_time dt flag sums 100012021-08-1908:08:082021-08-1911100022021-08-1909:01:072021-08-1912100042021-08-1909:20:212021-08-1913100032021-08-1910:12:142021-08-1914100052021-08-1910:14:222021-08-1915100032021-08-1910:50:182021-08-19-14100062021-08-1911:11:252021-08-1915100072021-08-1912:00:002021-08-1916100012021-08-1912:11:232021-08-19-15100042021-08-1912:46:252021-08-19-14100082021-08-1913:12:142021-08-1915100022021-08-1913:15:122021-08-19-14100052021-08-1913:26:152021-08-19-13100092021-08-1915:21:212021-08-1914100062021-08-1916:24:182021-08-19-13100072021-08-1917:48:032021-08-19-12100082021-08-1918:18:142021-08-19-11100102021-08-1918:20:022021-08-1912100092021-08-1921:15:172021-08-19-11100102021-08-1923:45:122021-08-19-10
- 按天统计最高人数
select dt asdate, max(sums)as online_ct from tb3 groupby dt;--结果如下date online_ct 2021-08-196
掌握该类问题分析方法对业务需求分析有很大帮助,该类问题也可细化为每个小时在线人数等等。
拜了个拜