题目:给定表t,字段信息如下
date |
c1 |
c2 |
c3 |
2022-09-15 |
AB |
12 |
bc |
2022-09-16 |
23 |
||
2022-09-17 |
d |
||
2022-09-18 |
BC |
问题:如何补全数据,结果如下
date |
c1 |
c2 |
c3 |
2022-09-15 |
AB |
12 |
bc |
2022-09-16 |
AB |
23 |
bc |
2022-09-17 |
AB |
23 |
d |
2022-09-18 |
BC |
23 |
bc |
思路:可以对缺失的数据进行分组,组内累加该字段值,但是目前这题不能使用此方法,因为字段值为含有字符串不能直接累加,可以采用分组之后取第一值。
构造数据:使用SQL先构造上述数据
with t as(selectdate, c1, c2, c3 fromvalues('2022-09-15','AB',12,'bc'),('2022-09-16',null,23,null),('2022-09-17',null,null,'d'),('2022-09-18','BC',null,null) tmp(date,c1,c2,c3))date c1 c2 c3 2022-09-15 AB 12 bc 2022-09-16 \N 23 \N 2022-09-17 \N \N d 2022-09-18 BC \N \N
解法一:分组累加思想
selectdate, first_value(c1) over(partition by flag1 orderbydate) c1, first_value(c2) over(partition by flag2 orderbydate) c2, first_value(c3) over(partition by flag3 orderbydate) c3 from(selectdate, c1, c2, c3, lag_c1, lag_c2, lag_c3, sum(if(c1 isnotnulland lag_c1 isnull,1,0)) over(orderbydate) flag1, sum(if((c2 isnotnulland lag_c2 isnull)or(c2 isnotnulland lag_c2 isnotnull),1,0)) over(orderbydate) flag2, sum(if(c3 isnotnulland lag_c3 isnull,1,0)) over(orderbydate) flag3 from(selectdate, c1, c2, c3, lag(c1,1) over(orderbydate) lag_c1, lag(c2,1) over(orderbydate) lag_c2, lag(c3,1) over(orderbydate) lag_c3 from t ) a ) b date c1 c2 c3 2022-09-15 AB 12 bc 2022-09-16 AB 23 bc 2022-09-17 AB 23 d 2022-09-18 BC 23 d
解法二:解法一存在特殊情况,比如c2的连续不为空,分组时就得单独考虑,以下这个方法无需考虑改情况且较为巧妙用count代替sum累加
selectdate, first_value(c1) over(partition by aa orderbydate)as c1, first_value(c2) over(partition by bb orderbydate)as c2, first_value(c3) over(partition by cc orderbydate)as c3 from(selectdate, c1, c2, c3,count(c1) over(orderbydate)as aa,count(c2) over(orderbydate)as bb,count(c3) over(orderbydate)as cc from t ) a;date c1 c2 c3 2022-09-15 AB 12 bc 2022-09-16 AB 23 bc 2022-09-17 AB 23 d 2022-09-18 BC 23 d
此题应该还有其他解法,欢迎交流
拜了个拜