我们知道SQL中有个explode函数,经常使用的也是该函数,但还有个函数posexplode用法和其类似,使用好这个函数,可以简化不少工作。话不多说,直接上示例。
案例:有一张表,其数据如下
seat_no |
bd_time |
A,B,C,D |
8:00,8:10,8:20,8:30 |
H,I,J,K |
9:10,9:20,9:30,9:40 |
... |
... |
展示如下:
sn |
bt |
A |
8:00 |
B |
8:10 |
C |
8:20 |
D |
8:30 |
... |
... |
分析:看到字段数据,可以想到使用lateral view explode展开,我们先使用explode展开看下
with tb1 as(select seat_no, bd_time fromvalues('A,B,C,D','8:00,8:10,8:20,8:30'),('H,I,J,K','9:10,9:20,9:30,9:40') t(seat_no,bd_time))select seat_no, bd_time, exp_sn from tb1 lateral view explode(split(seat_no,',')) t as exp_sn;--只展开seat_no 结果如下seat_no bd_time exp_sn A,B,C,D 8:00,8:10,8:20,8:30 A A,B,C,D 8:00,8:10,8:20,8:30 B A,B,C,D 8:00,8:10,8:20,8:30 C A,B,C,D 8:00,8:10,8:20,8:30 D H,I,J,K 9:10,9:20,9:30,9:40 H H,I,J,K 9:10,9:20,9:30,9:40 I H,I,J,K 9:10,9:20,9:30,9:40 J H,I,J,K 9:10,9:20,9:30,9:40 K select seat_no, bd_time, exp_sn, exp_bt from tb1 lateral view explode(split(seat_no,',')) t as exp_sn lateral view explode(split(bd_time,',')) t as exp_bt;--两个字段均展开seat_no bd_time exp_sn exp_bt A,B,C,D 8:00,8:10,8:20,8:30 A 8:00A,B,C,D 8:00,8:10,8:20,8:30 A 8:10A,B,C,D 8:00,8:10,8:20,8:30 A 8:20A,B,C,D 8:00,8:10,8:20,8:30 A 8:30A,B,C,D 8:00,8:10,8:20,8:30 B 8:00A,B,C,D 8:00,8:10,8:20,8:30 B 8:10A,B,C,D 8:00,8:10,8:20,8:30 B 8:20A,B,C,D 8:00,8:10,8:20,8:30 B 8:30A,B,C,D 8:00,8:10,8:20,8:30 C 8:00A,B,C,D 8:00,8:10,8:20,8:30 C 8:10A,B,C,D 8:00,8:10,8:20,8:30 C 8:20A,B,C,D 8:00,8:10,8:20,8:30 C 8:30A,B,C,D 8:00,8:10,8:20,8:30 D 8:00A,B,C,D 8:00,8:10,8:20,8:30 D 8:10A,B,C,D 8:00,8:10,8:20,8:30 D 8:20A,B,C,D 8:00,8:10,8:20,8:30 D 8:30H,I,J,K 9:10,9:20,9:30,9:40 H 9:10H,I,J,K 9:10,9:20,9:30,9:40 H 9:20H,I,J,K 9:10,9:20,9:30,9:40 H 9:30H,I,J,K 9:10,9:20,9:30,9:40 H 9:40H,I,J,K 9:10,9:20,9:30,9:40 I 9:10H,I,J,K 9:10,9:20,9:30,9:40 I 9:20H,I,J,K 9:10,9:20,9:30,9:40 I 9:30H,I,J,K 9:10,9:20,9:30,9:40 I 9:40H,I,J,K 9:10,9:20,9:30,9:40 J 9:10H,I,J,K 9:10,9:20,9:30,9:40 J 9:20H,I,J,K 9:10,9:20,9:30,9:40 J 9:30H,I,J,K 9:10,9:20,9:30,9:40 J 9:40H,I,J,K 9:10,9:20,9:30,9:40 K 9:10H,I,J,K 9:10,9:20,9:30,9:40 K 9:20H,I,J,K 9:10,9:20,9:30,9:40 K 9:30H,I,J,K 9:10,9:20,9:30,9:40 K 9:40
从上述结果可以看到,按两个字段展开相当于直接笛卡尔积了,在之前的文章中我们也分析过生成索引的posexplode函数,接下来我们加上索引试试看
select seat_no, bd_time, inx_sn, exp_sn from tb1 lateral view posexplode(split(seat_no,',')) t as inx_sn,exp_sn;--只展开seat_no 结果如下seat_no bd_time inx_sn exp_sn A,B,C,D 8:00,8:10,8:20,8:300 A A,B,C,D 8:00,8:10,8:20,8:301 B A,B,C,D 8:00,8:10,8:20,8:302 C A,B,C,D 8:00,8:10,8:20,8:303 D H,I,J,K 9:10,9:20,9:30,9:400 H H,I,J,K 9:10,9:20,9:30,9:401 I H,I,J,K 9:10,9:20,9:30,9:402 J H,I,J,K 9:10,9:20,9:30,9:403 K select seat_no, bd_time, inx_sn, exp_sn, inx_bt, exp_bt from tb1 lateral view posexplode(split(seat_no,',')) t as inx_sn,exp_sn lateral view posexplode(split(bd_time,',')) t as inx_bt,exp_bt;--两个字段均展开seat_no bd_time inx_sn exp_sn inx_bt exp_bt A,B,C,D 8:00,8:10,8:20,8:300 A 08:00A,B,C,D 8:00,8:10,8:20,8:300 A 18:10A,B,C,D 8:00,8:10,8:20,8:300 A 28:20A,B,C,D 8:00,8:10,8:20,8:300 A 38:30A,B,C,D 8:00,8:10,8:20,8:301 B 08:00A,B,C,D 8:00,8:10,8:20,8:301 B 18:10A,B,C,D 8:00,8:10,8:20,8:301 B 28:20A,B,C,D 8:00,8:10,8:20,8:301 B 38:30A,B,C,D 8:00,8:10,8:20,8:302 C 08:00A,B,C,D 8:00,8:10,8:20,8:302 C 18:10A,B,C,D 8:00,8:10,8:20,8:302 C 28:20A,B,C,D 8:00,8:10,8:20,8:302 C 38:30A,B,C,D 8:00,8:10,8:20,8:303 D 08:00A,B,C,D 8:00,8:10,8:20,8:303 D 18:10A,B,C,D 8:00,8:10,8:20,8:303 D 28:20A,B,C,D 8:00,8:10,8:20,8:303 D 38:30H,I,J,K 9:10,9:20,9:30,9:400 H 09:10H,I,J,K 9:10,9:20,9:30,9:400 H 19:20H,I,J,K 9:10,9:20,9:30,9:400 H 29:30H,I,J,K 9:10,9:20,9:30,9:400 H 39:40H,I,J,K 9:10,9:20,9:30,9:401 I 09:10H,I,J,K 9:10,9:20,9:30,9:401 I 19:20H,I,J,K 9:10,9:20,9:30,9:401 I 29:30H,I,J,K 9:10,9:20,9:30,9:401 I 39:40H,I,J,K 9:10,9:20,9:30,9:402 J 09:10H,I,J,K 9:10,9:20,9:30,9:402 J 19:20H,I,J,K 9:10,9:20,9:30,9:402 J 29:30H,I,J,K 9:10,9:20,9:30,9:402 J 39:40H,I,J,K 9:10,9:20,9:30,9:403 K 09:10H,I,J,K 9:10,9:20,9:30,9:403 K 19:20H,I,J,K 9:10,9:20,9:30,9:403 K 29:30H,I,J,K 9:10,9:20,9:30,9:403 K 39:40
从上述结果看,posexplode的索引的确省下了不少事,可以利用两个索引值相同进行过滤,可得到想要的结果
select seat_no, bd_time, exp_sn, exp_bt from tb1 lateral view posexplode(split(seat_no,',')) t as inx_sn,exp_sn lateral view posexplode(split(bd_time,',')) t as inx_bt,exp_bt where inx_sn = inx_bt;--结果如下seat_no bd_time exp_sn exp_bt A,B,C,D 8:00,8:10,8:20,8:30 A 8:00A,B,C,D 8:00,8:10,8:20,8:30 B 8:10A,B,C,D 8:00,8:10,8:20,8:30 C 8:20A,B,C,D 8:00,8:10,8:20,8:30 D 8:30H,I,J,K 9:10,9:20,9:30,9:40 H 9:10H,I,J,K 9:10,9:20,9:30,9:40 I 9:20H,I,J,K 9:10,9:20,9:30,9:40 J 9:30H,I,J,K 9:10,9:20,9:30,9:40 K 9:40
关于posexplode具体细节用法,可参考阿里云文档
https://help.aliyun.com/document_detail/293597.html#section-2yc-ymd-p11
拜了个拜