前言
本篇文章讲解的主要内容是:目前Oracle支持的行列互换有两种方式:case when、pivot\unpivot,我将通过几个案例来给大家详解如何通过这两种方式实现“行转列”,“列转行”的需求,并通过执行计划看case when、pivot\unpivot二者的底层逻辑关系以及效率上的影响。
【SQL开发实战技巧】这一系列博主当作复习旧知识来进行写作,毕竟SQL开发在数据分析场景非常重要且基础,面试也会经常问SQL开发和调优经验,相信当我写完这一系列文章,也能再有所收获,未来面对SQL面试也能游刃有余~。
一、行转列
"行转列"的这个写法,在做报表或语句改写时要经常用到,是一个非常重要的语句。
在Oracle中,有CASE WHEN END
和Oracle11g新增的pivot函数两种方式。其中:
- CASE WHEN END编写和维护较麻烦,但适合的场景较多。
- PIVOT编写和维护简单,但有较大限制。
下面简单介绍这两种方法。
现在我有这么一个需求:
对emp表按job分组汇总,每个部门显示为一列。首先看一下最常用的CASE WHEN END
的方法。
根据不同的条件来取值,就可以把数据分为几列。
SQL> select a.job,
2 case when a.deptno= 10 then sal end as d10,
3 case when a.deptno= 20 then sal end as d20,
4 case when a.deptno= 30 then sal end as d30
5 from emp a
6 order by 1;
JOB D10 D20 D30
--------- ---------- ---------- ----------
ANALYST 3000
CLERK 800
CLERK 950
CLERK 1100
MANAGER 2450
MANAGER 2975
MANAGER 2850
MGR
PRESIDENT 5000
SALESMAN 1250
SALESMAN 1600
SALESMAN 1250
SALESMAN 1500
sdf 1300
17 rows selected
只是这样的数据看上去杂乱无章,需要再按job分组汇总,所以一般"行转列"语句里都会有聚集函数,就是为了把同类数据转为一行显示。
另外,要注意最后一列,我们增加了合计工资的显示,这在后面介绍的PIVOT函数中是做不到的,PIVOT 函数只能按同一个规则分类各数据,各列之间的数据不能交叉重复。
SQL> select a.job,
2 sum(case when a.deptno= 10 then sal end) as d10,
3 sum(case when a.deptno= 20 then sal end) as d20,
4 sum(case when a.deptno= 30 then sal end) as d30,
5 sum(sal)as sm
6 from emp a
7 group by job
8 order by 1;
JOB D10 D20 D30 SM
--------- ---------- ---------- ---------- ----------
ANALYST 3000 3000
CLERK 1900 950 2850
MANAGER 2450 2975 2850 8275
MGR
PRESIDENT 5000 5000
SALESMAN 5600 5600
sdf 1300 1300
8 rows selected
下面看一下Oracle11g新增的"行转列"函数PIVOT,对简单的PIVOT环境提供了简单的实现方法。
SQL>
SQL> select * from (
2 --先查询出来要进行操作的数据
3 select job,sal,deptno from emp
4 )pivot(
5 sum(sal) as sm /*这里写聚集函数以及起别名,SUM、MAX等聚集函数+列别名,若不设置别名,则后面生成的列名字默认只使用后面in里设的别名,否则两个别名相加*/
6 for deptno in(
7 10 as d10, /*这里等价于前面的查询:sum(case when deptno =10 then sal end) as d10,列名称为d10_sm,这个sm是前面聚集函数的别名*/
8 20 as d20,/*这里等价于前面的查询:sum(case when deptno =20 then sal end) as d20*/
9 30 as d30/*这里等价于前面的查询:sum(case when deptno =30 then sal end) as d30*/
10 )
11 )
12 order by 1;
JOB D10_SM D20_SM D30_SM
--------- ---------- ---------- ----------
ANALYST 3000
CLERK 1900 950
MANAGER 2450 2975 2850
MGR
PRESIDENT 5000
SALESMAN 5600
sdf 1300
8 rows selected
大家可以看一下两种方式的对比,如果还要增加提成的返回,用PIVOT则只需要增加一个设定即可。
SQL> select * from (
2 --先查询出来要进行操作的数据
3 select job,sal,comm,deptno from emp
4 )pivot(
5 sum(sal) as sm,/*这里写聚集函数以及起别名,SUM、MAX等聚集函数+列别名,若不设置别名,则后面生成的列名字默认只使用后面in里设的别名,否则两个别名相加*/
6 sum(comm)as sc /*这里写聚集函数以及起别名,SUM、MAX等聚集函数+列别名,若不设置别名,则后面生成的列名字默认只使用后面in里设的别名,否则两个别名相加*/
7 for deptno in(
8 10 as d10, /*这里等价于前面的查询:sum(case when deptno =10 then sal end) as d10,sum(case when deptno =10 then comm end) as d10,列名称为d10_sc,这个sm是前面聚集函数的别名*/
9 20 as d20,/*这里等价于前面的查询:sum(case when deptno =20 then sal end) as d20*/
10 30 as d30/*这里等价于前面的查询:sum(case when deptno =30 then sal end) as d30*/
11 )
12 )
13 order by 1;
JOB D10_SM D10_SC D20_SM D20_SC D30_SM D30_SC
--------- ---------- ---------- ---------- ---------- ---------- ----------
ANALYST 3000
CLERK 1900 950
MANAGER 2450 2975 2850
MGR
PRESIDENT 5000
SALESMAN 5600 2200
sdf 1300
8 rows selected
如果用CASE WHEN要增加三行语句。
SQL> select a.job,
2 sum(case when a.deptno= 10 then sal end) as d10_sm,
3 sum(case when a.deptno= 10 then sal end) as d10_sc,
4 sum(case when a.deptno= 20 then sal end) as d20_sm,
5 sum(case when a.deptno= 20 then sal end) as d20_sc,
6 sum(case when a.deptno= 30 then sal end) as d30_sm,
7 sum(case when a.deptno= 30 then sal end) as d30_sc
8 from emp a
9 group by job
10 order by 1;
JOB D10_SM D10_SC D20_SM D20_SC D30_SM D30_SC
--------- ---------- ---------- ---------- ---------- ---------- ----------
ANALYST 3000 3000
CLERK 1900 1900 950 950
MANAGER 2450 2450 2975 2975 2850 2850
MGR
PRESIDENT 5000 5000
SALESMAN 5600 5600
sdf 1300 1300
8 rows selected
PIVOT一次只能按一个条件来完成"行转列",如果同时把工作与部门都转为列,并汇总为一行时,PIVOT就无能为力了,这时只能用CASE WHEN。
SQL> select
2 case when deptno=10 then ename end as d10,
3 case when deptno=20 then ename end as d20,
4 case when deptno=30 then ename end as d30,
5 case when job='ANALYST' then ename end as ANALYST,
6 case when job='CLERK' then ename end as CLERK,
7 case when job='MANAGER' then ename end as MANAGER,
8 case when job='MGR' then ename end as MGR,
9 case when job='PRESIDENT' then ename end as PRESIDENT,
10 case when job='SALESMAN' then ename end as SALESMAN
11 from emp;
D10 D20 D30 ANALYST CLERK MANAGER MGR PRESIDENT SALESMAN
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
zhaoyd zhaoyd
er er
ALLEN ALLEN
WARD WARD
JONES JONES
MARTIN MARTIN
BLAKE BLAKE
CLARK CLARK
KING KING
TURNER TURNER
ADAMS ADAMS
JAMES JAMES
FORD FORD
sdf
17 rows selected
SQL>
最后分析一下PIVOT的PLAN(用dbms_xplan.display_cursor
看):
select/*zydtest*/ * from (
--先查询出来要进行操作的数据
select job,sal,comm,deptno from emp
)pivot(
sum(sal) as sm /*这里写聚集函数以及起别名,SUM、MAX等聚集函数+列别名,若不设置别名,则后面生成的列名字默认只使用后面in里设的别名,否则两个别名相加*/
for deptno in(
10 as d10, /*这里等价于前面的查询:sum(case when deptno =10 then sal end) as d10,sum(case when deptno =10 then comm end) as d10,列名称为d10_sc,这个sm是前面聚集函数的别名*/
20 as d20,/*这里等价于前面的查询:sum(case when deptno =20 then sal end) as d20*/
30 as d30 /*这里等价于前面的查询:sum(case when deptno =30 then sal end) as d30*/
)
)
order by 1;
select * from v$sql aa where aa.SQL_TEXT like '%zydtest%';
SQL> select * from table(dbms_xplan.display_cursor('5ss3y129x7p0a',0,'advanced'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 5ss3y129x7p0a, child number 0
-------------------------------------
select/*zydtest*/ * from ( --�������������������������� select
job,sal,comm,deptno from emp )pivot( sum(sal) as sm
/*��������������������������SUM��MAX������������������,������������,����
����������������������������in����������,����������������*/ for deptno
in( 10 as d10, /*����������������������sum(case when deptno =10 then
sal end) as d10,sum(case when deptno =10 then comm end) as
d10,��������d10_sc������sm��������������������*/ 20 as
d20,/*����������������������sum(case when deptno =20 then sal end) as
d20*/ 30 as d30 /*��������
Plan hash value: 1018027214
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | SORT GROUP BY PIVOT| | 19 | 285 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS FULL | EMP | 19 | 285 | 3 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
----------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$117FC0EF
2 - SEL$117FC0EF / EMP@SEL$2
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
DB_VERSION('11.2.0.4')
ALL_ROWS
OUTLINE_LEAF(@"SEL$117FC0EF")
MERGE(@"SEL$F5BB74E1")
OUTLINE(@"SEL$3")
OUTLINE(@"SEL$F5BB74E1")
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
MERGE(@"SEL$2")
OUTLINE(@"SEL$1")
OUTLINE(@"SEL$2")
FULL(@"SEL$117FC0EF" "EMP"@"SEL$2")
END_OUTLINE_DATA
*/
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=2) "JOB"[VARCHAR2,9], "COMM"[NUMBER,22], SUM(CASE WHEN
("DEPTNO"=10) THEN "SAL" END )[22], SUM(CASE WHEN ("DEPTNO"=20) THEN
"SAL" END )[22], SUM(CASE WHEN ("DEPTNO"=30) THEN "SAL" END )[22]
2 - "JOB"[VARCHAR2,9], "SAL"[NUMBER,22], "COMM"[NUMBER,22],
"DEPTNO"[NUMBER,22]
57 rows selected
通过上面PLAN可以看到,PIVOT被转换成了如下语句:
SUM(CASE WHEN
("DEPTNO"=10) THEN "SAL" END )[22], SUM(CASE WHEN ("DEPTNO"=20) THEN
"SAL" END )[22], SUM(CASE WHEN ("DEPTNO"=30) THEN "SAL" END )[22]
也就是说,PIVOT只是写法简单了一些,实际上仍用的是CASE WHEN语句。
二、列转行
测试数据如下:
drop table test purge;
create table test as
select *
from (select deptno, sal from emp)
pivot(count(*) as ct, sum(sal) as s
for deptno in(10 as d10, 20 as d20, 30 as d30));
SQL> select * from test;
D10_CT D10_S D20_CT D20_S D30_CT D30_S
---------- ---------- ---------- ---------- ---------- ----------
3 8750 5 7875 6 9400
SQL>
要求把三个部门的"人次"转为一列显示。以前这种需求一直用UNION ALL来写:
SQL> SELECT'10'AS 部门编码,d10_ct AS 人次 FROM test UNION ALL
2 SELECT'20'AS 部门编码,d20_ct AS 人次 FROM test UNION ALL
3 SELECT'30'AS 部门编码,d30_ct AS 人次 FROM test
4 ;
部门编码 人次
---------- ----------
10 3
20 5
30 6
这时PLAN如下:
Plan Hash Value : 2176849128
---------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 39 | 9 | 00:00:01 |
| 1 | UNION-ALL | | | | | |
| 2 | TABLE ACCESS FULL | TEST | 1 | 13 | 3 | 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST | 1 | 13 | 3 | 00:00:01 |
| 4 | TABLE ACCESS FULL | TEST | 1 | 13 | 3 | 00:00:01 |
---------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
需要扫描test三次,而且如果列数较多,这种查询编写与维护都比较麻烦,而用UNPIVOT就不一样了。
SQL>
SQL> select * from test unpivot(
2 ct for deptno in(d10_ct,d20_ct,d30_ct)
3 );
D10_S D20_S D30_S DEPTNO CT
---------- ---------- ---------- ------ ----------
8750 7875 9400 D10_CT 3
8750 7875 9400 D20_CT 5
8750 7875 9400 D30_CT 6
SQL>
UNPIYOT函数生成两个新列:"deptno"与"ct"。
而in()中的D10_CT、D20_CT和D30_CT三列,其列名成为行"deptno"的值,原来D10_CT等列中的值分别转为"ct"列中的三行:
那怎么恢复成原来deptno号的样子?只要将语句整理如下:
SQL> select deptno AS lm,substr(deptno,-5, 2)AS deptno,ct
2 from test unpivot(
3 ct for deptno in(d10_ct,d20_ct,d30_ct)
4 );
LM DEPTNO CT
------ -------- ----------
D10_CT 10 3
D20_CT 20 5
D30_CT 30 6
这时PLAN如下:
Plan Hash Value : 734873962
-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 63 | 9 | 00:00:01 |
| * 1 | VIEW | | 3 | 63 | 9 | 00:00:01 |
| 2 | UNPIVOT | | | | | |
| 3 | TABLE ACCESS FULL | TEST | 1 | 39 | 3 | 00:00:01 |
-----------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter("unpivot_view_006"."CT" IS NOT NULL)
Note
-----
- dynamic sampling used for this statement
可以看到,与PIVOT不一样,UNPIVOT不仅语句简略,而且只需要扫描test一次。我们可以很容易地在后面的unpivot列表里维护要转换的列。当然,UNPIVOT同样有限制。
如果现在有这么一个需求:还是用前面的表test:
SQL> select * from test;
D10_CT D10_S D20_CT D20_S D30_CT D30_S
---------- ---------- ---------- ---------- ---------- ----------
3 8750 5 7875 6 9400
我们要求将人次和sal聚合的结果恢复成按照部门统计的行列表,也就是下面这样的:
DEPTNO RC SM
-------- ---------- ----------
10 3 8750
20 5 7875
30 6 9400
那我们该怎么实现?
要记得如果同时有人次与工资合计要转换,unpivot就不能一次性完成,只有分别转换后再用JOIN连接。
select *
from (select substr(deptno, 1, 3) as lm,
substr(deptno, -5, 2) AS deptno,
rc
from (select *
from test unpivot include nulls(rc for deptno in(d10_ct, d20_ct, d30_ct)))) a
inner join (select substr(deptno, 1, 3) as lm,
substr(deptno, -4, 2) AS deptno,
sm
from (select *
from test unpivot include nulls(sm for deptno in(d10_s,
d20_s,
d30_s)))) b
on a.lm = b.lm
LM DEPTNO RC LM DEPTNO SM
------------ -------- ---------- ------------ -------- ----------
D10 10 3 D10 10 8750
D20 20 5 D20 20 7875
D30 30 6 D30 30 9400
上面的结果只要去掉几个列名就实现了前面的要求。这里为了让两个结果集一致,使用了参数include nulls
这样即使数据为空,也显示一行。
是否有办法只用UNPIVOT,而不用JOIN呢?看下面的示例:
SQL> select *
2 from test
3 unpivot include nulls(rc for deptno in(d10_ct as 10, d20_ct as 20, d30_ct as 30))
4 unpivot include nulls(sm for deptno2 in(d10_s as 10, d20_s as 20, d30_s as 30));
DEPTNO RC DEPTNO2 SM
---------- ---------- ---------- ----------
10 3 10 8750
10 3 20 7875
10 3 30 9400
20 5 10 8750
20 5 20 7875
20 5 30 9400
30 6 10 8750
30 6 20 7875
30 6 30 9400
9 rows selected
可以看到,当有两个UNPJVOT时,生成的结果是一个笛卡儿积。
上面的语句实际上就是一个嵌套语句,前一个UNPIVOT结果出来后,再执行另一个
SQL> with t as (
2 select *
3 from test
4 unpivot include nulls(rc for deptno in(d10_ct as 10, d20_ct as 20, d30_ct as 30))
5 )
6 select * from t unpivot include nulls(sm for deptno2 in(d10_s as 10, d20_s as 20, d30_s as 30));
DEPTNO RC DEPTNO2 SM
---------- ---------- ---------- ----------
10 3 10 8750
10 3 20 7875
10 3 30 9400
20 5 10 8750
20 5 20 7875
20 5 30 9400
30 6 10 8750
30 6 20 7875
30 6 30 9400
9 rows selected
回来继续说,那上面这样的数据就不能用了吗?必须用join吗?
不不不,既然他笛卡尔积了,其实针对需要的数据,在上面的查询上加一个过滤即可。
SQL> select *
2 from test
3 unpivot include nulls(rc for deptno in(d10_ct as 10, d20_ct as 20, d30_ct as 30))
4 unpivot include nulls(sm for deptno2 in(d10_s as 10, d20_s as 20, d30_s as 30))
5 where deptno=deptno2;
DEPTNO RC DEPTNO2 SM
---------- ---------- ---------- ----------
10 3 10 8750
20 5 20 7875
30 6 30 9400
SQL>
三、将结果集反向转置为一列
有时会要求数据竖向显示,如CLARK的数据显示如下(各行之间用空格隔开):
CLARK
MANAGER
2450
我们使用刚学到的UNPIVOT,再加一点小技巧就可以。
select emps from (
select ename, job, to_char(sal) as sal, null as t_col
from emp
where deptno = 10)
unpivot include nulls(emps for aa in(ename,job,sal,t_col));
EMPS
------
CLARK
MANAGER
2450
KING
PRESIDENT
5000
sdf
sdf
1300
这里要注意以下两点。
- 与UNION ALL一样,要合并的几列数据类型必须相同,如果sal不用to_char转换,就会报错:
SQL>
SQL> select emps from (
2 select ename, job, sal as sal, null as t_col
3 from emp
4 where deptno = 10)
5 unpivot include nulls(emps for aa in(ename,job,sal,t_col));
select emps from (
select ename, job, sal as sal, null as t_col
from emp
where deptno = 10)
unpivot include nulls(emps for aa in(ename,job,sal,t_col))
ORA-01790: expression must have same datatype as corresponding expression
- 如果不加include nu11s,将不会显示空行:
select emps from (
select ename, job, to_Char(sal) as sal, null as t_col
from emp
where deptno = 10)
unpivot (emps for aa in(ename,job,sal,t_col));
EMPS
------
CLARK
MANAGER
2450
KING
PRESIDENT
5000
sdf
sdf
1300
总结
本篇文章讲解的主要内容是:目前Oracle支持的行列互换有两种方式:case when、pivot\unpivot,我将通过几个案例来给大家详解如何通过这两种方式实现“行转列”,“列转行”的需求,并通过执行计划看case when、pivot\unpivot二者的底层逻辑关系以及效率上的影响。