前言
本篇文章讲解的主要内容是:通过执行计划看开窗函数开窗语法rows\range between preceding and current row以及rows\range between unbounded preceding and unbounded following对移动范围的值进行聚集的原理以及区别、如何通过一个SQL打印九九乘法口表!!!
【SQL开发实战技巧】这一系列博主当作复习旧知识来进行写作,毕竟SQL开发在数据分析场景非常重要且基础,面试也会经常问SQL开发和调优经验,相信当我写完这一系列文章,也能再有所收获,未来面对SQL面试也能游刃有余~。
一、对移动范围的值进行聚集
现在有个需求:要求在员工明细表中显示之前90天(包含90)以内聘用人员的工资总和。
我以部门30为例,在这里给大家标量及分析函数两种方式的写法示例及解释。
SQL> select hiredate,
2 sal,
3 (select sum(sal)
4 from emp aa
5 where aa.hiredate <= a.hiredate
6 and aa.hiredate >= (a.hiredate - 90)
7 and aa.deptno = 30) as 标量求值,
8 '(' || to_char(hiredate - 90, 'yyyy-mm-dd') || '~' ||
9 to_char(hiredate, 'yyyy-mm-dd') || ')' as 时间范围,
10 sum(sal) over(order by hiredate range between 90 preceding and current row) as 分析函数求值,
11 (select listagg(sal, '+') within group(order by hiredate)
12 from emp s
13 where deptno = 30
14 and s.hiredate >= (a.hiredate - 90)
15 and s.hiredate <= a.hiredate) as 公式推导
16 from emp a
17 where a.deptno = 30;
HIREDATE SAL 标量求值 时间范围 分析函数求值 公式推导
----------- --------- ---------- -------------------------------------------------------------------------------- ---------- --------------------------------------------------------------------------------
1981-2-20 1600.00 1600 (1980-11-22~1981-02-20) 1600 1600
1981-2-22 1250.00 2850 (1980-11-24~1981-02-22) 2850 1600+1250
1981-5-1 2850.00 5700 (1981-01-31~1981-05-01) 5700 1600+1250+2850
1981-9-8 1500.00 1500 (1981-06-10~1981-09-08) 1500 1500
1981-9-28 1250.00 2750 (1981-06-30~1981-09-28) 2750 1500+1250
1981-12-3 950.00 3700 (1981-09-04~1981-12-03) 3700 1500+1250+950
6 rows selected
因用RANGE关键字表示要对相应的字段做加减运算,所以只有对日期与数值两类字段使用RANGE开窗。
对于日期RANGE开窗,默认单位是“天”,如果需求改为三个月内的数据呢?我们可以用INTERVAL来写明间隔单位:
SQL> select hiredate,
2 sal,
3 sum(sal) over(order by hiredate range between interval '3' month preceding and current row) as 分析函数求值
4 from emp a
5 where a.deptno = 30;
HIREDATE SAL 分析函数求值
----------- --------- ----------
1981-2-20 1600.00 1600
1981-2-22 1250.00 2850
1981-5-1 2850.00 5700
1981-9-8 1500.00 1500
1981-9-28 1250.00 2750
1981-12-3 950.00 3700
6 rows selected
如果按分钟开窗呢?
SQL> select aa.ADDRESS,
2 count(*) over(order by aa.LAST_ACTIVE_TIME range between interval '30' minute preceding and current row) as interval_cnt,
3 count(*) over(order by aa.LAST_ACTIVE_TIME range between 30 / 24 / 60 preceding and current row) as ceil_cnt
4 from v$sql aa
5 where aa.LAST_ACTIVE_TIME >= trunc(sysdate)
6 and rownum<=10;
ADDRESS INTERVAL_CNT CEIL_CNT
---------------- ------------ ----------
0000000EF7A8E3F0 1 1
0000000F1E29E158 1 1
0000000F18C1C570 2 2
0000000F1A5C9B20 1 1
0000000F1AB3DBB0 1 1
0000000F06D2B370 1 1
0000000F0D8CF5D0 1 1
0000000F06D2B370 1 1
0000000EFCE2BA78 1 1
0000000F0D8CF5D0 1 1
10 rows selected
用30 / 24 / 60
或interval '30' minute
都可以,当然后一种更直观。
二、详细分析常用分析函数开窗过程
本节汇总演示分析函数常见的几种用法及区别。
1、工资排序后取第一行到当前行范围内的最小值。
SQL> select ename,
2 sal,
3 /*因是按工资排序,所以这个语句返回的结果就是所有行的最小值*/
4 min(sal) over(order by sal) as min1,
5 /*上述语句默认参数如下,下面plan中可以看到*/
6 min(sal) over(order by sal range between unbounded preceding and current row) min2,
7 /*这种情况卡,rows与RANGE返回数据一样*/
8 min(sal) over(order by sal rows between unbounded preceding and current row) min3,
9 /*取所有行内最小值,可以与前面返回的值对比查看*/
10 min(sal) over() as min4,
11 /*如果明确写出上面min14的范围就是*/
12 min(sal) over(order by sal range between unbounded preceding and unbounded following) as min5,
13 /*这种情况下,rows与RANGE返回数据一样*/
14 min(sal) over(order by sal rows between unbounded preceding and unbounded following) as min6
15 from emp
16 where deptno = 30;
ENAME SAL MIN1 MIN2 MIN3 MIN4 MIN5 MIN6
---------- --------- ---------- ---------- ---------- ---------- ---------- ----------
JAMES 950.00 950 950 950 950 950 950
WARD 1250.00 950 950 950 950 950 950
MARTIN 1250.00 950 950 950 950 950 950
TURNER 1500.00 950 950 950 950 950 950
ALLEN 1600.00 950 950 950 950 950 950
BLAKE 2850.00 950 950 950 950 950 950
6 rows selected
SQL>
为了看到上面 min(sal) over(order by sal) as min1
在执行计划中到底是转译成了什么语句,这里专门改写成一个独立sql后,然后拿到的实际执行计划:
SQL> set linesize 200
set pagesize 2000
select * from table(dbms_xplan.display_cursor('&A',0,'advanced'));
SQL> SQL> Enter value for a: 5pffdrwf22rk8
old 1: select * from table(dbms_xplan.display_cursor('&A',0,'advanced'))
new 1: select * from table(dbms_xplan.display_cursor('5pffdrwf22rk8',0,'advanced'))
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5pffdrwf22rk8, child number 0
-------------------------------------
select /*12345*/ ename, sal,
/*因按工资排序,所以这个语句与上面sal返回的值一样*/ min(sal) over(order by sal) as min1
from emp where deptno = :"SYS_B_0"
Plan hash value: 3145491563
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | WINDOW SORT | | 6 | 78 | 4 (25)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| EMP | 6 | 78 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / EMP@SEL$1
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
DB_VERSION('12.1.0.2')
OPT_PARAM('_b_tree_bitmap_plans' 'false')
OPT_PARAM('_bloom_filter_enabled' 'false')
OPT_PARAM('_optimizer_extended_cursor_sharing' 'none')
OPT_PARAM('_gby_hash_aggregation_enabled' 'false')
OPT_PARAM('_optimizer_extended_cursor_sharing_rel' 'none')
OPT_PARAM('_optimizer_adaptive_cursor_sharing' 'false')
OPT_PARAM('_optimizer_use_feedback' 'false')
OPT_PARAM('_optimizer_unnest_scalar_sq' 'false')
OPT_PARAM('_px_adaptive_dist_method' 'off')
OPT_PARAM('_optimizer_dsdir_usage_control' 0)
OPT_PARAM('_optimizer_adaptive_plans' 'false')
OPT_PARAM('_optimizer_strans_adaptive_pruning' 'false')
OPT_PARAM('_optimizer_null_accepting_semijoin' 'false')
OPT_PARAM('_optimizer_gather_feedback' 'false')
OPT_PARAM('_optimizer_aggr_groupby_elim' 'false')
OPT_PARAM('_optimizer_reduce_groupby_key' 'false')
OPT_PARAM('_optimizer_nlj_hj_adaptive_join' 'false')
OPT_PARAM('_fix_control' '8611462:0 14826303:0')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1" "EMP"@"SEL$1")
END_OUTLINE_DATA
*/
Peeked Binds (identified by position):
--------------------------------------
1 - :SYS_B_0 (NUMBER): 30
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("DEPTNO"=:SYS_B_0)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "SAL"[NUMBER,22], "ENAME"[VARCHAR2,10],
"DEPTNO"[NUMBER,22], MIN("SAL") OVER ( ORDER BY "SAL" RANGE BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW )[22]
2 - "ENAME"[VARCHAR2,10], "SAL"[NUMBER,22], "DEPTNO"[NUMBER,22]
72 rows selected.
从上面执行计划能够看到, min(sal) over(order by sal)
实际改写成了
MIN("SAL") OVER ( ORDER BY "SAL" RANGE BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW )[22]
2、工资排序后取第一行到当前行范围内的最大值。
SQL> select ename,
2 sal,
3 /*因按工资排序,所以这个语句与上面sal返回的值一样*/
4 max(sal) over(order by sal) as max1,
5 /*上述语句默认参数如下,plan中可以看到*/
6 MAX(sal) over(order by sal range between unbounded preceding and current row) MAX2,
7 /*这种情况卡,rows与RANGE返回数据一样*/
8 MAX(sal) over(order by sal rows between unbounded preceding and current row) MAX3,
9 /*取所有行的内最大值,可以与前面返回的值对比查看*/
10 MAX(sal) over() as MAX4,
11 /*如果明确写出上面MAX4的范围就是*/
12 MAX(sal) over(order by sal range between unbounded preceding and unbounded following) as MAX5,
13 /*这种情况下,rows与RANGE返回数据一样*/
14 MAX(sal) over(order by sal rows between unbounded preceding and unbounded following) as MAX6
15 from emp
16 where deptno = 30;
ENAME SAL MAX1 MAX2 MAX3 MAX4 MAX5 MAX6
---------- --------- ---------- ---------- ---------- ---------- ---------- ----------
JAMES 950.00 950 950 950 2850 2850 2850
WARD 1250.00 1250 1250 1250 2850 2850 2850
MARTIN 1250.00 1250 1250 1250 2850 2850 2850
TURNER 1500.00 1500 1500 1500 2850 2850 2850
ALLEN 1600.00 1600 1600 1600 2850 2850 2850
BLAKE 2850.00 2850 2850 2850 2850 2850 2850
6 rows selected
为了看到上面 max(sal) over(order by sal) as max1
在执行计划中到底是转译成了什么语句,这里专门改写成一个独立sql后,然后拿到的实际执行计划:
select * from table(dbms_xplan.display_cursor('&A',0,'advanced'));
SQL> SQL> Enter value for a: 5snuncy0dcrhf
old 1: select * from table(dbms_xplan.display_cursor('&A',0,'advanced'))
new 1: select * from table(dbms_xplan.display_cursor('5snuncy0dcrhf',0,'advanced'))
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5snuncy0dcrhf, child number 0
-------------------------------------
select /*12345*/ ename, sal,
/*因按工资排序,所以这个语句与上面sal返回的值一样*/ max(sal) over(order by sal) as
max1 from emp where deptno = :"SYS_B_0"
Plan hash value: 3145491563
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | WINDOW SORT | | 6 | 78 | 4 (25)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| EMP | 6 | 78 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / EMP@SEL$1
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
DB_VERSION('12.1.0.2')
OPT_PARAM('_b_tree_bitmap_plans' 'false')
OPT_PARAM('_bloom_filter_enabled' 'false')
OPT_PARAM('_optimizer_extended_cursor_sharing' 'none')
OPT_PARAM('_gby_hash_aggregation_enabled' 'false')
OPT_PARAM('_optimizer_extended_cursor_sharing_rel' 'none')
OPT_PARAM('_optimizer_adaptive_cursor_sharing' 'false')
OPT_PARAM('_optimizer_use_feedback' 'false')
OPT_PARAM('_optimizer_unnest_scalar_sq' 'false')
OPT_PARAM('_px_adaptive_dist_method' 'off')
OPT_PARAM('_optimizer_dsdir_usage_control' 0)
OPT_PARAM('_optimizer_adaptive_plans' 'false')
OPT_PARAM('_optimizer_strans_adaptive_pruning' 'false')
OPT_PARAM('_optimizer_null_accepting_semijoin' 'false')
OPT_PARAM('_optimizer_gather_feedback' 'false')
OPT_PARAM('_optimizer_aggr_groupby_elim' 'false')
OPT_PARAM('_optimizer_reduce_groupby_key' 'false')
OPT_PARAM('_optimizer_nlj_hj_adaptive_join' 'false')
OPT_PARAM('_fix_control' '8611462:0 14826303:0')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1" "EMP"@"SEL$1")
END_OUTLINE_DATA
*/
Peeked Binds (identified by position):
--------------------------------------
1 - :SYS_B_0 (NUMBER): 30
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("DEPTNO"=:SYS_B_0)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "SAL"[NUMBER,22], "ENAME"[VARCHAR2,10],
"DEPTNO"[NUMBER,22], MAX("SAL") OVER ( ORDER BY "SAL" RANGE BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW )[22]
2 - "ENAME"[VARCHAR2,10], "SAL"[NUMBER,22], "DEPTNO"[NUMBER,22]
72 rows selected.
从上面执行计划能够看到,max(sal) over(order by sal)
实际改写成了
MAX("SAL") OVER ( ORDER BY "SAL" RANGE BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW )[22]
3、工资排序后取第一行到当前行范围内的工资和,这里要注意区别。
SQL> select ename,
2 sal,
3 /*累加工资,要注意工资重复时的现象*/
4 sum(sal) over(order by sal) as sum1,
5 /*上述语句默认参数如下,在PLAN中可以看到*/
6 sum(sal) over(order by sal range between unbounded preceding and current row) as sum2,
7 /*这种情况卡,rows与RANGE返回数据不一样,见第二行*/
8 sum(sal) over(order by sal rows between unbounded preceding and current row) as sum3,
9 /*工资合计*/
10 sum(sal) over() as sum4,
11 /*如果明确写出上面sum4的范围就是*/
12 sum(sal) over(order by sal range between unbounded preceding and unbounded following) as sum5,
13 /*这种情况下,rows与RANGE返回数据一样*/
14 sum(sal) over(order by sal rows between unbounded preceding and unbounded following) as sum6
15 from emp
16 where deptno = 30;
ENAME SAL SUM1 SUM2 SUM3 SUM4 SUM5 SUM6
---------- --------- ---------- ---------- ---------- ---------- ---------- ----------
JAMES 950.00 950 950 950 9400 9400 9400
WARD 1250.00 3450 3450 2200 9400 9400 9400
MARTIN 1250.00 3450 3450 3450 9400 9400 9400
TURNER 1500.00 4950 4950 4950 9400 9400 9400
ALLEN 1600.00 6550 6550 6550 9400 9400 9400
BLAKE 2850.00 9400 9400 9400 9400 9400 9400
6 rows selected
因为使用关键字"RANGE"时,第二行"SUM1"、"SUM2"对应的条件是"<=1250",
而1250有两个,所以会计算两次,产生结果为:950+1250+1250=3450。而"SUM3"不同,它只计算到当前行,所以结果是950+1250=2200。
4、前后都有限定条件。
SQL> select ename,
2 sal,
3 /*当前行(+-1500)范围内的最大值*/
4 max(sal) over(order by sal range between 500 preceding and 500 following) as max1,
5 /*前后各一行,共三行中的最大值*/
6 sum(sal) over(order by sal rows between 1 preceding and 1 following) as sum3
7 from emp
8 where deptno = 30;
ENAME SAL MAX1 SUM3
---------- --------- ---------- ----------
JAMES 950.00 1250 2200
WARD 1250.00 1600 3450
MARTIN 1250.00 1600 4000
TURNER 1500.00 1600 4350
ALLEN 1600.00 1600 5950
BLAKE 2850.00 2850 4450
6 rows selected
三、使用Listagg一个SQL打印九九乘法表
现在有个需求:要求你用一条SQL打印一个九九乘法口诀!!
是不是有点懵逼啊,在其他语言中,这也是一个经常见的面试笔试题,那么用SQL怎么实现呢?
下面可以用listagg的分类汇总功能来实现!
首先要生成基础数据:
SQL>
SQL> with t as (
2 select level as lv from dual connect by level<=9
3 )
4 select * from t;
LV
----------
1
2
3
4
5
6
7
8
9
9 rows selected
然后根据这个数据做一下枚举:
SQL> with t as (
2 select level as lv from dual connect by level<=9
3 ),
4 t1 as (
5 select a.lv la,b.lv lb,b.lv||'*'||a.lv||'='||rpad(a.lv*b.lv,2,' ') as txt
6 from t a,t b
7 where b.lv<=a.lv)
8 select * from t1;
LA LB TXT
---------- ---------- --------------------------------------------------------------------------------
1 1 1*1=1
2 1 1*2=2
2 2 2*2=4
3 1 1*3=3
3 2 2*3=6
3 3 3*3=9
4 1 1*4=4
4 2 2*4=8
4 3 3*4=12
4 4 4*4=16
5 1 1*5=5
5 2 2*5=10
5 3 3*5=15
5 4 4*5=20
5 5 5*5=25
6 1 1*6=6
6 2 2*6=12
6 3 3*6=18
6 4 4*6=24
6 5 5*6=30
6 6 6*6=36
7 1 1*7=7
7 2 2*7=14
7 3 3*7=21
7 4 4*7=28
7 5 5*7=35
7 6 6*7=42
7 7 7*7=49
8 1 1*8=8
8 2 2*8=16
8 3 3*8=24
8 4 4*8=32
8 5 5*8=40
8 6 6*8=48
8 7 7*8=56
8 8 8*8=64
9 1 1*9=9
9 2 2*9=18
9 3 3*9=27
9 4 4*9=36
9 5 5*9=45
9 6 6*9=54
9 7 7*9=63
9 8 8*9=72
9 9 9*9=81
45 rows selected
SQL>
最后用listagg分类汇总,把LA相同的数据合并为一行:
SQL> with t as (
2 select level as lv from dual connect by level<=9
3 ),
4 t1 as (
5 select a.lv la,b.lv lb,b.lv||'*'||a.lv||'='||rpad(a.lv*b.lv,2,' ') as txt
6 from t a,t b
7 where b.lv<=a.lv)
8 select listagg(txt,' ')within group(order by t1.la) as 九九乘法口诀
9 from t1
10 group by la;
九九乘法口诀
--------------------------------------------------------------------------------
1*1=1
1*2=2 2*2=4
1*3=3 2*3=6 3*3=9
1*4=4 2*4=8 3*4=12 4*4=16
1*5=5 2*5=10 3*5=15 4*5=20 5*5=25
1*6=6 2*6=12 3*6=18 4*6=24 5*6=30 6*6=36
1*7=7 2*7=14 3*7=21 4*7=28 5*7=35 6*7=42 7*7=49
1*8=8 2*8=16 3*8=24 4*8=32 5*8=40 6*8=48 7*8=56 8*8=64
1*9=9 2*9=18 3*9=27 4*9=36 5*9=45 6*9=54 7*9=63 8*9=72 9*9=81
9 rows selected
这就达到了我们的目的,可以看到,其实实现起来并不复杂,就是需要思考一下逻辑。
总结
本篇文章讲解的主要内容是:通过执行计划看开窗函数开窗语法rows\range between preceding and current row以及rows\range between unbounded preceding and unbounded following对移动范围的值进行聚集的原理以及区别】、如何通过一个SQL打印九九乘法口表!!!