前言
本篇文章讲解的主要内容是:通过模拟计算消费流水账及计算银行流水累计和讲解sum()over()函数使用场景、通过计算各部门工资排名前三位的员工小案例来介绍ROW_NUMBER、RANK、DENSE_RANK使用方法及区别
【SQL开发实战技巧】这一系列博主当作复习旧知识来进行写作,毕竟SQL开发在数据分析场景非常重要且基础,面试也会经常问SQL开发和调优经验,相信当我写完这一系列文章,也能再有所收获,未来面对SQL面试也能游刃有余~。
一、模拟计算消费流水账
首先生成样例数据:
SQL> with t as (
2 select 1000 as 编号,'余额' as 项目,30000 as 金额 from dual
3 union all
4 select empno,to_char(rownum),sal from emp where deptno=10
5 )
6 select * from t;
编号 项目 金额
---------- ---------------------------------------- ----------
1000 余额 30000
7782 1 2450
7839 2 5000
7934 3 1300
这是模拟的一个消费流水账,账户余额初始是30000,后面是三次消费的数据,现在有个需求:得到每笔费用消费后的余额。
如何做这个需求呢?
这里给出解决方案,如下:
SQL> with t as (
2 select 1000 as 编号,'余额' as 项目,30000 as 金额 from dual
3 union all
4 select empno,to_char(rownum),sal from emp where deptno=10
5 )
6 select t.*,sum(case when 项目='余额' then 金额 else -金额 end )over(order by 编号) as 余额
7 from t;
编号 项目 金额 余额
---------- ---------------------------------------- ---------- ----------
1000 余额 30000 30000
7782 1 2450 27550
7839 2 5000 22550
7934 3 1300 21250
SQL>
二、计算银行流水累计和
有如下临时表:
SQL> with t as
2 (select 1 as id, 100 as amt, 'PR' as trx
3 from dual
4 union all
5 select 2 as id, 100 as amt, 'PR' as trx
6 from dual
7 union all
8 select 3 as id, 50 as amt, 'PY' as trx
9 from dual
10 union all
11 select 4 as id, 100 as amt, 'PR' as trx
12 from dual
13 union all
14 select 5 as id, 200 as amt, 'PY' as trx
15 from dual
16 union all
17 select 6 as id, 50 as amt, 'PY' as trx
18 from dual)
19 select * from t;
ID AMT TRX
---------- ---------- ---
1 100 PR
2 100 PR
3 50 PY
4 100 PR
5 200 PY
6 50 PY
6 rows selected
这是一个存取/款列表。
- id是唯一列
- amt列表示每次事务处理(取款或存款)涉及的金额
- trx列定义了事务处理的类型,取款是"PY",存款是"PR"
现在有一个需求:计算每次存/取款后的余额:如果TRX是"PR",则加上amt值代表的金额,否则要减去amt代表的金额。
怎么做呢?
看下面解决方案:
SQL> with t as
2 (select 1 as id, 100 as amt, 'PR' as trx
3 from dual
4 union all
5 select 2 as id, 100 as amt, 'PR' as trx
6 from dual
7 union all
8 select 3 as id, 50 as amt, 'PY' as trx
9 from dual
10 union all
11 select 4 as id, 100 as amt, 'PR' as trx
12 from dual
13 union all
14 select 5 as id, 200 as amt, 'PY' as trx
15 from dual
16 union all
17 select 6 as id, 50 as amt, 'PY' as trx
18 from dual)
19 select id,
20 case
21 when trx = 'PR' then
22 '存款'
23 else
24 '取款'
25 end as 存取类型,
26 amt,
27 sum(case
28 when trx = 'PR' then
29 amt
30 else
31 -amt
32 end)over(order by id) as 余额
33 from t
34 order by id;
ID 存取类型 AMT 余额
---------- ---- ---------- ----------
1 存款 100 100
2 存款 100 200
3 取款 50 150
4 存款 100 250
5 取款 200 50
6 取款 50 0
6 rows selected
三、返回各部门工资排名前三位的员工
有经验的人一看到标题,马上就会指出:这个需求太模糊了。为什么?先看下面的示
例。
SQL> select deptno,
2 empno,
3 sal,
4 row_number() over(partition by deptno order by sal desc) as row_number,
5 rank() over(partition by deptno order by sal desc) as rank,
6 dense_rank() over(partition by deptno order by sal desc) as dense_rank
7 from emp
8 where deptno in (20, 30)
9 order by 1, 3 desc;
DEPTNO EMPNO SAL ROW_NUMBER RANK DENSE_RANK
------ ----- --------- ---------- ---------- ----------
20 7788 3000.00 1 1 1
20 7902 3000.00 2 1 1
20 7566 2975.00 3 3 2
20 7876 1100.00 4 4 3
20 7369 800.00 5 5 4
30 7698 2850.00 1 1 1
30 7499 1600.00 2 2 2
30 7844 1500.00 3 3 3
30 7521 1250.00 4 4 4
30 7654 1250.00 5 4 4
30 7900 950.00 6 6 5
11 rows selected
该示例用了PARTITION BY
子句,通过这个子句可以把主查询返回的数据分组进行分析。在查询中分别用了三个分析函数ROW_NUMBER、RANK、DENSE_RANK
来分组(PARTITION BY deptno)
生成序号。
注意粗体标识的部分,当排序列(工资)有重复数据时,会出现以下情况。ROW_NUMBER
仍然会生成序号1、2、3。
RANK
相同的工资会生成同样的序号,而且其后的序号与ROW_NUMBER
相同(empno=7566,生成的序号是3)。DENSE_RANK
相同的工资会生成同样的序号,而且其后的序号递增(empno=7566,生成的序号是2)。
这里如果用ROW_NUMBER
取排名第一的员工,显然会漏掉7902这名员工。如果用DENSE_RANK
取排名前两位的员工,很明显会返回三条记录。所以具体要分析清楚需求,再决定用哪一个函数。
这里选用DENSE_RANK
(因需求不定,所以随意选取了一个)取排名前三的员工,返回数据如下:
SQL> select * from (
2 select deptno,
3 empno,
4 sal,
5 row_number() over(partition by deptno order by sal desc) as row_number,
6 rank() over(partition by deptno order by sal desc) as rank,
7 dense_rank() over(partition by deptno order by sal desc) as dense_rank
8 from emp
9 where deptno in (20, 30)
10 order by 1, 3 desc
11 )where dense_rank<=3;
DEPTNO EMPNO SAL ROW_NUMBER RANK DENSE_RANK
------ ----- --------- ---------- ---------- ----------
20 7788 3000.00 1 1 1
20 7902 3000.00 2 1 1
20 7566 2975.00 3 3 2
20 7876 1100.00 4 4 3
30 7698 2850.00 1 1 1
30 7499 1600.00 2 2 2
30 7844 1500.00 3 3 3
7 rows selected
总结
上面文章是sum()over() 、ROW_NUMBER、RANK、DENSE_RANK几个分析函数常用的场景,这里博主拿了三个工作案例场景给大家讲解。