前言
本篇文章讲解的主要内容是:如何识别重叠的日期范围、日期出现次数、确定当前记录和下一条记录之间相差的天数
【SQL开发实战技巧】这一系列博主当作复习旧知识来进行写作,毕竟SQL开发在数据分析场景非常重要且基础,面试也会经常问SQL开发和调优经验,相信当我写完这一系列文章,也能再有所收获,未来面对SQL面试也能游刃有余~。
一、识别重叠的日期范围
下面是一个有关工程的明细数据:
create or replace view zyd as
SELECT 7782 empno,'CLARK' as ename,1 as proj_id,date'2023-06-16' proj_start,date'2023-06-18' proj_end from dual union all
SELECT 7782 empno,'CLARK' as ename,4 as proj_id,date'2023-06-19' proj_start,date'2023-06-24' proj_end from dual union all
SELECT 7782 empno,'CLARK' as ename,7 as proj_id,date'2023-06-22' proj_start,date'2023-06-25' proj_end from dual union all
SELECT 7782 empno,'CLARK' as ename,10 as proj_id,date'2023-06-25' proj_start,date'2023-06-28' proj_end from dual union all
SELECT 7782 empno,'CLARK' as ename,13 as proj_id,date'2023-06-28' proj_start,date'2023-07-02' proj_end from dual union all
SELECT 7839 empno,'KING' as ename,2 as proj_id,date'2023-06-17' proj_start,date'2023-06-21' proj_end from dual union all
SELECT 7839 empno,'KING' as ename,8 as proj_id,date'2023-06-23' proj_start,date'2023-06-25' proj_end from dual union all
SELECT 7839 empno,'KING' as ename,14 as proj_id,date'2023-06-29' proj_start,date'2023-06-30' proj_end from dual union all
SELECT 7839 empno,'KING' as ename,11 as proj_id,date'2023-06-26' proj_start,date'2023-06-27' proj_end from dual union all
SELECT 7839 empno,'KING' as ename,5 as proj_id,date'2023-06-20' proj_start,date'2023-06-24' proj_end from dual union all
SELECT 7934 empno,'MILLER' as ename,3 as proj_id,date'2023-06-18' proj_start,date'2023-06-22' proj_end from dual union all
SELECT 7934 empno,'MILLER' as ename,12 as proj_id,date'2023-06-27' proj_start,date'2023-06-28' proj_end from dual union all
SELECT 7934 empno,'MILLER' as ename,15 as proj_id,date'2023-06-30' proj_start,date'2023-07-03' proj_end from dual union all
SELECT 7934 empno,'MILLER' as ename,9 as proj_id,date'2023-06-24' proj_start,date'2023-06-27' proj_end from dual union all
SELECT 7934 empno,'MILLER' as ename,6 as proj_id,date'2023-06-21' proj_start,date'2023-06-23' proj_end from dual;
select * from zyd;
EMPNO ENAME PROJ_ID PROJ_START PROJ_END
---------- ------ ---------- ----------- -----------
7782 CLARK 1 2023-6-16 2023-6-18
7782 CLARK 4 2023-6-19 2023-6-24
7782 CLARK 7 2023-6-22 2023-6-25
7782 CLARK 10 2023-6-25 2023-6-28
7782 CLARK 13 2023-6-28 2023-7-2
7839 KING 2 2023-6-17 2023-6-21
7839 KING 8 2023-6-23 2023-6-25
7839 KING 14 2023-6-29 2023-6-30
7839 KING 11 2023-6-26 2023-6-27
7839 KING 5 2023-6-20 2023-6-24
7934 MILLER 3 2023-6-18 2023-6-22
7934 MILLER 12 2023-6-27 2023-6-28
7934 MILLER 15 2023-6-30 2023-7-3
7934 MILLER 9 2023-6-24 2023-6-27
7934 MILLER 6 2023-6-21 2023-6-23
15 rows selected
通过数据可以看到,有很多员工在旧的工程结束之前就开始了新的工程(如员工 7782 的工程4结束日期是6月24日,而工程7开始日期是6月22日),现要求返回这些工程时间重复的数据。
前面介绍了Oracle中有两个分析函数LAG
和LEAD
,分别用于访问结果集中的前一行和后一行。我们可以用分析函数LAG
取得员工各自的上一个工程的结束日期及工程号,然后与当前工程相比较。
- 1、取信息
SQL> select empno,
2 ename,
3 proj_id as 工程号,
4 proj_start as 开始日期,
5 proj_end as 结束日期,
6 lag(proj_end) over(partition by empno order by proj_start) as 上一工程结束日期,
7 lag(proj_id) over(partition by empno order by proj_start) as 上一工程号
8 from zyd;
EMPNO ENAME 工程号 开始日期 结束日期 上一工程结束日期 上一工程号
---------- ------ ---------- ----------- ----------- ----------- ----------
7782 CLARK 1 2023-6-16 2023-6-18
7782 CLARK 4 2023-6-19 2023-6-24 2023-6-18 1
7782 CLARK 7 2023-6-22 2023-6-25 2023-6-24 4
7782 CLARK 10 2023-6-25 2023-6-28 2023-6-25 7
7782 CLARK 13 2023-6-28 2023-7-2 2023-6-28 10
7839 KING 2 2023-6-17 2023-6-21
7839 KING 5 2023-6-20 2023-6-24 2023-6-21 2
7839 KING 8 2023-6-23 2023-6-25 2023-6-24 5
7839 KING 11 2023-6-26 2023-6-27 2023-6-25 8
7839 KING 14 2023-6-29 2023-6-30 2023-6-27 11
7934 MILLER 3 2023-6-18 2023-6-22
7934 MILLER 6 2023-6-21 2023-6-23 2023-6-22 3
7934 MILLER 9 2023-6-24 2023-6-27 2023-6-23 6
7934 MILLER 12 2023-6-27 2023-6-28 2023-6-27 9
7934 MILLER 15 2023-6-30 2023-7-3 2023-6-28 12
15 rows selected
这里增加了partition by empno这样就可以对数据分组进行分析,不同的empno之间互不影响。
- 2、比较
SQL> with t as
2 (select empno,
3 ename,
4 proj_id as 工程号,
5 proj_start as 开始日期,
6 proj_end as 结束日期,
7 lag(proj_end) over(partition by empno order by proj_start) as 上一工程结束日期,
8 lag(proj_id) over(partition by empno order by proj_start) as 上一工程号
9 from zyd)
10 select t.empno,
11 t.ename,
12 t.工程号,
13 t.开始日期,
14 t.结束日期,
15 case
16 when 上一工程结束日期 >= 开始日期 then
17 '工程' || lpad(工程号, 2, '0') || '与工程' || lpad(上一工程号, 2, '0') || '重复'
18 end as 描述
19 from t;
EMPNO ENAME 工程号 开始日期 结束日期 描述
---------- ------ ---------- ----------- ----------- --------------------------------------------------------------------------------
7782 CLARK 1 2023-6-16 2023-6-18
7782 CLARK 4 2023-6-19 2023-6-24
7782 CLARK 7 2023-6-22 2023-6-25 工程07与工程04重复
7782 CLARK 10 2023-6-25 2023-6-28 工程10与工程07重复
7782 CLARK 13 2023-6-28 2023-7-2 工程13与工程10重复
7839 KING 2 2023-6-17 2023-6-21
7839 KING 5 2023-6-20 2023-6-24 工程05与工程02重复
7839 KING 8 2023-6-23 2023-6-25 工程08与工程05重复
7839 KING 11 2023-6-26 2023-6-27
7839 KING 14 2023-6-29 2023-6-30
7934 MILLER 3 2023-6-18 2023-6-22
7934 MILLER 6 2023-6-21 2023-6-23 工程06与工程03重复
7934 MILLER 9 2023-6-24 2023-6-27
7934 MILLER 12 2023-6-27 2023-6-28 工程12与工程09重复
7934 MILLER 15 2023-6-30 2023-7-3
15 rows selected
如果只想看重复数据,直接加个过滤就行啦!!!
二、按指定间隔汇总数据
现在有个需求:按指定的时间间隔(10分钟)汇总数据,分别汇总至0分、10分、20分、30分等。
我这里用v$sql表来当作案例数据,先看下这里的部分数据:
select a.LAST_ACTIVE_TIME from v$sql a where rownum<=5;
LAST_ACTIVE_TIME
----------------
2023-2-13 22:14:27
2023-2-15 3:00:59
2023-2-7 1:05:29
2023-2-8 1:05:30
2023-2-15 15:59:03
我们一步步来。
- 1、截取数据到分钟,并提取分钟信息
select sql_id,trunc(a.LAST_ACTIVE_TIME,'mi') as tim,to_char(a.LAST_ACTIVE_TIME,'mi') as mi from v$sql a where sql_id='gcsnqzu9q0004'
SQL_ID TIM MI
gcsnqzu9q0004 2023-2-13 22:14:00 14
2、对14和10取余
SQL> select mod(14,10) from dual;
MOD(14,10)
----------
4
SQL>
- 2、对比上面结果,我们可以知道如果想计算整10分钟的间隔,那就直接用MI-MOD(14,10)就算出来这个分钟对应在整十的哪个范围了。
select sql_id,
trunc(a.LAST_ACTIVE_TIME, 'mi') as tim,
to_char(a.LAST_ACTIVE_TIME, 'mi') as mi,
to_char(a.LAST_ACTIVE_TIME, 'mi') -
mod(to_char(a.LAST_ACTIVE_TIME, 'mi'), 10) as new_mi
from v$sql a
where sql_id = 'gcsnqzu9q0004';
SQL_ID TIM MI NEW_MI
gcsnqzu9q0004 2023-2-13 22:14:00 14 10
那么铺垫做完了,这个需求的最终实现sql如下:
SQL> with t as
2 (select sql_id,
3 trunc(a.LAST_ACTIVE_TIME, 'mi') -
4 mod(to_char(a.LAST_ACTIVE_TIME, 'mi'), 10) / 24 / 60 as new_tim
5 from v$sql a)
6 select * from (
7 select new_tim, count(*) from t group by new_tim order by new_tim desc nulls last
8 )where rownum<=10;
NEW_TIM COUNT(*)
----------- ----------
2023-2-15 19:00:00 476
2023-2-15 18:50:00 44
2023-2-15 18:40:00 20
2023-2-15 18:30:00 21
2023-2-15 18:20:00 52
2023-2-15 18:10:00 4
2023-2-15 18:00:00 7
2023-2-15 17:50:00 2
2023-2-15 17:40:00 7
2023-2-15 17:30:00 2
10 rows selected
SQL>
总结
提示:这里对文章进行总结:
例如:以上就是今天要讲的内容,本文仅仅简单介绍了pandas的使用,而pandas提供了大量能使我们快速便捷地处理数据的函数和方法。