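This article takes one case to its limit: over an arbitrary time range, group by ID and return the latest record for each ID.
After optimization the response time no longer depends on how many rows fall inside the range; it stays flat and returns in milliseconds once the work is split finely enough.
On ranges covering tens of millions of rows, that is roughly 10,000 times faster than before.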

CASE

There is a table with the following structure:

CREATE TABLE target_position ( 
target_id varchar(80), 
time bigint, 
content text 
); 

The table holds about 10 billion rows.
There are about 200,000 distinct target_id values.

The database is PostgreSQL 9.4.

Requirement:
For a given time range, return the latest row of every target, in under 1 second.
The time range is not known in advance.

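The current implementation uses a window function (start_time and end_time stand for the requested bounds):
select target_id,time,content from (select *,row_number() over (partition by target_id order by time desc) rid from target_position where time>start_time and time<=end_time) as t where rid=1; 
It performs very poorly.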

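Why is it slow? The cost is in scanning the time range: the query has to visit every row in the range, then partition and sort them, just to pick the latest row of each target_id.
So the larger the range, the more rows are scanned and the longer the query takes.
Let's go straight to the best plan. The case states that there are about 200,000 target_id values, so in theory, however large the range is, at most 200,000 tuples need to be fetched: one per ID.
How? With a function that looks up one ID at a time.
This needs a second table that holds the distinct target_id values so they are cheap to enumerate. The application has to help maintain it, but that is not hard; it simply decouples the ID list from the position data.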
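To make the argument concrete, here is a minimal Python sketch of the idea, not the PostgreSQL implementation. It assumes an in-memory stand-in for an index on (id, crt_time): a dict of sorted timestamp lists. The names `build_index`, `latest_per_id` and `latest_by_full_scan` are made up for this illustration. The per-ID variant does one binary search per ID, so its work is bounded by the number of IDs, while the full scan touches every row in the range.

```python
import bisect
import random

def build_index(rows):
    """Group timestamps by id and sort them, mimicking an index on (id, crt_time)."""
    index = {}
    for target_id, ts in rows:
        index.setdefault(target_id, []).append(ts)
    for ts_list in index.values():
        ts_list.sort()
    return index

def latest_per_id(index, ids, start, end):
    """One bounded lookup per id: newest timestamp with start <= ts <= end."""
    result = {}
    for target_id in ids:
        ts_list = index.get(target_id, [])
        pos = bisect.bisect_right(ts_list, end)  # position just past the last ts <= end
        if pos > 0 and ts_list[pos - 1] >= start:
            result[target_id] = ts_list[pos - 1]
    return result

def latest_by_full_scan(rows, start, end):
    """Baseline: visit every row in the range, as the window-function query does."""
    result = {}
    for target_id, ts in rows:
        if start <= ts <= end and ts > result.get(target_id, start - 1):
            result[target_id] = ts
    return result

# Hypothetical sample: 200 ids, 100,000 rows, integer timestamps.
random.seed(0)
rows = [(random.randrange(200), ts) for ts in range(100_000)]
index = build_index(rows)
same = latest_per_id(index, range(200), 10_000, 90_000) == latest_by_full_scan(rows, 10_000, 90_000)
print("both approaches agree:", same)
```

Both functions return the same answer; only the amount of data each one has to touch differs.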
Here is the test sample:

postgres=# create unlogged table t1(id int, crt_time timestamp);
CREATE TABLE
postgres=# create unlogged table t2(id int primary key);
CREATE TABLE
postgres=# insert into t1 select trunc(random()*200000),clock_timestamp() from generate_series(1,100000000);
INSERT 0 100000000
postgres=# create index idx_t1_1 on t1(id,crt_time desc);
CREATE INDEX
postgres=# select * from t1 limit 10;
   id   |          crt_time          
--------+----------------------------
  49092 | 2016-05-06 16:50:29.88595
    947 | 2016-05-06 16:50:29.887553
 179124 | 2016-05-06 16:50:29.887562
 197308 | 2016-05-06 16:50:29.887564
  93558 | 2016-05-06 16:50:29.887566
 127133 | 2016-05-06 16:50:29.887568
 163507 | 2016-05-06 16:50:29.887569
 110546 | 2016-05-06 16:50:29.887571
  65363 | 2016-05-06 16:50:29.887573
 122666 | 2016-05-06 16:50:29.887575
(10 rows)
postgres=# insert into t2 select generate_series(1,200000);
INSERT 0 200000

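First, the plan and timing of the unoptimized query. The plan itself is already about as good as this query can get, but the requested range contains more than 4.5 million rows, so execution still takes over 15 seconds.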

postgres=# explain analyze select * from (select *,row_number() over(partition by id order by crt_time desc) rn from t1 where crt_time between '2016-05-06 16:50:29.887566' and '2016-05-06 16:50:34.887566') t where rn=1;
                                                                                   QUERY PLAN                                                                                    
----------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on t  (cost=0.57..1819615.87 rows=2500 width=20) (actual time=0.083..15301.915 rows=200000 loops=1)
   Filter: (t.rn = 1)
   Rows Removed by Filter: 4320229
   ->  WindowAgg  (cost=0.57..1813365.87 rows=500000 width=12) (actual time=0.078..14012.867 rows=4520229 loops=1)
         ->  Index Only Scan using idx_t1_1 on t1  (cost=0.57..1804615.87 rows=500000 width=12) (actual time=0.066..10603.161 rows=4520229 loops=1)
               Index Cond: ((crt_time >= '2016-05-06 16:50:29.887566'::timestamp without time zone) AND (crt_time <= '2016-05-06 16:50:34.887566'::timestamp without time zone))
               Heap Fetches: 4520229
 Planning time: 0.202 ms
 Execution time: 15356.066 ms
(9 rows)

Optimization stage 1

Looping over the IDs in an inline (anonymous) code block brings the time down to a few seconds.

postgres=# do language plpgsql 
$$
  
declare
x int;
begin
  for x in select id from t2 loop
    perform * from t1 where id=x and crt_time between '2016-05-06 16:50:29.887566' and '2016-05-06 16:50:34.887566' order by crt_time desc limit 1;
  end loop;
end;

$$
;
DO
Time: 2311.081 ms

Wrapping the loop in a function makes it more general:

postgres=# create or replace function f(start_time timestamp, end_time timestamp) returns setof t1 as 
$$

declare
  x int;
begin
  for x in select id from t2 loop
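    -- Use the function arguments. The run recorded below hard-coded the range as
    -- '2016-05-06 16:50:29.887566' .. '2016-05-06 16:50:32.887566', which is why
    -- its output stops at 16:50:32.
    return query select * from t1 where id=x and crt_time between start_time and end_time order by crt_time desc limit 1;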
  end loop;
  return;
end;

$$
 language plpgsql strict;
CREATE FUNCTION

postgres=# explain analyze select * from f('2016-05-06 16:50:29.887566', '2016-05-06 16:50:34.887566');
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Function Scan on f  (cost=0.25..10.25 rows=1000 width=12) (actual time=2802.565..2850.445 rows=199999 loops=1)
 Planning time: 0.036 ms
 Execution time: 2885.924 ms
(3 rows)
Time: 2886.314 ms

postgres=# select * from f('2016-05-06 16:50:29.887566', '2016-05-06 16:50:34.887566') limit 10;
 id |          crt_time          
----+----------------------------
  1 | 2016-05-06 16:50:32.507124
  2 | 2016-05-06 16:50:32.774655
  3 | 2016-05-06 16:50:32.48621
  4 | 2016-05-06 16:50:32.874258
  5 | 2016-05-06 16:50:32.677812
  6 | 2016-05-06 16:50:32.091517
  7 | 2016-05-06 16:50:32.724287
  8 | 2016-05-06 16:50:32.669251
  9 | 2016-05-06 16:50:32.815634
 10 | 2016-05-06 16:50:32.812239
(10 rows)
Time: 3108.222 ms

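Now widen the time range so that it covers about 46 million rows.
The original query needs 104 seconds; its run time grows with the number of rows in the range.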

postgres=# explain analyze select * from (select *,row_number() over(partition by id order by crt_time desc) rn from t1 where crt_time between '2016-05-06 16:50:29.887566' and '2016-05-06 16:51:19.887566') t where rn=1;
                                                                                   QUERY PLAN                                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on t  (cost=0.57..1819615.87 rows=2500 width=20) (actual time=0.042..103886.966 rows=200000 loops=1)
   Filter: (t.rn = 1)
   Rows Removed by Filter: 46031611
   ->  WindowAgg  (cost=0.57..1813365.87 rows=500000 width=12) (actual time=0.037..92722.913 rows=46231611 loops=1)
         ->  Index Only Scan using idx_t1_1 on t1  (cost=0.57..1804615.87 rows=500000 width=12) (actual time=0.030..62673.221 rows=46231611 loops=1)
               Index Cond: ((crt_time >= '2016-05-06 16:50:29.887566'::timestamp without time zone) AND (crt_time <= '2016-05-06 16:51:19.887566'::timestamp without time zone))
               Heap Fetches: 46231611
 Planning time: 0.119 ms
 Execution time: 103950.955 ms
(9 rows)
Time: 103951.638 ms

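The optimized function still returns in 2.9 seconds. (As recorded, f hard-coded its time range, so this run does not demonstrate that on its own; the per-ID lookups in the later stages do use their arguments and show the same flat behaviour.)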

postgres=# explain analyze select * from f('2016-05-06 16:50:29.887566', '2016-05-06 16:51:19.887566');
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Function Scan on f  (cost=0.25..10.25 rows=1000 width=12) (actual time=2809.562..2858.468 rows=199999 loops=1)
 Planning time: 0.037 ms
 Execution time: 2894.181 ms
(3 rows)
Time: 2894.605 ms

Optimization stage 2

Next, move the single-ID lookup into a SQL function:

postgres=# create or replace function f1(int, timestamp, timestamp) returns t1 as 
$$

  select * from t1 where id=$1 and crt_time between $2 and $3 order by crt_time desc limit 1;

$$
 language sql strict;
CREATE FUNCTION
Time: 0.564 ms

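With the loop outside the function, driven by a plain scan of t2, there is less per-iteration overhead inside the server than with the PL/pgSQL FOR loop, and the time drops to about 2.3 seconds.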

postgres=# explain analyze select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from t2;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Seq Scan on t2  (cost=0.00..59560.50 rows=225675 width=4) (actual time=0.206..2213.069 rows=200000 loops=1)
 Planning time: 0.121 ms
 Execution time: 2261.185 ms
(3 rows)
Time: 2261.740 ms

postgres=# select count(*) from (select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from t2)t;
 count  
--------
 200000
(1 row)
Time: 2359.005 ms

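Because the loop is now outside, the query works with a cursor or with LIMIT, so the 200,000 rows can be returned page by page, which is a big improvement for the user experience.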

postgres=# select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from t2 limit 10;
                f1                 
-----------------------------------
 (1,"2016-05-06 16:50:34.818639")
 (2,"2016-05-06 16:50:34.874603")
 (3,"2016-05-06 16:50:34.741072")
 (4,"2016-05-06 16:50:34.727868")
 (5,"2016-05-06 16:50:34.507418")
 (6,"2016-05-06 16:50:34.715711")
 (7,"2016-05-06 16:50:34.817961")
 (8,"2016-05-06 16:50:34.786087")
 (9,"2016-05-06 16:50:34.76778")
 (10,"2016-05-06 16:50:34.836663")
(10 rows)
Time: 0.771 ms
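The reason LIMIT 10 comes back in under a millisecond is that only the first ten per-ID lookups are ever executed. The following Python sketch models that laziness with a generator; it is an analogy, not PostgreSQL's executor, and `latest_rows`, `calls` and the sample `index` are hypothetical names and data.

```python
import bisect
from itertools import islice

def latest_rows(ids, index, start, end, calls):
    """Yield (id, latest_ts) lazily; `calls` records every per-id lookup that ran."""
    for target_id in ids:
        calls.append(target_id)
        ts_list = index.get(target_id, [])
        pos = bisect.bisect_right(ts_list, end)
        if pos > 0 and ts_list[pos - 1] >= start:
            yield target_id, ts_list[pos - 1]

# Hypothetical data: 1,000 ids with three timestamps each.
index = {i: [i, i + 100, i + 200] for i in range(1, 1001)}
calls = []

# Equivalent of "limit 10": stop pulling rows after the first page.
first_page = list(islice(latest_rows(range(1, 1001), index, 0, 10_000, calls), 10))
print(len(first_page), "rows returned,", len(calls), "lookups executed")
```

A cursor behaves the same way: each fetch only pays for the lookups behind the rows it returns.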

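Optimization stage 3

Returning all the rows still takes more than 1 second, though. Is there anything left to optimize?
My goal is not only to optimize the query but also to squeeze everything out of the hardware.
So if the hardware has capacity to spare, this is the point to go parallel: fetching a single row is very fast, and it is the 200,000 iterations that add up.
How long do 10,000 iterations take? About 115 milliseconds, which meets the requirement.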

postgres=# select count(*) from (select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from (select * from t2 limit 10000) t) t;
 count 
-------
 10000
(1 row)
Time: 115.690 ms

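So to get under 1 second, run the query in parallel, for example with 20 workers, each handling a slice of the IDs, and combine the partial results into one result set.
The database does not support parallel query yet; PostgreSQL 9.6 is expected to add it.
For now this can be done in the application layer. But how do the parallel sessions get a consistent view of the data?
This is where a lesser-known PostgreSQL feature comes in: exported snapshots. A transaction can export its snapshot and other sessions can import it, so all of them see exactly the same state. Parallel backup already relies on this.
Applications that need a consistent view across sessions can use it too, for example:
First, open session 1 and export a snapshot: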

postgres=# begin transaction isolation level repeatable read;
BEGIN
Time: 0.173 ms
postgres=# select pg_export_snapshot();
 pg_export_snapshot 
--------------------
 0FC9C2A3-1
(1 row)

Open session 2 and import the snapshot:

postgres=# begin transaction isolation level repeatable read;
BEGIN
postgres=# SET TRANSACTION SNAPSHOT '0FC9C2A3-1';
SET

Open session 3 and import the snapshot:

postgres=# begin transaction isolation level repeatable read;
BEGIN
postgres=# SET TRANSACTION SNAPSHOT '0FC9C2A3-1';
SET

Then run the following in the three sessions in parallel, one statement per session:

postgres=# select count(*) from (select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from (select * from t2 order by id limit 70000 offset 0) t) t;
 count 
-------
 70000
(1 row)
Time: 775.071 ms
postgres=# select count(*) from (select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from (select * from t2 order by id limit 70000 offset 70000) t) t;
 count 
-------
 70000
(1 row)
Time: 763.747 ms
postgres=# select count(*) from (select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from (select * from t2 order by id limit 70000 offset 140000) t) t;
 count 
-------
 60000
(1 row)

Time: 665.743 ms

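Run in parallel, each slice finishes in under 1 second.
These queries can still be improved at the OFFSET: id is the primary key of t2, so there is no need for OFFSET at all; a range condition on id is better.
The scan of t2 is not the bottleneck, however, so I have left it as it is.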
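One way to replace OFFSET with a range condition is to compute contiguous ID ranges in the application, one per worker session. The Python sketch below does that; `split_id_ranges` and `worker_query` are hypothetical helpers, the statement reuses f1 and t2 from above, and the `%(start)s` / `%(end)s` placeholders assume a driver with named parameters (psycopg style), which the original does not specify.

```python
def split_id_ranges(min_id, max_id, workers):
    """Split [min_id, max_id] into up to `workers` contiguous, non-overlapping ranges."""
    total = max_id - min_id + 1
    base, extra = divmod(total, workers)
    ranges = []
    low = min_id
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        if size == 0:
            break
        ranges.append((low, low + size - 1))
        low += size
    return ranges

def worker_query(low, high):
    """Statement for one worker session; the time bounds stay bind parameters."""
    return (
        "select f1(id, %(start)s, %(end)s) from t2 "
        f"where id between {int(low)} and {int(high)}"
    )

# Three sessions over the 200,000 ids in t2, as in the example above.
for low, high in split_id_ranges(1, 200000, 3):
    print(worker_query(low, high))
```

Each session would first import the shared snapshot and then run its own statement, so the slices together form one consistent result set.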

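To go further, split t2 into even smaller slices; getting down to 10 milliseconds is not a problem. Compared with the 104 seconds the original query needed on a range of tens of millions of rows, that is about a 10,000-fold improvement.
Given how the optimization works, performance should be the same with tens of billions of rows. Try it if you don't believe it.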

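Optimization stage 4

Is that the end of it? Not quite. The previous stages enumerate the IDs from a separate table, so whatever range is requested, every ID gets looked up. Each lookup uses the index, but there is still room for improvement.
The number of ID lookups can be reduced. Suppose the requested range covers 1 million rows that contain only 100 distinct IDs: in theory 100 lookups are enough, yet the previous method still performs 200,000.
The method is simple:
(Assuming the time column is stream-like, i.e. it increases with insertion order, a BRIN index on it speeds things up; note that BRIN indexes require PostgreSQL 9.5 or later. If the data is not stream-like, use a conventional btree index on (crt_time, id) so the query can run as an index only scan.)
The purpose of this index is to find the largest ID within the range quickly.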

postgres=# create index idx_t2_1 on t1 using brin(crt_time);
CREATE INDEX

Insert 1 million stream-like rows that contain only 100 distinct IDs.

postgres=# insert into t1 select trunc(random()*100),clock_timestamp() from generate_series(1,1000000);
INSERT 0 1000000
Time: 4065.084 ms
postgres=# select now();
             now              
------------------------------
 2016-05-07 11:32:12.93416+08
(1 row)
Time: 0.346 ms

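Create a function that, given an ID, returns the row with the latest time for the next larger ID in the range. It is used inside the recursive query.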

create or replace function f2(int,timestamp,timestamp) returns t1 as 
$$

  select * from t1 where id is not null and id>$1 and crt_time between $2 and $3 order by id,crt_time desc limit 1;

$$
 language sql strict set enable_sort=off;

Create another function that uses a recursive query to return the latest time of every ID in the given range.

create or replace function f3(start_time timestamp, end_time timestamp) returns setof t1 as 
$$

declare
maxid int;
begin
  select max(id) into maxid from t1 where crt_time between start_time and end_time;
  return query with recursive skip as (
  (
    select id,crt_time from t1 where crt_time between start_time and end_time order by id,crt_time desc limit 1
  )
  union all
  (
    select (f2(s1.id, start_time, end_time)).* from skip s1 where s1.id <> maxid and s1.id is not null
  ) 
) select * from skip;
end;

$$
 language plpgsql strict;

postgres=# select * from f3('2016-05-07 09:50:29.887566','2016-05-07 16:50:29.987566');
 id |          crt_time          
----+----------------------------
  0 | 2016-05-07 11:32:00.983203
  1 | 2016-05-07 11:32:00.982906
...
 97 | 2016-05-07 11:32:00.983281
 98 | 2016-05-07 11:32:00.983206
 99 | 2016-05-07 11:32:00.983107
(100 rows)
Time: 177.203 ms

Very fast: only 177 milliseconds.
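The recursion in f3 is a skip scan: each step jumps to the next ID that has a row in the range instead of walking all rows or all IDs. A rough Python model follows, assuming a sorted list of (id, -ts) tuples as a stand-in for the btree index on (id, crt_time desc). `next_latest` plays the role of f2 and `skip_scan_latest` the role of f3; both names are invented for this sketch.

```python
import bisect

def next_latest(entries, after_id, start, end):
    """Like f2: first index entry with id > after_id whose ts lies in [start, end].

    Within one id the entries are ordered newest first, so the first match
    is that id's latest row in the range.
    """
    pos = bisect.bisect_right(entries, (after_id, float("inf")))  # skip all of after_id
    while pos < len(entries):
        entry_id, neg_ts = entries[pos]
        if start <= -neg_ts <= end:
            return entry_id, -neg_ts
        pos += 1
    return None

def skip_scan_latest(entries, start, end):
    """Like f3: repeat next_latest until no larger id has a row in the range."""
    result = []
    row = next_latest(entries, float("-inf"), start, end)
    while row is not None:
        result.append(row)
        row = next_latest(entries, row[0], start, end)
    return result

# Hypothetical rows as (id, ts); the index orders them by id, then ts descending.
rows = [(1, 10), (1, 20), (2, 5), (2, 30), (3, 50)]
entries = sorted((target_id, -ts) for target_id, ts in rows)
print(skip_scan_latest(entries, 8, 35))
```

The number of `next_latest` calls is the number of distinct IDs found in the range plus one, which is why this approach wins when the range contains few IDs.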

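The per-ID lookup from the earlier stages (run here in a single session, without the parallel split of stage 3) takes a constant time of a little over 3 seconds, because it still visits all 200,000 IDs.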

select count(*) from (select * from (select (f1(id,'2016-05-07 09:50:29.887566','2016-05-07 16:50:29.987566')).* from t2) t where t.* is not null) t;
 count 
-------
   100
(1 row)
Time: 3153.508 ms

The stage 4 optimization is no silver bullet, though: it does not suit ranges that contain many distinct IDs.
For example:

postgres=# select count(*) from f3('2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566');
 count  
--------
 200000
(1 row)
Time: 13344.261 ms

When the range contains many distinct IDs, the stage 3 method is still the better choice.

postgres=#  select count(*) from (select * from (select (f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566')).* from t2) t where t.* is not null) t;
 count  
--------
 200000
(1 row)
Time: 3846.156 ms

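Optimization stage 5

How can the number of distinct IDs in the selected range be estimated automatically?
With the method from one of my earlier articles: an estimate function that reads the planner's row estimate from EXPLAIN.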

CREATE FUNCTION count_estimate(query text) RETURNS INTEGER AS
$func$
DECLARE
    rec   record;
    ROWS  INTEGER;
BEGIN
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        ROWS := SUBSTRING(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN ROWS IS NOT NULL;
    END LOOP;

    RETURN ROWS;
END
$func$ LANGUAGE plpgsql;

postgres=# explain select distinct id from t1 where crt_time between '2016-05-06 16:50:29.887566' and '2016-05-06 16:50:34.887566';
                                                                                   QUERY PLAN                                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=672240.13..672329.49 rows=8936 width=4)
   Group Key: id
   ->  Bitmap Heap Scan on t1  (cost=46663.05..660864.26 rows=4550347 width=4)
         Recheck Cond: ((crt_time >= '2016-05-06 16:50:29.887566'::timestamp without time zone) AND (crt_time <= '2016-05-06 16:50:34.887566'::timestamp without time zone))
         ->  Bitmap Index Scan on idx_t2_1  (cost=0.00..45525.47 rows=4550347 width=0)
               Index Cond: ((crt_time >= '2016-05-06 16:50:29.887566'::timestamp without time zone) AND (crt_time <= '2016-05-06 16:50:34.887566'::timestamp without time zone))
(6 rows)
Time: 0.645 ms

postgres=# explain select distinct id from t1 where crt_time between '2016-05-07 09:50:29.887566' and '2016-05-07 16:50:29.987566';
                                                                                   QUERY PLAN                                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=23.12..23.13 rows=1 width=4)
   Group Key: id
   ->  Bitmap Heap Scan on t1  (cost=22.00..23.12 rows=1 width=4)
         Recheck Cond: ((crt_time >= '2016-05-07 09:50:29.887566'::timestamp without time zone) AND (crt_time <= '2016-05-07 16:50:29.987566'::timestamp without time zone))
         ->  Bitmap Index Scan on idx_t2_1  (cost=0.00..22.00 rows=1 width=0)
               Index Cond: ((crt_time >= '2016-05-07 09:50:29.887566'::timestamp without time zone) AND (crt_time <= '2016-05-07 16:50:29.987566'::timestamp without time zone))
(6 rows)
Time: 0.641 ms


postgres=# select count_estimate(
$$
select distinct id from t1 where crt_time between '2016-05-06 16:50:29.887566' and '2016-05-06 16:50:34.887566'
$$
);
 count_estimate 
----------------
           8936
(1 row)
Time: 1.139 ms

postgres=# select count_estimate(
$$
select distinct id from t1 where crt_time between '2016-05-07 09:50:29.887566' and '2016-05-07 16:50:29.987566'
$$
);
 count_estimate 
----------------
              1
(1 row)
Time: 0.706 ms

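The rest follows: use the estimate to decide between the stage 3 and the stage 4 method. The estimates are rough (8936 against an actual 200,000 IDs, and 1 against an actual 100), but the difference in magnitude between the two cases is what drives the choice.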
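If the decision is made in the application, it could look like the Python sketch below. `estimate_rows` mirrors what count_estimate does (take the first `rows=` figure from the EXPLAIN output), and `choose_strategy` applies a cut-over point. The default threshold of 1000 is a hypothetical value, not something measured in this article; it would need tuning against real timings, especially since the planner estimate can be far from the true count.

```python
import re

def estimate_rows(plan_lines):
    """Return the first rows= estimate found in EXPLAIN output, like count_estimate()."""
    for line in plan_lines:
        match = re.search(r" rows=(\d+)", line)
        if match:
            return int(match.group(1))
    return None

def choose_strategy(estimated_ids, threshold=1000):
    """Pick the function to call based on the estimated number of distinct ids.

    `threshold` is an assumed cut-over point, not a measured one.
    """
    if estimated_ids is not None and estimated_ids <= threshold:
        return "f3"  # stage 4: recursive skip scan, good when few ids are in range
    return "f1"      # stages 2-3: one lookup per id in t2, good when many ids are in range

# Top plan lines of the two EXPLAIN outputs shown above.
many_ids_plan = ["HashAggregate  (cost=672240.13..672329.49 rows=8936 width=4)"]
few_ids_plan = ["HashAggregate  (cost=23.12..23.13 rows=1 width=4)"]

print(choose_strategy(estimate_rows(many_ids_plan)))
print(choose_strategy(estimate_rows(few_ids_plan)))
```

With the two plans above, the first range is routed to the per-ID lookup and the second to the recursive function, matching the timings measured in stage 4.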

Also on offer: optimizations for count(distinct xx) and distinct xx, which are just as dramatic.

PostgreSQL: extreme optimization of a range query with grouped, sorted window selection over tens of billions of rows (case study)
