PostgreSQL 优化器案例之 - order by limit 索引选择问题

本文涉及的产品
云原生数据库 PolarDB 分布式版,标准版 2核8GB
RDS PostgreSQL Serverless,0.5-4RCU 50GB 3个月
推荐场景:
对影评进行热评分析
云数据库 RDS SQL Server,基础系列 2核4GB
简介:

标签

PostgreSQL , limit , order by , 优化器 , 选择性 , 相关性 , 数据存储顺序 , 目标数据存储顺序


背景

当我们在执行一个这样的SQL时,假如有这样几个索引(c1,c2) (id),数据库到底该用哪个索引呢?

explain select * from tbl where c1=200 and c2=200 order by id limit 10;  
或  
explain select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;  

实际上PG会通过计算成本得到应该使用哪个索引。

但是实际上计算公式本身的通用性,使得当C1,C2数据分布倾斜时,可能导致某些C1,C2输入值的执行计划不准确。

走c1,c2索引,当满足c1,c2条件的记录很少时。如果数据比较多,则SORT耗费的成本就较大,导致整个耗时变大。

走id索引,当满足c1,c2条件的记录很多时,(并且c1,c2满足条件的数据分布在id偏小的端),如果分布在大端,则需要扫描更多的记录才能找到满足条件的记录。

如果数据库统计信息中可以评估出来满足条件的c1,c2的记录与ID顺序分布的关系,也可以作为优化器计算成本的一种参考输入(当然要做到这个可能比较复杂,同时也会增加优化器计算成本的开销)。

下面通过实例来说明。

例子

1、建测试表

postgres=# create table tbl (id int, c1 int, c2 int, c3 int, c4 int);  
CREATE TABLE  

2、写入一批随机数据,ID从1到1000万。

postgres=# insert into tbl select generate_series(1,10000000), random()*100, random()*100, random()*100, random()*100;  
INSERT 0 10000000  

3、写入另一批100万条数据,c1,c2 与前面1000万的值不一样。

postgres=# insert into tbl select generate_series(10000001,11000000), 200,200,200,200;  
INSERT 0 1000000  

4、创建两个索引,也就是本文需要重点关注的,到底走哪个索引更划算

postgres=# create index idx_tbl_1 on tbl(id);  
CREATE INDEX  
postgres=# create index idx_tbl_2 on tbl(c1,c2,c3,c4);  
CREATE INDEX  

5、收集统计信息

postgres=# vacuum analyze tbl;  
VACUUM  

6、查看下面SQL的执行计划,走了id索引

postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 10;  
                                      QUERY PLAN                                        
--------------------------------------------------------------------------------------  
 Limit  (cost=0.43..32.59 rows=10 width=20)  
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323244.26 rows=100533 width=20)  
         Filter: ((c1 = 200) AND (c2 = 200))  
(3 rows)  

还是走了id索引

postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 1000;  
                                      QUERY PLAN                                        
--------------------------------------------------------------------------------------  
 Limit  (cost=0.43..3215.74 rows=1000 width=20)  
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323244.26 rows=100533 width=20)  
         Filter: ((c1 = 200) AND (c2 = 200))  
(3 rows)  

当LIMIT达到50000时,走了c1,c2的索引。为什么呢?

postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 50000;  
                                         QUERY PLAN                                           
--------------------------------------------------------------------------------------------  
 Limit  (cost=70355.06..70480.06 rows=50000 width=20)  
   ->  Sort  (cost=70355.06..70606.39 rows=100533 width=20)  
         Sort Key: id  
         ->  Bitmap Heap Scan on tbl  (cost=1457.82..62005.97 rows=100533 width=20)  
               Recheck Cond: ((c1 = 200) AND (c2 = 200))  
               ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1432.69 rows=100533 width=0)  
                     Index Cond: ((c1 = 200) AND (c2 = 200))  
(7 rows)  

7、分析以上执行计划的含义

首先,表的记录数(1100万)除以"满足c1=200 and c2=200 条件的记录数"(100533),得到平均需要扫描多少条记录,可以得到一条满足c1=200 and c2=200条件的记录.

postgres=# select 11000000/100533.0;  
       ?column?         
----------------------  
 109.4168084111684720  
(1 row)  

也就是说每扫描109.4条记录,可以得到一条满足条件的记录。(优化器这么算,是认为数据分布是均匀的。)

但是,实际上,数据分布是不均匀的,c1=200 and c2=200的记录在表的末端(1000万条记录后面),也就是说需要扫描1000万条记录后,才能得到1条满足c1=200 and c2=200的记录。

并不是估算的每扫描109.4条记录,可以得到一条满足条件的记录。

问题就出在这里。

8、我们再来分析一下为什么limit 50000时,选择了c1,c2的索引。而不是id的索引

使用ID索引时,需要扫描100533条记录,同时需要排序,直到排序完成,总成约70606.39。然后就是GET HEAP TUPLE的成本。

当使用id的索引扫描时,返回多少条记录能达到70606.39的成本呢?

以limit 1000的3215.74成本为例  
  
postgres=# select 70606.39/3215.74;  
      ?column?         
---------------------  
 21.9564983487471002  
(1 row)  
  
postgres=# select 21.956*1000;  
 ?column?    
-----------  
 21956.000  
(1 row)  

分水岭

经过以上分析,也就是说,LIMIT 21956时,走ID索引扫描的执行计划,成本可达到70606.39。

所以limit 21956是一个分水岭,大于这个值时,可能使用c1,c2的索引扫描,而小于它,则会使用ID索引扫描.

如下

postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 22000;  
                                         QUERY PLAN                                           
--------------------------------------------------------------------------------------------  
 Limit  (cost=69759.69..69814.69 rows=22000 width=20)  
   ->  Sort  (cost=69759.69..70011.02 rows=100533 width=20)  
         Sort Key: id  
         ->  Bitmap Heap Scan on tbl  (cost=1457.82..62005.97 rows=100533 width=20)  
               Recheck Cond: ((c1 = 200) AND (c2 = 200))  
               ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1432.69 rows=100533 width=0)  
                     Index Cond: ((c1 = 200) AND (c2 = 200))  
(7 rows)  
postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 21000;  
                                      QUERY PLAN                                        
--------------------------------------------------------------------------------------  
 Limit  (cost=0.43..67521.75 rows=21000 width=20)  
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323244.26 rows=100533 width=20)  
         Filter: ((c1 = 200) AND (c2 = 200))  
(3 rows)  

真实的执行耗时

很显然,使用id扫描,一定会慢,因为满足条件的数据都分布在1000万行后面。

1、c1,c2索引扫描,直接命中数据,加排序(100万条),略快。

postgres=# explain  analyze select * from tbl where c1=200 and c2=200 order by id limit 22000;  
                                                                  QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=69759.69..69814.69 rows=22000 width=20) (actual time=293.961..299.054 rows=22000 loops=1)  
   ->  Sort  (cost=69759.69..70011.02 rows=100533 width=20) (actual time=293.960..296.006 rows=22000 loops=1)  
         Sort Key: id  
         Sort Method: top-N heapsort  Memory: 3255kB  
         ->  Bitmap Heap Scan on tbl  (cost=1457.82..62005.97 rows=100533 width=20) (actual time=47.919..175.698 rows=1000000 loops=1)  
               Recheck Cond: ((c1 = 200) AND (c2 = 200))  
               Heap Blocks: exact=6370  
               ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1432.69 rows=100533 width=0) (actual time=47.160..47.160 rows=1000000 loops=1)  
                     Index Cond: ((c1 = 200) AND (c2 = 200))  
 Planning time: 0.152 ms  
 Execution time: 300.664 ms  
(11 rows)  

2、id 索引扫描,慢。

postgres=# explain  analyze select * from tbl where c1=200 and c2=200 order by id limit 21000;  
                                                                QUERY PLAN                                                                  
------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..67521.75 rows=21000 width=20) (actual time=1404.932..1412.594 rows=21000 loops=1)  
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323244.26 rows=100533 width=20) (actual time=1404.930..1409.639 rows=21000 loops=1)  
         Filter: ((c1 = 200) AND (c2 = 200))  
         Rows Removed by Filter: 10000000  
 Planning time: 0.139 ms  
 Execution time: 1414.142 ms  
(6 rows)  

3、limit 10同样,id 索引扫描,慢。

postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2=200 order by id limit 10;  
                                                                  QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..32.59 rows=10 width=20) (actual time=1403.861..1403.865 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=91020  
   ->  Index Scan using idx_tbl_1 on public.tbl  (cost=0.43..323244.26 rows=100533 width=20) (actual time=1403.859..1403.861 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4  
         Filter: ((tbl.c1 = 200) AND (tbl.c2 = 200))  
         Rows Removed by Filter: 10000000  
         Buffers: shared hit=91020  
 Planning time: 0.127 ms  
 Execution time: 1403.893 ms  
(10 rows)  

优化方法

1、使用HINT或者改SQL,强制不走ID扫描。

并不适用于所有场景,比如数据分布均匀时,那么PG的这周成本计算方法就对口,那么什么时候使用ID,什么时候使用C1,C2索引扫描就合乎常理了。

postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2=200 order by id+0 limit 10;  
                                                                  QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=64429.79..64429.81 rows=10 width=24) (actual time=409.622..409.626 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4, ((id + 0))  
   Buffers: shared hit=10205  
   ->  Sort  (cost=64429.79..64681.12 rows=100533 width=24) (actual time=409.620..409.621 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4, ((id + 0))  
         Sort Key: ((tbl.id + 0))  
         Sort Method: top-N heapsort  Memory: 25kB  
         Buffers: shared hit=10205  
         ->  Bitmap Heap Scan on public.tbl  (cost=1457.82..62257.30 rows=100533 width=24) (actual time=47.347..237.455 rows=1000000 loops=1)  
               Output: id, c1, c2, c3, c4, (id + 0)  
               Recheck Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))  
               Heap Blocks: exact=6370  
               Buffers: shared hit=10205  
               ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1432.69 rows=100533 width=0) (actual time=46.577..46.577 rows=1000000 loops=1)  
                     Index Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))  
                     Buffers: shared hit=3835  
 Planning time: 0.133 ms  
 Execution time: 409.670 ms  
(18 rows)  

2、如果c1,c2是等值查询的话,可以用以下索引,那么效率是最高的。

postgres=# create index idx_tbl_3 on tbl(c1,c2,id);  
CREATE INDEX  
  
  
postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 =200 order by id limit 10;  
                                                              QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.56..6.93 rows=10 width=20) (actual time=0.102..0.106 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=1 read=4  
   I/O Timings: read=0.047  
   ->  Index Scan using idx_tbl_3 on public.tbl  (cost=0.56..64086.79 rows=100533 width=20) (actual time=0.101..0.103 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4  
         Index Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))  
         Buffers: shared hit=1 read=4  
         I/O Timings: read=0.047  
 Planning time: 0.142 ms  
 Execution time: 0.131 ms  
(11 rows)  

3、建议方法:

注意方法2 不适合非等值查询,

postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;  
                                                                  QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..35.32 rows=10 width=20) (actual time=1371.094..1371.099 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=91020  
   ->  Index Scan using idx_tbl_1 on public.tbl  (cost=0.43..350743.84 rows=100533 width=20) (actual time=1371.092..1371.095 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4  
         Filter: ((tbl.c2 >= 100) AND (tbl.c2 <= 300) AND (tbl.c1 = 200))  
         Rows Removed by Filter: 10000000  
         Buffers: shared hit=91020  
 Planning time: 0.278 ms  
 Execution time: 1371.128 ms  
(10 rows)  

但是不用担心,我们依旧可以使用其他等值查询列,加上排序列组成复合索引,在INDEX SCAN中使用FILTER来加速。

新增如下索引,加速如下

postgres=# create index idx_tbl_4 on tbl(c1,id);  
CREATE INDEX  
  
postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;  
                                                               QUERY PLAN                                                                 
----------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..10.47 rows=10 width=20) (actual time=0.105..0.110 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=1 read=3  
   I/O Timings: read=0.051  
   ->  Index Scan using idx_tbl_4 on public.tbl  (cost=0.43..100877.50 rows=100533 width=20) (actual time=0.104..0.107 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4  
         Index Cond: (tbl.c1 = 200)  
         Filter: ((tbl.c2 >= 100) AND (tbl.c2 <= 300))  
         Buffers: shared hit=1 read=3  
         I/O Timings: read=0.051  
 Planning time: 0.172 ms  
 Execution time: 0.134 ms  
(12 rows)  

附Oracle

《Oracle migration to Greenplum - (含 Ora2pg)》

http://www.dba-oracle.com/t_OracleAutotrace.htm

SQL> create table tbl(id int, c1 int, c2 int, c3 int, c4 int);

Table created.

SQL> insert into tbl select rownum,trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)) from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> insert into tbl select rownum+10000000, 200,200,200,200 from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index idx_tbl_1 on tbl(id);

Index created.

SQL> create index idx_tbl_2 on tbl(c1,c2,c3,c4);

Index created.

SQL> set linesize 512
SQL> set pagesize 50000

SQL> set autotrace on;

SQL> exec DBMS_STATS.GATHER_TABLE_STATS('JIUDU','TBL'); 

PL/SQL procedure successfully completed.

SQL> select * from (select * from tbl where c1=200 and c2=200 order by id) t where rownum<10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
  10000001        200        200        200        200
  10000002        200        200        200        200
  10000003        200        200        200        200
  10000004        200        200        200        200
  10000005        200        200        200        200
  10000006        200        200        200        200
  10000007        200        200        200        200
  10000008        200        200        200        200
  10000009        200        200        200        200

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 745043579

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     9 |   585 |       | 10253   (2)| 00:02:04 |
|*  1 |  COUNT STOPKEY          |      |       |       |       |            |          |
|   2 |   VIEW                  |      | 84875 |  5387K|       | 10253   (2)| 00:02:04 |
|*  3 |    SORT ORDER BY STOPKEY|      | 84875 |  1491K|  2672K| 10253   (2)| 00:02:04 |
|*  4 |     TABLE ACCESS FULL   | TBL  | 84875 |  1491K|       |  9767   (2)| 00:01:58 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)
   4 - filter("C1"=200 AND "C2"=200)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      34868  consistent gets
          0  physical reads
          0  redo size
        937  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          9  rows processed

SQL> select * from (select * from tbl where c1=1 and c2=1 order by id) t where rownum<10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
      9697          1          1         78         39
     20586          1          1         81         71
     27820          1          1         33         64
     44324          1          1         26         27
     47079          1          1          3          5
     64669          1          1         13         49
     73715          1          1         20         74
     80903          1          1         96         25
     98368          1          1         59          9

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 447312937

--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |     9 |   585 |   641   (1)| 00:00:08 |
|*  1 |  COUNT STOPKEY                 |           |       |       |            |          |
|   2 |   VIEW                         |           |   704 | 45760 |   641   (1)| 00:00:08 |
|*  3 |    SORT ORDER BY STOPKEY       |           |   704 | 12672 |   641   (1)| 00:00:08 |
|   4 |     TABLE ACCESS BY INDEX ROWID| TBL       |   704 | 12672 |   640   (0)| 00:00:08 |
|*  5 |      INDEX RANGE SCAN          | IDX_TBL_2 |   704 |       |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)
   5 - access("C1"=1 AND "C2"=1)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1072  consistent gets
         11  physical reads
          0  redo size
        969  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          9  rows processed


SQL> create index idx_tbl_3 on tbl(c1,c2,id);
Index created.

SQL> select * from (select * from tbl where c1=200 and c2 between 100 and 300 order by id) t where rownum < 10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
  10000001        200        200        200        200
  10000002        200        200        200        200
  10000003        200        200        200        200
  10000004        200        200        200        200
  10000005        200        200        200        200
  10000006        200        200        200        200
  10000007        200        200        200        200
  10000008        200        200        200        200
  10000009        200        200        200        200

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 745043579

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     9 |   585 |       | 10253   (2)| 00:02:04 |
|*  1 |  COUNT STOPKEY          |      |       |       |       |            |          |
|   2 |   VIEW                  |      | 84875 |  5387K|       | 10253   (2)| 00:02:04 |
|*  3 |    SORT ORDER BY STOPKEY|      | 84875 |  1491K|  2672K| 10253   (2)| 00:02:04 |
|*  4 |     TABLE ACCESS FULL   | TBL  | 84875 |  1491K|       |  9767   (2)| 00:01:58 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)
   4 - filter("C1"=200 AND "C2">=100 AND "C2"<=300)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      34868  consistent gets
          0  physical reads
          0  redo size
        937  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          9  rows processed

SQL> select * from (select * from tbl where c1=200 and c2 =200 order by id) t where rownum < 10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
  10000001        200        200        200        200
  10000002        200        200        200        200
  10000003        200        200        200        200
  10000004        200        200        200        200
  10000005        200        200        200        200
  10000006        200        200        200        200
  10000007        200        200        200        200
  10000008        200        200        200        200
  10000009        200        200        200        200

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1825274432

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     9 |   585 |    12   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |           |       |       |            |          |
|   2 |   VIEW                        |           |    10 |   650 |    12   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TBL       |    10 |   180 |    12   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IDX_TBL_3 |       |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   4 - access("C1"=200 AND "C2"=200)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          2  physical reads
          0  redo size
        937  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          9  rows processed

同时也发现一个问题,Oracle可能无法使用index filter来优化,例如将index2,index3删除后,留下ID索引,Oracle无法走索引,而PG可以。

SQL> select * from (select * from tbl where c1=1 and c2 =1 order by id) t where rownum < 10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
      9697          1          1         78         39
     20586          1          1         81         71
     27820          1          1         33         64
     44324          1          1         26         27
     47079          1          1          3          5
     64669          1          1         13         49
     73715          1          1         20         74
     80903          1          1         96         25
     98368          1          1         59          9

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 745043579

--------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     9 |   585 |  9766   (2)| 00:01:58 |
|*  1 |  COUNT STOPKEY          |      |       |       |            |          |
|   2 |   VIEW                  |      |   704 | 45760 |  9766   (2)| 00:01:58 |
|*  3 |    SORT ORDER BY STOPKEY|      |   704 | 12672 |  9766   (2)| 00:01:58 |
|*  4 |     TABLE ACCESS FULL   | TBL  |   704 | 12672 |  9765   (2)| 00:01:58 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)
   4 - filter("C1"=1 AND "C2"=1)


Statistics
----------------------------------------------------------
        186  recursive calls
          0  db block gets
      34893  consistent gets
          0  physical reads
          0  redo size
        969  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
          9  rows processed
  

PG


postgres=# explain analyze select * from tbl where c1=1 and c2 =1 order by id limit 10;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..3703.11 rows=10 width=20) (actual time=7.199..23.926 rows=10 loops=1)
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323243.84 rows=873 width=20) (actual time=7.198..23.921 rows=10 loops=1)
         Filter: ((c1 = 1) AND (c2 = 1))
         Rows Removed by Filter: 142814
 Planning time: 0.119 ms
 Execution time: 23.950 ms
(6 rows)

小结

当SQL查询中包括排序,以及其他字段的过滤条件,并使用LIMIT快速返回少量数据时,如果满足条件的数据分布在排序键的末端,那么优化器给出的执行计划可能是不好的,导致通过排序索引扫描更多的数据后才能命中需要的记录。

然而,数据库目前使用的评估走排序键时,LIMIT需要扫描多少条记录,使用了数据均匀分布的假设,所以在数据(满足条件的数据与排序键本身的相关性不均匀)分布不均匀时,导致成本估算不准(oracle干脆走全表扫描)。

建议优化方法:

增加索引,创建等值查询条件列(s)加排序列(s)组成的复合索引,降低扫描量。

例子

select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;  
  
增加索引  
  
(c1,id)  -- 索引扫描, filter c2  
  
已有  
(c1,c2)  -- 索引扫描, sort id  
(id)     -- 索引扫描, filter c1,c2  
select * from tbl where c1=200 and c2 =200 order by id limit 10;  
  
增加索引  
  
(c1,c2,id)  -- 索引扫描  
  
已有  
(c1,c2)  -- 索引扫描, sort id  
(id)     -- 索引扫描, filter c1,c2  

参考

《PostgreSQL 10 黑科技 - 自定义统计信息》

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
3月前
|
监控 关系型数据库 数据库
PostgreSQL的索引优化策略?
【8月更文挑战第26天】PostgreSQL的索引优化策略?
94 1
|
20天前
|
关系型数据库 MySQL PostgreSQL
postgresql和mysql中的limit使用方法
postgresql和mysql中的limit使用方法
35 1
|
3月前
|
SQL 关系型数据库 MySQL
SQL Server、MySQL、PostgreSQL:主流数据库SQL语法异同比较——深入探讨数据类型、分页查询、表创建与数据插入、函数和索引等关键语法差异,为跨数据库开发提供实用指导
【8月更文挑战第31天】SQL Server、MySQL和PostgreSQL是当今最流行的关系型数据库管理系统,均使用SQL作为查询语言,但在语法和功能实现上存在差异。本文将比较它们在数据类型、分页查询、创建和插入数据以及函数和索引等方面的异同,帮助开发者更好地理解和使用这些数据库。尽管它们共用SQL语言,但每个系统都有独特的语法规则,了解这些差异有助于提升开发效率和项目成功率。
364 0
|
3月前
|
关系型数据库 数据库 PostgreSQL
PostgreSQL索引维护看完这篇就够了
PostgreSQL索引维护看完这篇就够了
283 0
|
关系型数据库 Go 数据库
《提高查询速度:PostgreSQL索引实用指南》
《提高查询速度:PostgreSQL索引实用指南》
586 0
|
6月前
|
SQL 关系型数据库 数据库
RDS PostgreSQL索引推荐原理及最佳实践
前言很多开发人员都知道索引对于数据库的查询性能至关重要,一个好的索引能使数据库的性能提升成千上万倍。但给数据库加索引是一项相对专业的工作,需要对数据库的运行原理有一定了解。同时,加了索引有没有性能提升、性能提升了多少,这些都是加索引前就想知道的。这项繁杂的工作有没有更好的方案呢?有!就是今天重磅推出...
116 1
RDS PostgreSQL索引推荐原理及最佳实践
|
关系型数据库 分布式数据库 PolarDB
《阿里云产品手册2022-2023 版》——PolarDB for PostgreSQL
《阿里云产品手册2022-2023 版》——PolarDB for PostgreSQL
363 0
|
存储 缓存 关系型数据库
|
存储 SQL 并行计算
PolarDB for PostgreSQL 开源必读手册-开源PolarDB for PostgreSQL架构介绍(中)
PolarDB for PostgreSQL 开源必读手册-开源PolarDB for PostgreSQL架构介绍
419 0
|
存储 算法 安全
PolarDB for PostgreSQL 开源必读手册-开源PolarDB for PostgreSQL架构介绍(下)
PolarDB for PostgreSQL 开源必读手册-开源PolarDB for PostgreSQL架构介绍
379 0

相关产品

  • 云原生数据库 PolarDB
  • 云数据库 RDS PostgreSQL 版