PostgreSQL Oracle 兼容性之 - INDEX SKIP SCAN (递归查询变态优化) 非驱动列索引扫描优化

本文涉及的产品
云数据库 RDS SQL Server,基础系列 2核4GB
RDS PostgreSQL Serverless,0.5-4RCU 50GB 3个月
推荐场景:
对影评进行热评分析
云原生数据库 PolarDB 分布式版,标准版 2核8GB
简介:

标签

PostgreSQL , Oracle , index skip scan , 非驱动列条件 , 递归查询 , 子树


背景

对于输入条件在复合索引中为非驱动列的,如何高效的利用索引扫描?

在Oracle中可以使用index skip scan来实现这类CASE的高效扫描:

INDEX跳跃扫描一般用在WHERE条件里面没有使用到引导列,但是用到了引导列以外的其他列,并且引导列的DISTINCT值较少的情况。

在这种情况下,数据库把这个复合索引逻辑上拆散为多个子索引,依次搜索子索引中非引导列的WHERE条件里面的值。

使用方法如下:

/*+ INDEX_SS ( [ @ qb_name ] tablespec [ indexspec [ indexspec ]... ] ) */  

The INDEX_SS hint instructs the optimizer to perform an index skip scan for the specified table. If the statement uses an index range scan, then Oracle scans the index entries in ascending order of their indexed values. In a partitioned index, the results are in ascending order within each partition.Each parameter serves the same purpose as in "INDEX Hint". For example:

SELECT /*+ INDEX_SS(e emp_name_ix) */ last_name FROM employees e WHERE first_name = 'Steven';  

下面是来自ORACLE PERFORMANCE TUNING里的原文:

Index skip scans improve index scans by nonprefix columns. Often, scanning index blocks is faster than scanning table data blocks.

Skip scanning lets a composite index be split logically into smaller subindexes. In skip scanning, the initial column of the composite index is not specified in the query. In other words, it is skipped.

The number of logical subindexes is determined by the number of distinct values in the initial column. Skip scanning is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the nonleading key of the index.

Example 13-5 Index Skip Scan

Consider, for example, a table

employees(  
sex,   
employee_id,  
address  
)   

with a composite index on

(sex, employee_id).   

Splitting this composite index would result in two logical subindexes, one for M and one for F.

For this example, suppose you have the following index data:

('F',98)('F',100)('F',102)('F',104)('M',101)('M',103)('M',105)  

The index is split logically into the following two subindexes:

The first subindex has the keys with the value F.

The second subindex has the keys with the value M

pic

The column sex is skipped in the following query:

SELECT * FROM employeesWHERE employee_id = 101;  

A complete scan of the index is not performed, but the subindex with the value F is searched first, followed by a search of the subindex with the value M.

PostgreSQL 非skip scan

PostgreSQL支持非驱动列的索引扫描,但是需要扫描整个索引。

例子

1、创建测试表

postgres=# create table t(id int, c1 int);  
CREATE TABLE  

2、写入1000万测试数据

postgres=# insert into t select random()*1 , id from generate_series(1,10000000) id;  
INSERT 0 10000000  

3、创建多列索引

postgres=# create index idx_t on t(id,c1);  
CREATE INDEX  

4、非驱动列查询测试如下

index only scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;  
                                                                QUERY PLAN                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------  
 Index Only Scan using idx_t on public.t  (cost=10000000000.43..10000105164.89 rows=1 width=8) (actual time=0.043..152.288 rows=1 loops=1)  
   Output: id, c1  
   Index Cond: (t.c1 = 1)  
   Heap Fetches: 0  
   Buffers: shared hit=27326  
 Execution time: 152.328 ms  
(6 rows)  

index scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;  
                                                      QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  
 Index Scan using idx_t on public.t  (cost=0.43..105165.99 rows=1 width=8) (actual time=0.022..151.845 rows=1 loops=1)  
   Output: id, c1  
   Index Cond: (t.c1 = 1)  
   Buffers: shared hit=27326  
 Execution time: 151.881 ms  
(5 rows)  

bitmap scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;  
                                                       QUERY PLAN                                                         
------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.t  (cost=105164.88..105166.00 rows=1 width=8) (actual time=151.731..151.732 rows=1 loops=1)  
   Output: id, c1  
   Recheck Cond: (t.c1 = 1)  
   Heap Blocks: exact=1  
   Buffers: shared hit=27326  
   ->  Bitmap Index Scan on idx_t  (cost=0.00..105164.88 rows=1 width=0) (actual time=151.721..151.721 rows=1 loops=1)  
         Index Cond: (t.c1 = 1)  
         Buffers: shared hit=27325  
 Execution time: 151.777 ms  
(9 rows)  

seq scan(全表扫描)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;  
                                               QUERY PLAN                                                  
---------------------------------------------------------------------------------------------------------  
 Seq Scan on public.t  (cost=0.00..169248.41 rows=1 width=8) (actual time=0.014..594.535 rows=1 loops=1)  
   Output: id, c1  
   Filter: (t.c1 = 1)  
   Rows Removed by Filter: 9999999  
   Buffers: shared hit=44248  
 Execution time: 594.568 ms  
(6 rows)  

使用索引扫,因为不需要FILTER,同时扫描的BLOCK更少,所以性能比全表扫略好。但是还是扫了整个索引的PAGE,所以并不能算skip scan。

那么如何让PostgreSQL支持index skip scan呢?

PostgreSQL skip scan

实际上原理和Oracle类似,可以输入驱动列条件,然后按多个条件扫描,这样就能达到SKIP SCAN的效果。(即多颗子树扫描)。

同样也更加适合于驱动列DISTINCT值较少的情况。

用PostgreSQL的递归查询语法可以实现这样的加速效果。这种方法也被用于获取count(distinct), distinct值等。

《distinct xx和count(distinct xx)的变态递归优化方法 - 索引收敛(skip scan)扫描》

例如,我们通过这个方法,可以快速的得到驱动列的唯一值

with recursive skip as (    
  (    
    select min(t.id) as id from t where t.id is not null    
  )    
  union all    
  (    
    select (select min(t.id) as id from t where t.id > s.id and t.id is not null)     
      from skip s where s.id is not null    
  )  -- 这里的where s.id is not null 一定要加,否则就死循环了.    
)     
select id from skip ;  

然后封装到如下SQL,实现skip scan的效果

explain (analyze,verbose,timing,costs,buffers) select * from t where id in  
(  
with recursive skip as (    
  (    
    select min(t.id) as id from t where t.id is not null    
  )    
  union all    
  (    
    select (select min(t.id) as id from t where t.id > s.id and t.id is not null)     
      from skip s where s.id is not null    
  )  -- 这里的where s.id is not null 一定要加,否则就死循环了.    
)     
select id from skip   
) and c1=1  
union all   
select * from t where id is null and c1=1;  

或者

explain (analyze,verbose,timing,costs,buffers) select * from t where id = any(array  
(  
with recursive skip as (    
  (    
    select min(t.id) as id from t where t.id is not null    
  )    
  union all    
  (    
    select (select min(t.id) as id from t where t.id > s.id and t.id is not null)     
      from skip s where s.id is not null    
  )  -- 这里的where s.id is not null 一定要加,否则就死循环了.    
)     
select id from skip   
)) and c1=1  
union all   
select * from t where id is null and c1=1;  

看执行计划:

效果好多了

  
                                                                                       QUERY PLAN                                                                                          
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Append  (cost=55.00..215.22 rows=2 width=8) (actual time=0.127..0.138 rows=1 loops=1)  
   Buffers: shared hit=21  
   ->  Nested Loop  (cost=55.00..213.64 rows=1 width=8) (actual time=0.126..0.127 rows=1 loops=1)  
         Output: t.id, t.c1  
         Buffers: shared hit=18  
         ->  HashAggregate  (cost=54.57..55.58 rows=101 width=4) (actual time=0.108..0.109 rows=3 loops=1)  
               Output: skip.id  
               Group Key: skip.id  
               Buffers: shared hit=11  
               ->  CTE Scan on skip  (cost=51.29..53.31 rows=101 width=4) (actual time=0.052..0.102 rows=3 loops=1)  
                     Output: skip.id  
                     Buffers: shared hit=11  
                     CTE skip  
                       ->  Recursive Union  (cost=0.46..51.29 rows=101 width=4) (actual time=0.050..0.099 rows=3 loops=1)  
                             Buffers: shared hit=11  
                             ->  Result  (cost=0.46..0.47 rows=1 width=4) (actual time=0.049..0.049 rows=1 loops=1)  
                                   Output: $1  
                                   Buffers: shared hit=4  
                                   InitPlan 3 (returns $1)  
                                     ->  Limit  (cost=0.43..0.46 rows=1 width=4) (actual time=0.045..0.046 rows=1 loops=1)  
                                           Output: t_3.id  
                                           Buffers: shared hit=4  
                                           ->  Index Only Scan using idx_t on public.t t_3  (cost=0.43..205165.21 rows=10000033 width=4) (actual time=0.045..0.045 rows=1 loops=1)  
                                                 Output: t_3.id  
                                                 Index Cond: (t_3.id IS NOT NULL)  
                                                 Heap Fetches: 0  
                                                 Buffers: shared hit=4  
                             ->  WorkTable Scan on skip s  (cost=0.00..4.88 rows=10 width=4) (actual time=0.015..0.015 rows=1 loops=3)  
                                   Output: (SubPlan 2)  
                                   Filter: (s.id IS NOT NULL)  
                                   Rows Removed by Filter: 0  
                                   Buffers: shared hit=7  
                                   SubPlan 2  
                                     ->  Result  (cost=0.46..0.47 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=2)  
                                           Output: $3  
                                           Buffers: shared hit=7  
                                           InitPlan 1 (returns $3)  
                                             ->  Limit  (cost=0.43..0.46 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=2)  
                                                   Output: t_2.id  
                                                   Buffers: shared hit=7  
                                                   ->  Index Only Scan using idx_t on public.t t_2  (cost=0.43..76722.42 rows=3333344 width=4) (actual time=0.017..0.017 rows=0 loops=2)  
                                                         Output: t_2.id  
                                                         Index Cond: ((t_2.id > s.id) AND (t_2.id IS NOT NULL))  
                                                         Heap Fetches: 0  
                                                         Buffers: shared hit=7  
         ->  Index Only Scan using idx_t on public.t  (cost=0.43..1.56 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=3)  
               Output: t.id, t.c1  
               Index Cond: ((t.id = skip.id) AND (t.c1 = 1))  
               Heap Fetches: 0  
               Buffers: shared hit=7  
   ->  Index Only Scan using idx_t on public.t t_1  (cost=0.43..1.56 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)  
         Output: t_1.id, t_1.c1  
         Index Cond: ((t_1.id IS NULL) AND (t_1.c1 = 1))  
         Heap Fetches: 0  
         Buffers: shared hit=3  
 Execution time: 0.256 ms  
(56 rows)  

从150多毫秒,降低到了0.256毫秒

内核层面优化

与Oracle做法类似,或者说与递归的做法类似。

使用这种方法来改进优化器,可以达到index skip scan的效果,而且不用改写SQL。

参考

《distinct xx和count(distinct xx)的变态递归优化方法 - 索引收敛(skip scan)扫描》

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
1月前
|
SQL 存储 Oracle
【YashanDB观点】论Oracle兼容性,我们需要做什么
Oracle兼容性是目前国产数据库的关键任务之一,其直接影响到商业迁移的成本和竞争力。
33 8
|
30天前
|
SQL 存储 Oracle
【YashanDB观点】论Oracle兼容性,我们需要做什么
我们经常发现,部分国产数据库声称与 Oracle兼容性高达90%,但在实际迁移过程中,仍需要频繁地修改业务应用的代码。为何实现与Oracle高兼容度的数据库产品如此困难?其中一个重要原因是Oracle兼容性不仅是模仿,而是一个非常复杂和工程量庞大的逆向工程。其技术实现的复杂性以及多如牛毛的细节,足以让多数“年轻”的数据库团队望洋兴叹。YashanDB作为一款从核心理论到关键系统均为原创的数据库产品,从构建初期就具备了技术优势,在Oracle兼容性实现上,敢于亮剑并充分发挥工匠精神,不断打磨,努力构筑一个真正形神兼备的数据库产品。以下将从YashanDB SQL引擎技术、Oracle兼容性的开发
|
1月前
|
存储 关系型数据库 MySQL
MySQL中为什么要使用索引合并(Index Merge)?
通过这些内容的详细介绍和实际案例分析,希望能帮助您深入理解索引合并及其在MySQL中的
130 10
|
3月前
|
SQL Oracle 关系型数据库
[Oracle]索引
本文介绍了数据库索引的基本概念、优化查询的原理及分类。索引是一种数据结构(如B树或B+树),通过排序后的`rowid`来优化查询性能。文章详细解释了索引的构建过程、B+树的特点及其优势,并介绍了五种常见的索引类型:唯一索引、组合索引、反向键索引、位图索引和基于函数的索引。每种索引都有其适用场景和限制,帮助读者更好地理解和应用索引技术。
104 1
[Oracle]索引
|
7月前
|
关系型数据库 MySQL 测试技术
MySQL 报错 ERROR 1709: Index column size too large
MySQL 报错 ERROR 1709: Index column size too large
310 4
|
8月前
|
缓存 关系型数据库 MySQL
MySQL数据库——InnoDB引擎-架构-内存结构(Buffer Pool、Change Buffer、Adaptive Hash Index、Log Buffer)
MySQL数据库——InnoDB引擎-架构-内存结构(Buffer Pool、Change Buffer、Adaptive Hash Index、Log Buffer)
137 3
|
9月前
|
存储 关系型数据库 MySQL
MySQL的优化利器⭐️Multi Range Read与Covering Index是如何优化回表的?
本文以小白的视角使用通俗易懂的流程图深入浅出分析Multi Range Read与Covering Index是如何优化回表
|
9月前
|
存储 Oracle 关系型数据库
Oracle索引知识看这一篇就足够
Oracle索引知识看这一篇就足够
|
9月前
|
存储 Oracle 关系型数据库
Oracle 12c的多重索引:数据的“多维导航仪”
【4月更文挑战第19天】Oracle 12c的多重索引提升数据查询效率,如同多维导航仪。在同一表上创建针对不同列的多个索引,加速检索过程。虽然过多索引会增加存储和维护成本,但合理选择和使用索引策略,结合位图、函数索引等高级特性,能优化查询,应对复杂场景。数据管理员应善用这些工具,根据需求进行索引管理,支持企业数据分析。
|
9月前
|
SQL 关系型数据库 MySQL
mysql查询语句的访问方法const、ref、ref_or_null、range、index、all
mysql查询语句的访问方法const、ref、ref_or_null、range、index、all

相关产品

  • 云原生数据库 PolarDB
  • 云数据库 RDS PostgreSQL 版
  • 推荐镜像

    更多