PostgreSQL开发小讲堂 - 不等于、数组不包含 要不要用索引?

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
云原生数据库 PolarDB 分布式版,标准版 2核8GB
云数据库 RDS MySQL Serverless,0.5-2RCU 50GB
简介:

标签

PostgreSQL , 不等于 , 索引 , 外连接


背景

在数据库中不等于能不能走索引呢?理论上是不行的,但是有方法可以让不等于也走索引(虽然走索引也不一定好)。

比如使用外连接实现(需要表有PK才行,没有PK可以使用行号代替),或者使用not exists,或者使用partial index(不支持变量)。

对于返回结果集很大的场景,建议使用游标分页返回,此时可能用全表扫描更适合。

例子1, 有PK

create table tbl_pk(id int primary key, c1 int);  
create index idx_tbl_pk on tbl_pk(c1);  
insert into tbl_pk select generate_series(1,1000000), random()*10000;  

原始方法,不能走索引
select * from tbl_pk where c1 <> 1;

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl_pk where c1 <> 1;

QUERY PLAN

Seq Scan on public.tbl_pk (cost=0.00..16925.00 rows=999902 width=8) (actual time=0.020..182.603 rows=999919 loops=1)
Output: id, c1
Filter: (tbl_pk.c1 <> 1)
Rows Removed by Filter: 81
Buffers: shared hit=4425
Planning time: 0.486 ms
Execution time: 249.335 ms
(7 rows)

让他走索引的写法

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.id=t2.id and t2.c1=1) where t2.* is null;  
                                                            QUERY PLAN                                                               
-----------------------------------------------------------------------------------------------------------------------------------  
 Hash Left Join  (cost=352.55..18528.53 rows=5000 width=8) (actual time=0.229..395.158 rows=999919 loops=1)  
   Output: t1.id, t1.c1  
   Hash Cond: (t1.id = t2.id)  
   Filter: (t2.* IS NULL)  
   Rows Removed by Filter: 81  
   Buffers: shared hit=4509  
   ->  Seq Scan on public.tbl_pk t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.009..128.644 rows=1000000 loops=1)  
         Output: t1.id, t1.c1  
         Buffers: shared hit=4425  
   ->  Hash  (cost=351.32..351.32 rows=98 width=36) (actual time=0.211..0.211 rows=81 loops=1)  
         Output: t2.id, t2.*  
         Buckets: 1024  Batches: 1  Memory Usage: 6kB  
         Buffers: shared hit=84  
         ->  Bitmap Heap Scan on public.tbl_pk t2  (cost=5.18..351.32 rows=98 width=36) (actual time=0.044..0.185 rows=81 loops=1)  
               Output: t2.id, t2.*  
               Recheck Cond: (t2.c1 = 1)  
               Heap Blocks: exact=81  
               Buffers: shared hit=84  
               ->  Bitmap Index Scan on idx_tbl_pk  (cost=0.00..5.16 rows=98 width=0) (actual time=0.025..0.025 rows=81 loops=1)  
                     Index Cond: (t2.c1 = 1)  
                     Buffers: shared hit=3  
 Planning time: 0.289 ms  
 Execution time: 461.386 ms  
(23 rows)  

postgres=# set enable_seqscan=off;  
SET  

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.id=t2.id and t2.c1=1) where t2.* is null;  
                                                                      QUERY PLAN                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------  
 Merge Left Join  (cost=354.99..33264.46 rows=5000 width=8) (actual time=0.224..482.241 rows=999919 loops=1)  
   Output: t1.id, t1.c1  
   Merge Cond: (t1.id = t2.id)  
   Filter: (t2.* IS NULL)  
   Rows Removed by Filter: 81  
   Buffers: shared hit=7244  
   ->  Index Scan using tbl_pk_pkey on public.tbl_pk t1  (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.008..215.816 rows=1000000 loops=1)  
         Output: t1.id, t1.c1  
         Buffers: shared hit=7160  
   ->  Sort  (cost=354.57..354.81 rows=98 width=36) (actual time=0.211..0.233 rows=81 loops=1)  
         Output: t2.id, t2.*  
         Sort Key: t2.id  
         Sort Method: quicksort  Memory: 31kB  
         Buffers: shared hit=84  
         ->  Bitmap Heap Scan on public.tbl_pk t2  (cost=5.18..351.32 rows=98 width=36) (actual time=0.046..0.183 rows=81 loops=1)  
               Output: t2.id, t2.*  
               Recheck Cond: (t2.c1 = 1)  
               Heap Blocks: exact=81  
               Buffers: shared hit=84  
               ->  Bitmap Index Scan on idx_tbl_pk  (cost=0.00..5.16 rows=98 width=0) (actual time=0.026..0.026 rows=81 loops=1)  
                     Index Cond: (t2.c1 = 1)  
                     Buffers: shared hit=3  
 Planning time: 0.275 ms  
 Execution time: 548.991 ms  
(24 rows)  

postgres=# set enable_sort=off;  
SET  
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.id=t2.id and t2.c1=1) where t2.* is null;  
                                                                      QUERY PLAN                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------  
 Hash Left Join  (cost=352.97..34511.95 rows=5000 width=8) (actual time=0.235..494.068 rows=999919 loops=1)  
   Output: t1.id, t1.c1  
   Hash Cond: (t1.id = t2.id)  
   Filter: (t2.* IS NULL)  
   Rows Removed by Filter: 81  
   Buffers: shared hit=7244  
   ->  Index Scan using tbl_pk_pkey on public.tbl_pk t1  (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.008..223.681 rows=1000000 loops=1)  
         Output: t1.id, t1.c1  
         Buffers: shared hit=7160  
   ->  Hash  (cost=351.32..351.32 rows=98 width=36) (actual time=0.218..0.218 rows=81 loops=1)  
         Output: t2.id, t2.*  
         Buckets: 1024  Batches: 1  Memory Usage: 6kB  
         Buffers: shared hit=84  
         ->  Bitmap Heap Scan on public.tbl_pk t2  (cost=5.18..351.32 rows=98 width=36) (actual time=0.051..0.201 rows=81 loops=1)  
               Output: t2.id, t2.*  
               Recheck Cond: (t2.c1 = 1)  
               Heap Blocks: exact=81  
               Buffers: shared hit=84  
               ->  Bitmap Index Scan on idx_tbl_pk  (cost=0.00..5.16 rows=98 width=0) (actual time=0.031..0.031 rows=81 loops=1)  
                     Index Cond: (t2.c1 = 1)  
                     Buffers: shared hit=3  
 Planning time: 0.274 ms  
 Execution time: 560.676 ms  
(23 rows)  

postgres=# set enable_hashjoin=off;  
SET  
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.id=t2.id and t2.c1=1) where t2.* is null;  
                                                                      QUERY PLAN                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------  
 Merge Left Join  (cost=0.85..65818.07 rows=5000 width=8) (actual time=1.080..664.991 rows=999919 loops=1)  
   Output: t1.id, t1.c1  
   Merge Cond: (t1.id = t2.id)  
   Filter: (t2.* IS NULL)  
   Rows Removed by Filter: 81  
   Buffers: shared hit=14320  
   ->  Index Scan using tbl_pk_pkey on public.tbl_pk t1  (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.007..208.319 rows=1000000 loops=1)  
         Output: t1.id, t1.c1  
         Buffers: shared hit=7160  
   ->  Index Scan using tbl_pk_pkey on public.tbl_pk t2  (cost=0.42..32908.43 rows=98 width=36) (actual time=1.067..193.630 rows=81 loops=1)  
         Output: t2.id, t2.*  
         Filter: (t2.c1 = 1)  
         Rows Removed by Filter: 999919  
         Buffers: shared hit=7160  
 Planning time: 0.261 ms  
 Execution time: 732.070 ms  
(16 rows)  

例子2, 没有PK

没有PK时,可以使用行号进行关联。

postgres=# create index tbl_pk_idx2 on tbl_pk (ctid);  
CREATE INDEX  

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.ctid=t2.ctid and t2.c1=1) where t2.* is null;  
                                                                      QUERY PLAN                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------  
 Merge Left Join  (cost=0.85..92362.07 rows=5000 width=8) (actual time=1.232..720.467 rows=999919 loops=1)  
   Output: t1.id, t1.c1  
   Merge Cond: (t1.ctid = t2.ctid)  
   Filter: (t2.* IS NULL)  
   Rows Removed by Filter: 81  
   Buffers: shared hit=11585 read=2735  
   ->  Index Scan using tbl_pk_idx2 on public.tbl_pk t1  (cost=0.42..43680.43 rows=1000000 width=14) (actual time=0.041..268.576 rows=1000000 loops=1)  
         Output: t1.id, t1.c1, t1.ctid  
         Buffers: shared hit=7157 read=3  
   ->  Index Scan using tbl_pk_idx2 on public.tbl_pk t2  (cost=0.42..46180.43 rows=98 width=38) (actual time=1.186..213.849 rows=81 loops=1)  
         Output: t2.ctid, t2.*  
         Filter: (t2.c1 = 1)  
         Rows Removed by Filter: 999919  
         Buffers: shared hit=4428 read=2732  
 Planning time: 0.336 ms  
 Execution time: 787.713 ms  
(16 rows)  

例子3, 数组查询中的不包含

postgres=# create table tbl_arr(id int[], c1 text);  
CREATE TABLE  
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);  
INSERT 0 1  
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);  
INSERT 0 1  
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);  
INSERT 0 1  
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);  
INSERT 0 1  
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);  
INSERT 0 1  
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);  
INSERT 0 1  
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);  
INSERT 0 1  
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);  
INSERT 0 1  
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);  
INSERT 0 1  
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);  
INSERT 0 1  
postgres=# create index idx_tbl_arr_1 on tbl_arr using gin (id);  
CREATE INDEX  
postgres=# create index idx_tbl_arr_2 on tbl_arr (ctid);  
CREATE INDEX  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl_arr where not (id @> array[1,2]);  
                                                QUERY PLAN                                                   
-----------------------------------------------------------------------------------------------------------  
 Seq Scan on public.tbl_arr  (cost=0.00..1.12 rows=10 width=64) (actual time=0.017..0.029 rows=10 loops=1)  
   Output: id, c1  
   Filter: (NOT (tbl_arr.id @> '{1,2}'::integer[]))  
   Buffers: shared hit=1  
 Planning time: 0.086 ms  
 Execution time: 0.051 ms  
(6 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_arr t1 left join tbl_arr t2 on (t1.ctid=t2.ctid and t2.id @> array[1,2]) where t2.* is null;  
                                                                QUERY PLAN                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------  
 Merge Left Join  (cost=0.27..24.63 rows=1 width=64) (actual time=0.133..0.137 rows=10 loops=1)  
   Output: t1.id, t1.c1  
   Merge Cond: (t1.ctid = t2.ctid)  
   Filter: (t2.* IS NULL)  
   Buffers: shared hit=3 read=1  
   ->  Index Scan using idx_tbl_arr_2 on public.tbl_arr t1  (cost=0.14..12.29 rows=10 width=70) (actual time=0.108..0.110 rows=10 loops=1)  
         Output: t1.id, t1.c1, t1.ctid  
         Buffers: shared hit=1 read=1  
   ->  Index Scan using idx_tbl_arr_2 on public.tbl_arr t2  (cost=0.14..12.31 rows=1 width=94) (actual time=0.022..0.022 rows=0 loops=1)  
         Output: t2.ctid, t2.*  
         Filter: (t2.id @> '{1,2}'::integer[])  
         Rows Removed by Filter: 10  
         Buffers: shared hit=2  
 Planning time: 0.193 ms  
 Execution time: 0.173 ms  
(15 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_arr t1 where not exists (select 1 from tbl_arr t2 where t1.ctid=t2.ctid and t2.id @> array[1,2]);
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Merge Anti Join  (cost=0.27..24.63 rows=9 width=64) (actual time=0.023..0.026 rows=10 loops=1)
   Output: t1.id, t1.c1
   Merge Cond: (t1.ctid = t2.ctid)
   Buffers: shared hit=4
   ->  Index Scan using idx_tbl_arr_2 on public.tbl_arr t1  (cost=0.14..12.29 rows=10 width=70) (actual time=0.004..0.005 rows=10 loops=1)
         Output: t1.id, t1.c1, t1.ctid
         Buffers: shared hit=2
   ->  Index Scan using idx_tbl_arr_2 on public.tbl_arr t2  (cost=0.14..12.31 rows=1 width=6) (actual time=0.017..0.017 rows=0 loops=1)
         Output: t2.ctid
         Filter: (t2.id @> '{1,2}'::integer[])
         Rows Removed by Filter: 10
         Buffers: shared hit=2
 Planning time: 0.103 ms
 Execution time: 0.047 ms
(14 rows)
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
29天前
|
关系型数据库 分布式数据库 数据库
PolarDB常见问题之加了索引但是查询没有使用如何解决
PolarDB是阿里云推出的下一代关系型数据库,具有高性能、高可用性和弹性伸缩能力,适用于大规模数据处理场景。本汇总囊括了PolarDB使用中用户可能遭遇的一系列常见问题及解答,旨在为数据库管理员和开发者提供全面的问题指导,确保数据库平稳运行和优化使用体验。
|
3月前
|
Cloud Native 关系型数据库 分布式数据库
|
5月前
|
关系型数据库 数据库 C语言
PostgreSQL服务端开发学习 -- Datum
在使用C语言开发PostgreSQL后端、客户端应用时,Datum无处不在,所以必须要对Datum有很清楚的了解。
129 2
|
6月前
|
关系型数据库 开发工具 C语言
PostgreSQL libpq开发入门
简单入门C语言开发基于PostgreSQL libpq应用
102 0
|
5月前
|
关系型数据库 C语言 PostgreSQL
PostgreSQL服务端开发学习 --- 常用结构及宏定义1
本篇主要讲解使用C语言开发PostgreSQL服务端应用(libpq、自定义函数、扩展)常用到的结构及宏定义。
95 0
|
5月前
|
关系型数据库 C语言 PostgreSQL
PostgreSQL服务端开发学习 -- fmgr.h
fmgr按官方的解释就是Postgres函数管理器和函数调用接口,在使用C语言开发PostgreSQL后端应用时,所以与backend交互时必须遵循fmgr.h中定义的一些规范。
133 0
|
2月前
|
SQL 算法 关系型数据库
PolarDB-X的XPlan索引选择
对于数据库来说,正确的选择索引是基本的要求,选错索引轻则导致查询缓慢,重则导致数据库整体不可用。PolarDB-X存在多种不同的索引,局部索引、全局索引、列存索引、归档表索引。本文主要介绍一种CN上的局部索引算法:XPlan索引选择。
125754 13
PolarDB-X的XPlan索引选择
|
3月前
|
关系型数据库 分布式数据库 数据库
阿里云PolarDB开发者大会首度召开,让数据库开发像“搭积木”一样简单
阿里云PolarDB开发者大会首度召开,让数据库开发像“搭积木”一样简单
109 0
|
3月前
|
关系型数据库 定位技术 索引
在关系型数据库中,常见的索引种类包括哪些
在关系型数据库中,常见的索引种类包括哪些
486 0
|
3月前
|
关系型数据库 分布式数据库 数据库
家人们谁懂啊?为了让你们免费体验PolarDB,我们开发了一个动手体验搭子!
抢鲜体验赢好礼,阿里云定制折叠伞和定制双肩包等你拿!
家人们谁懂啊?为了让你们免费体验PolarDB,我们开发了一个动手体验搭子!

相关产品

  • 云原生数据库 PolarDB