标签
PostgreSQL , 不等于 , 索引 , 外连接
背景
在数据库中不等于能不能走索引呢?理论上是不行的,但是有方法可以让不等于也走索引(虽然走索引也不一定好)。
比如使用外连接实现(需要表有PK才行,没有PK可以使用行号代替),或者使用not exists,或者使用partial index(不支持变量)。
对于返回结果集很大的场景,建议使用游标分页返回,此时可能用全表扫描更适合。
例子1, 有PK
create table tbl_pk(id int primary key, c1 int);
create index idx_tbl_pk on tbl_pk(c1);
insert into tbl_pk select generate_series(1,1000000), random()*10000;
原始方法,不能走索引
select * from tbl_pk where c1 <> 1;
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl_pk where c1 <> 1;
QUERY PLAN
Seq Scan on public.tbl_pk (cost=0.00..16925.00 rows=999902 width=8) (actual time=0.020..182.603 rows=999919 loops=1)
Output: id, c1
Filter: (tbl_pk.c1 <> 1)
Rows Removed by Filter: 81
Buffers: shared hit=4425
Planning time: 0.486 ms
Execution time: 249.335 ms
(7 rows)
让他走索引的写法
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.id=t2.id and t2.c1=1) where t2.* is null;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=352.55..18528.53 rows=5000 width=8) (actual time=0.229..395.158 rows=999919 loops=1)
Output: t1.id, t1.c1
Hash Cond: (t1.id = t2.id)
Filter: (t2.* IS NULL)
Rows Removed by Filter: 81
Buffers: shared hit=4509
-> Seq Scan on public.tbl_pk t1 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.009..128.644 rows=1000000 loops=1)
Output: t1.id, t1.c1
Buffers: shared hit=4425
-> Hash (cost=351.32..351.32 rows=98 width=36) (actual time=0.211..0.211 rows=81 loops=1)
Output: t2.id, t2.*
Buckets: 1024 Batches: 1 Memory Usage: 6kB
Buffers: shared hit=84
-> Bitmap Heap Scan on public.tbl_pk t2 (cost=5.18..351.32 rows=98 width=36) (actual time=0.044..0.185 rows=81 loops=1)
Output: t2.id, t2.*
Recheck Cond: (t2.c1 = 1)
Heap Blocks: exact=81
Buffers: shared hit=84
-> Bitmap Index Scan on idx_tbl_pk (cost=0.00..5.16 rows=98 width=0) (actual time=0.025..0.025 rows=81 loops=1)
Index Cond: (t2.c1 = 1)
Buffers: shared hit=3
Planning time: 0.289 ms
Execution time: 461.386 ms
(23 rows)
postgres=# set enable_seqscan=off;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.id=t2.id and t2.c1=1) where t2.* is null;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Left Join (cost=354.99..33264.46 rows=5000 width=8) (actual time=0.224..482.241 rows=999919 loops=1)
Output: t1.id, t1.c1
Merge Cond: (t1.id = t2.id)
Filter: (t2.* IS NULL)
Rows Removed by Filter: 81
Buffers: shared hit=7244
-> Index Scan using tbl_pk_pkey on public.tbl_pk t1 (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.008..215.816 rows=1000000 loops=1)
Output: t1.id, t1.c1
Buffers: shared hit=7160
-> Sort (cost=354.57..354.81 rows=98 width=36) (actual time=0.211..0.233 rows=81 loops=1)
Output: t2.id, t2.*
Sort Key: t2.id
Sort Method: quicksort Memory: 31kB
Buffers: shared hit=84
-> Bitmap Heap Scan on public.tbl_pk t2 (cost=5.18..351.32 rows=98 width=36) (actual time=0.046..0.183 rows=81 loops=1)
Output: t2.id, t2.*
Recheck Cond: (t2.c1 = 1)
Heap Blocks: exact=81
Buffers: shared hit=84
-> Bitmap Index Scan on idx_tbl_pk (cost=0.00..5.16 rows=98 width=0) (actual time=0.026..0.026 rows=81 loops=1)
Index Cond: (t2.c1 = 1)
Buffers: shared hit=3
Planning time: 0.275 ms
Execution time: 548.991 ms
(24 rows)
postgres=# set enable_sort=off;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.id=t2.id and t2.c1=1) where t2.* is null;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=352.97..34511.95 rows=5000 width=8) (actual time=0.235..494.068 rows=999919 loops=1)
Output: t1.id, t1.c1
Hash Cond: (t1.id = t2.id)
Filter: (t2.* IS NULL)
Rows Removed by Filter: 81
Buffers: shared hit=7244
-> Index Scan using tbl_pk_pkey on public.tbl_pk t1 (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.008..223.681 rows=1000000 loops=1)
Output: t1.id, t1.c1
Buffers: shared hit=7160
-> Hash (cost=351.32..351.32 rows=98 width=36) (actual time=0.218..0.218 rows=81 loops=1)
Output: t2.id, t2.*
Buckets: 1024 Batches: 1 Memory Usage: 6kB
Buffers: shared hit=84
-> Bitmap Heap Scan on public.tbl_pk t2 (cost=5.18..351.32 rows=98 width=36) (actual time=0.051..0.201 rows=81 loops=1)
Output: t2.id, t2.*
Recheck Cond: (t2.c1 = 1)
Heap Blocks: exact=81
Buffers: shared hit=84
-> Bitmap Index Scan on idx_tbl_pk (cost=0.00..5.16 rows=98 width=0) (actual time=0.031..0.031 rows=81 loops=1)
Index Cond: (t2.c1 = 1)
Buffers: shared hit=3
Planning time: 0.274 ms
Execution time: 560.676 ms
(23 rows)
postgres=# set enable_hashjoin=off;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.id=t2.id and t2.c1=1) where t2.* is null;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Left Join (cost=0.85..65818.07 rows=5000 width=8) (actual time=1.080..664.991 rows=999919 loops=1)
Output: t1.id, t1.c1
Merge Cond: (t1.id = t2.id)
Filter: (t2.* IS NULL)
Rows Removed by Filter: 81
Buffers: shared hit=14320
-> Index Scan using tbl_pk_pkey on public.tbl_pk t1 (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.007..208.319 rows=1000000 loops=1)
Output: t1.id, t1.c1
Buffers: shared hit=7160
-> Index Scan using tbl_pk_pkey on public.tbl_pk t2 (cost=0.42..32908.43 rows=98 width=36) (actual time=1.067..193.630 rows=81 loops=1)
Output: t2.id, t2.*
Filter: (t2.c1 = 1)
Rows Removed by Filter: 999919
Buffers: shared hit=7160
Planning time: 0.261 ms
Execution time: 732.070 ms
(16 rows)
例子2, 没有PK
没有PK时,可以使用行号进行关联。
postgres=# create index tbl_pk_idx2 on tbl_pk (ctid);
CREATE INDEX
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_pk t1 left join tbl_pk t2 on (t1.ctid=t2.ctid and t2.c1=1) where t2.* is null;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Left Join (cost=0.85..92362.07 rows=5000 width=8) (actual time=1.232..720.467 rows=999919 loops=1)
Output: t1.id, t1.c1
Merge Cond: (t1.ctid = t2.ctid)
Filter: (t2.* IS NULL)
Rows Removed by Filter: 81
Buffers: shared hit=11585 read=2735
-> Index Scan using tbl_pk_idx2 on public.tbl_pk t1 (cost=0.42..43680.43 rows=1000000 width=14) (actual time=0.041..268.576 rows=1000000 loops=1)
Output: t1.id, t1.c1, t1.ctid
Buffers: shared hit=7157 read=3
-> Index Scan using tbl_pk_idx2 on public.tbl_pk t2 (cost=0.42..46180.43 rows=98 width=38) (actual time=1.186..213.849 rows=81 loops=1)
Output: t2.ctid, t2.*
Filter: (t2.c1 = 1)
Rows Removed by Filter: 999919
Buffers: shared hit=4428 read=2732
Planning time: 0.336 ms
Execution time: 787.713 ms
(16 rows)
例子3, 数组查询中的不包含
postgres=# create table tbl_arr(id int[], c1 text);
CREATE TABLE
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# insert into tbl_arr select array_agg(100000*random()) from generate_series(1,100);
INSERT 0 1
postgres=# create index idx_tbl_arr_1 on tbl_arr using gin (id);
CREATE INDEX
postgres=# create index idx_tbl_arr_2 on tbl_arr (ctid);
CREATE INDEX
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl_arr where not (id @> array[1,2]);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Seq Scan on public.tbl_arr (cost=0.00..1.12 rows=10 width=64) (actual time=0.017..0.029 rows=10 loops=1)
Output: id, c1
Filter: (NOT (tbl_arr.id @> '{1,2}'::integer[]))
Buffers: shared hit=1
Planning time: 0.086 ms
Execution time: 0.051 ms
(6 rows)
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_arr t1 left join tbl_arr t2 on (t1.ctid=t2.ctid and t2.id @> array[1,2]) where t2.* is null;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Merge Left Join (cost=0.27..24.63 rows=1 width=64) (actual time=0.133..0.137 rows=10 loops=1)
Output: t1.id, t1.c1
Merge Cond: (t1.ctid = t2.ctid)
Filter: (t2.* IS NULL)
Buffers: shared hit=3 read=1
-> Index Scan using idx_tbl_arr_2 on public.tbl_arr t1 (cost=0.14..12.29 rows=10 width=70) (actual time=0.108..0.110 rows=10 loops=1)
Output: t1.id, t1.c1, t1.ctid
Buffers: shared hit=1 read=1
-> Index Scan using idx_tbl_arr_2 on public.tbl_arr t2 (cost=0.14..12.31 rows=1 width=94) (actual time=0.022..0.022 rows=0 loops=1)
Output: t2.ctid, t2.*
Filter: (t2.id @> '{1,2}'::integer[])
Rows Removed by Filter: 10
Buffers: shared hit=2
Planning time: 0.193 ms
Execution time: 0.173 ms
(15 rows)
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.* from tbl_arr t1 where not exists (select 1 from tbl_arr t2 where t1.ctid=t2.ctid and t2.id @> array[1,2]);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Merge Anti Join (cost=0.27..24.63 rows=9 width=64) (actual time=0.023..0.026 rows=10 loops=1)
Output: t1.id, t1.c1
Merge Cond: (t1.ctid = t2.ctid)
Buffers: shared hit=4
-> Index Scan using idx_tbl_arr_2 on public.tbl_arr t1 (cost=0.14..12.29 rows=10 width=70) (actual time=0.004..0.005 rows=10 loops=1)
Output: t1.id, t1.c1, t1.ctid
Buffers: shared hit=2
-> Index Scan using idx_tbl_arr_2 on public.tbl_arr t2 (cost=0.14..12.31 rows=1 width=6) (actual time=0.017..0.017 rows=0 loops=1)
Output: t2.ctid
Filter: (t2.id @> '{1,2}'::integer[])
Rows Removed by Filter: 10
Buffers: shared hit=2
Planning time: 0.103 ms
Execution time: 0.047 ms
(14 rows)