PostgreSQL in 语法 的优化器处理以及如何优化

本文涉及的产品
RDS PostgreSQL Serverless,0.5-4RCU 50GB 3个月
推荐场景:
对影评进行热评分析
云数据库 RDS SQL Server,基础系列 2核4GB
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
简介:

PostgreSQL in 的优化器处理以及如何优化

在使用数据库的过程中,经常会遇到需要匹配多个值的情况。
通常的写法包括:

-- select * from table where id = any(array);  
-- select * from table where id in (values);  
-- select * from table where id=x or id=x or ....;  
-- select * from table where id in (query);  
-- select * from table where id in ( values query );  
-- select * from table t1 join (query or values query) t2 on t1.id=t2.id;  

每种写法会产生多种执行计划的可能,如下:

-- select * from table where id = any(array);  
  优化器可以使用index scan, bitmap scan, seq scan.  
  
-- select * from table where id in (values);  
  优化器可以使用index scan, bitmap scan, seq scan.  
  
-- select * from table where id=x or id=x or ....;  
  优化器可以使用bitmap scan + BitmapOr, seq scan.  
  
-- select * from table where id in (query);  
  优化器可以使用join (merge,hash,nest).  
  
-- select * from table where id in ( values query );  
  优化器可以使用join (merge,hash,nest).  
  
-- select * from table t1 join (query or values query) t2 on t1.id=t2.id;  
  优化器可以使用join (merge,hash,nest).  

SQL优化策略是尽量减少CPU的运算以及page的扫描数量。

下面针对每种SQL,看看对应的可能的执行计划有什么差别,(使用开关来控制执行计划的选择, 如set enable_indexscan=off)
支持的开关如下:

enable_bitmapscan     enable_hashjoin       enable_indexscan      enable_mergejoin      enable_seqscan        enable_tidscan          
enable_hashagg        enable_indexonlyscan  enable_material       enable_nestloop       enable_sort   

开始测试,使用auto_explain输出执行计划:

load 'auto_explain';    
set auto_explain.log_analyze =true;    
set auto_explain.log_buffers =true;    
set auto_explain.log_nested_statements=true;    
set auto_explain.log_timing=true;    
set auto_explain.log_triggers=true;    
set auto_explain.log_verbose=true;    
set auto_explain.log_min_duration=0;    
set client_min_messages ='log';    
set work_mem='8GB';    

测试SQL写法1:

-- select * from table where id = any(array);  
  
do language plpgsql 
$$
  
declare  
  v_id int[];  
begin  
  select array_agg(trunc(random()*100000)) into v_id from generate_series(1,200) t(id);  
  perform * from t_in_test where id = any (v_id);  
end;  

$$
;  

优化器选择1 (index scan):
离散扫描,适合小的扫描集。

LOG:  duration: 2.312 ms  plan:  
Query Text: SELECT * from t_in_test where id = any (v_id)  
Index Scan using t_in_test_pkey on public.t_in_test  (cost=0.43..895.50 rows=200 width=37) (actual time=0.039..2.266 rows=200 loops=1)  
  Output: id, info  
  Index Cond: (t_in_test.id = ANY ('{50836,73414,41071,45604,...省略部分...,76236}'::integer[]))  
  Buffers: shared hit=776  
CONTEXT:  SQL statement "SELECT * from t_in_test where id = any (v_id)"  
PL/pgSQL function inline_code_block line 6 at PERFORM  

优化器选择2 (bitmap scan):
比index scan多了Recheck的开销,以及按照ctid排序的开销。
适合大的扫描集,排序的目的是减少离散扫描,还可以用到块设备的prefetch。

LOG:  duration: 1.602 ms  plan:  
Query Text: SELECT * from t_in_test where id = any (v_id)  
Bitmap Heap Scan on public.t_in_test  (cost=888.55..1711.16 rows=200 width=37) (actual time=0.880..1.563 rows=200 loops=1)  
  Output: id, info  
  Recheck Cond: (t_in_test.id = ANY ('{32635,31123,6282,59640,...省略部分...,87705}'::integer[]))  
  Heap Blocks: exact=184  
  Buffers: shared hit=784  
  ->  Bitmap Index Scan on t_in_test_pkey  (cost=0.00..888.50 rows=200 width=0) (actual time=0.846..0.846 rows=200 loops=1)  
        Index Cond: (t_in_test.id = ANY ('{32635,31123,6282,59640,...省略部分...,87705}'::integer[]))  
        Buffers: shared hit=600  
CONTEXT:  SQL statement "SELECT * from t_in_test where id = any (v_id)"  
PL/pgSQL function inline_code_block line 6 at PERFORM  

优化器选择3 (seq scan):
适合非常庞大的扫描集。

LOG:  duration: 19940.394 ms  plan:  
Query Text: SELECT * from t_in_test where id = any (v_id)  
Seq Scan on public.t_in_test  (cost=0.00..2683354.80 rows=200 width=37) (actual time=4.237..19940.330 rows=199 loops=1)  
  Output: id, info  
  Filter: (t_in_test.id = ANY ('{45867,72450,95153,86233,63073,11016,56010,47158,...省略部分...,90444}'::integer[]))  
  Rows Removed by Filter: 9999801  
  Buffers: shared hit=83334  
CONTEXT:  SQL statement "SELECT * from t_in_test where id = any (v_id)"  
PL/pgSQL function inline_code_block line 6 at PERFORM  

测试SQL写法2:

-- select * from table where id in (values);  
  
do language plpgsql 
$$
  
declare  
  v_where text;  
begin  
  select string_agg(id::text,',') into v_where from (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t;  
  execute 'select * from t_in_test where id in ('||v_where||')';  
end;  

$$
;  

优化器选择1 (index scan):

LOG:  duration: 0.919 ms  plan:  
Query Text: select * from t_in_test where id in (8826,2038,72163,29843,76886,37893,5279,64308,...省略部分...,48126,44868)  
Index Scan using t_in_test_pkey on public.t_in_test  (cost=0.43..895.50 rows=200 width=37) (actual time=0.017..0.894 rows=200 loops=1)  
  Output: id, info  
  Index Cond: (t_in_test.id = ANY ('{8826,2038,72163,29843,76886,37893,5279,64308,7370,80216,...省略部分...,48126,44868}'::integer[]))  
  Buffers: shared hit=779  
CONTEXT:  SQL statement "select * from t_in_test where id in (8826,2038,72163,29843,76886,37893,5279,64308,7370,80216,...省略部分...,73366,48126,44868)"  
PL/pgSQL function inline_code_block line 6 at EXECUTE  

优化器选择2 (bitmap scan):

LOG:  duration: 1.012 ms  plan:  
Query Text: select * from t_in_test where id in (17424,80517,35148,38245,93037,...省略部分...,14997,34639,10646)  
Bitmap Heap Scan on public.t_in_test  (cost=888.55..1711.16 rows=200 width=37) (actual time=0.657..0.978 rows=200 loops=1)  
  Output: id, info  
  Recheck Cond: (t_in_test.id = ANY ('{17424,80517,35148,38245,93037,4516,...省略部分...,14997,34639,10646}'::integer[]))  
  Heap Blocks: exact=177  
  Buffers: shared hit=779  
  ->  Bitmap Index Scan on t_in_test_pkey  (cost=0.00..888.50 rows=200 width=0) (actual time=0.629..0.629 rows=200 loops=1)  
        Index Cond: (t_in_test.id = ANY ('{17424,80517,35148,38245,93037,4516,27690,...省略部分...,34639,10646}'::integer[]))  
        Buffers: shared hit=602  
CONTEXT:  SQL statement "select * from t_in_test where id in (17424,80517,35148,38245,93037,4516,27690,48978,11902,...省略部分...,34639,10646)"  
PL/pgSQL function inline_code_block line 6 at EXECUTE  

优化器选择3 (seq scan):

LOG:  duration: 19678.014 ms  plan:  
Query Text: select * from t_in_test where id in (77056,1340,73056,42536,6862,44702,64810,42774,...省略部分...,24083,11322)  
Seq Scan on public.t_in_test  (cost=0.00..2683354.80 rows=200 width=37) (actual time=2.045..19677.975 rows=200 loops=1)  
  Output: id, info  
  Filter: (t_in_test.id = ANY ('{77056,1340,73056,42536,6862,...省略部分...,24083,11322}'::integer[]))  
  Rows Removed by Filter: 9999800  
  Buffers: shared hit=83334  
CONTEXT:  SQL statement "select * from t_in_test where id in (77056,1340,73056,42536,6862,44702,...省略部分...,24083,11322)"  
PL/pgSQL function inline_code_block line 6 at EXECUTE  

测试SQL写法3:

-- select * from table where id=x or id=x or ....;  
  
do language plpgsql 
$$
  
declare  
  v_where text := 'id=';  
  v int;  
begin  
  for v in select trunc(random()*100000)::int from generate_series(1,200) t(id)  
  loop  
    v_where := ' '|| v_where ||v||' or id=';  
  end loop;  
  v_where := rtrim(v_where,'or id=');  
  execute 'select * from t_in_test where '||v_where;  
end;  

$$
;  

优化器选择1 (bitmapindex scan + bitmapor):
使用or的写法,只能选择bitmap index scan,所以不如使用IN的写法。

LOG:  duration: 1.085 ms  plan:  
Query Text: select * from t_in_test where                                                                                                                                                                                                         id=29207 or id=69918 or id=4044 or ...省略部分... or id=53009 or id=28015 or id=11763  
Bitmap Heap Scan on public.t_in_test  (cost=898.50..1771.11 rows=200 width=37) (actual time=0.754..1.043 rows=200 loops=1)  
  Output: id, info  
  Recheck Cond: ((t_in_test.id = 29207) OR (t_in_test.id = 69918) OR (t_in_test.id = 4044) OR (t_in_test.id = 65838) OR ...省略部分... OR (t_in_test.id = 28015) OR (t_in_test.id = 11763))  
  Heap Blocks: exact=180  
  Buffers: shared hit=781  
  ->  BitmapOr  (cost=898.50..898.50 rows=200 width=0) (actual time=0.725..0.725 rows=0 loops=1)  
        Buffers: shared hit=601  
        ->  Bitmap Index Scan on t_in_test_pkey  (cost=0.00..4.44 rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=1)  
              Index Cond: (t_in_test.id = 29207)  
              Buffers: shared hit=3  
        ->  Bitmap Index Scan on t_in_test_pkey  (cost=0.00..4.44 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)  
              Index Cond: (t_in_test.id = 69918)  
              Buffers: shared hit=3  
        .....省略部分  
        ->  Bitmap Index Scan on t_in_test_pkey  (cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=1)  
              Index Cond: (t_in_test.id = 11763)  
              Buffers: shared hit=3  
CONTEXT:  SQL statement "select * from t_in_test where                                                                                                                                                                                                         id=29207 or id=69918 or id=4044 or ...省略部分... or id=28015 or id=11763"  
PL/pgSQL function inline_code_block line 11 at EXECUTE  

优化器选择2 (seq scan):

LOG:  duration: 107484.074 ms  plan:  
Query Text: select * from t_in_test where                                                                                                                                                                                                         id=51946 or id=17129 or id=90027 or ...省略部分... or id=22127 or id=62334 or id=11722  
Seq Scan on public.t_in_test  (cost=0.00..5183374.80 rows=200 width=37) (actual time=17.394..107483.942 rows=199 loops=1)  
  Output: id, info  
  Filter: ((t_in_test.id = 51946) OR (t_in_test.id = 17129) OR (t_in_test.id = 90027) OR ...省略部分... OR (t_in_test.id = 62334) OR (t_in_test.id = 11722))  
  Rows Removed by Filter: 9999801  
  Buffers: shared hit=83334  
CONTEXT:  SQL statement "select * from t_in_test where                                                                                                                                                                                                         id=51946 or id=17129 or id=90027 or ...省略部分... or id=62334 or id=11722"  
PL/pgSQL function inline_code_block line 11 at EXECUTE  

测试SQL写法4:

-- select * from table where id in (query);  
  
do language plpgsql 
$$
  
declare  
begin  
  perform * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id));  
end;  

$$
;  

优化器选择1 (nestloop join):
适合小的扫描集,并且其中有一个表的JOIN列是带有主键或唯一约束的。

LOG:  duration: 1.314 ms  plan:  
Query Text: SELECT * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id))  
Nested Loop  (cost=32.94..1727.00 rows=5000040 width=37) (actual time=0.166..1.226 rows=200 loops=1)  
  Output: t_in_test.id, t_in_test.info  
  Buffers: shared hit=800  
  ->  HashAggregate  (cost=32.50..34.50 rows=200 width=4) (actual time=0.149..0.189 rows=200 loops=1)  
        Output: ((trunc((random() * '100000'::double precision)))::integer)  
        Group Key: (trunc((random() * '100000'::double precision)))::integer  
        ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.033..0.089 rows=200 loops=1)  
              Output: (trunc((random() * '100000'::double precision)))::integer  
              Function Call: generate_series(1, 200)  
  ->  Index Scan using t_in_test_pkey on public.t_in_test  (cost=0.43..8.45 rows=1 width=37) (actual time=0.004..0.005 rows=1 loops=200)  
        Output: t_in_test.id, t_in_test.info  
        Index Cond: (t_in_test.id = ((trunc((random() * '100000'::double precision)))::integer))  
        Buffers: shared hit=800  
CONTEXT:  SQL statement "SELECT * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id))"  
PL/pgSQL function inline_code_block line 4 at PERFORM  

优化器选择2 (hash join):
适合大的扫描集,同时两个表的JOIN列上面都没有索引的情况。

LOG:  duration: 2454.400 ms  plan:  
Query Text: SELECT * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id))  
Hash Join  (cost=37.00..220874.10 rows=5000040 width=37) (actual time=0.413..2454.343 rows=200 loops=1)  
  Output: t_in_test.id, t_in_test.info  
  Hash Cond: (t_in_test.id = ((trunc((random() * '100000'::double precision)))::integer))  
  Buffers: shared hit=83334  
  ->  Seq Scan on public.t_in_test  (cost=0.00..183334.80 rows=10000080 width=37) (actual time=0.010..1182.626 rows=10000000 loops=1)  
        Output: t_in_test.id, t_in_test.info  
        Buffers: shared hit=83334  
  ->  Hash  (cost=34.50..34.50 rows=200 width=4) (actual time=0.221..0.221 rows=200 loops=1)  
        Output: ((trunc((random() * '100000'::double precision)))::integer)  
        Buckets: 1024  Batches: 1  Memory Usage: 16kB  
        ->  HashAggregate  (cost=32.50..34.50 rows=200 width=4) (actual time=0.149..0.177 rows=200 loops=1)  
              Output: ((trunc((random() * '100000'::double precision)))::integer)  
              Group Key: (trunc((random() * '100000'::double precision)))::integer  
              ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.033..0.088 rows=200 loops=1)  
                    Output: (trunc((random() * '100000'::double precision)))::integer  
                    Function Call: generate_series(1, 200)  
CONTEXT:  SQL statement "SELECT * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id))"  
PL/pgSQL function inline_code_block line 4 at PERFORM  

优化器选择3 (merge join):
适合大的扫描集,并且两个表的JOIN列都有索引。

LOG:  duration: 32.551 ms  plan:  
Query Text: SELECT * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id))  
Merge Join  (cost=42.58..368067.98 rows=5000040 width=37) (actual time=0.561..32.497 rows=200 loops=1)  
  Output: t_in_test.id, t_in_test.info  
  Merge Cond: (t_in_test.id = ((trunc((random() * '100000'::double precision)))::integer))  
  Buffers: shared hit=1112  
  ->  Index Scan using t_in_test_pkey on public.t_in_test  (cost=0.43..343022.64 rows=10000080 width=37) (actual time=0.016..20.499 rows=99905 loops=1)  
        Output: t_in_test.id, t_in_test.info  
        Buffers: shared hit=1108  
  ->  Sort  (cost=42.15..42.65 rows=200 width=4) (actual time=0.268..0.296 rows=200 loops=1)  
        Output: ((trunc((random() * '100000'::double precision)))::integer)  
        Sort Key: ((trunc((random() * '100000'::double precision)))::integer)  
        Sort Method: quicksort  Memory: 34kB  
        Buffers: shared hit=4  
        ->  HashAggregate  (cost=32.50..34.50 rows=200 width=4) (actual time=0.148..0.181 rows=200 loops=1)  
              Output: ((trunc((random() * '100000'::double precision)))::integer)  
              Group Key: (trunc((random() * '100000'::double precision)))::integer  
              ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.035..0.078 rows=200 loops=1)  
                    Output: (trunc((random() * '100000'::double precision)))::integer  
                    Function Call: generate_series(1, 200)  
CONTEXT:  SQL statement "SELECT * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id))"  
PL/pgSQL function inline_code_block line 4 at PERFORM  

测试SQL写法5:

-- select * from table where id in ( values query );  
  
do language plpgsql 
$$
  
declare  
  v_values text := 'values ( ';  
  v int;  
begin  
  for v in select trunc(random()*100000)::int from generate_series(1,200) t(id)  
  loop  
    v_values := v_values ||v||'),(';  
  end loop;  
  v_values := rtrim( v_values,',(' );  
  execute 'select * from t_in_test where id in ( select * from ('||v_values||') as t(id))';  
end;  

$$
;  

优化器选择1 (nestloop join):

LOG:  duration: 1.272 ms  plan:  
Query Text: select * from t_in_test where id in ( select * from (values ( 96474),(39030),(12481),(60519),...省略部分...,(23783),(9253)) as t(id))  
Nested Loop  (cost=3.44..1697.50 rows=5000040 width=37) (actual time=0.130..1.195 rows=200 loops=1)  
  Output: t_in_test.id, t_in_test.info  
  Buffers: shared hit=802  
  ->  HashAggregate  (cost=3.00..5.00 rows=200 width=4) (actual time=0.105..0.143 rows=200 loops=1)  
        Output: "*VALUES*".column1  
        Group Key: "*VALUES*".column1  
        ->  Values Scan on "*VALUES*"  (cost=0.00..2.50 rows=200 width=4) (actual time=0.001..0.040 rows=200 loops=1)  
              Output: "*VALUES*".column1  
  ->  Index Scan using t_in_test_pkey on public.t_in_test  (cost=0.43..8.45 rows=1 width=37) (actual time=0.004..0.005 rows=1 loops=200)  
        Output: t_in_test.id, t_in_test.info  
        Index Cond: (t_in_test.id = "*VALUES*".column1)  
        Buffers: shared hit=802  
CONTEXT:  SQL statement "select * from t_in_test where id in ( select * from (values ( 96474),(39030),(12481),(60519),(70354),(33117),...省略部分...,(15818),(23783),(9253)) as t(id))"  
PL/pgSQL function inline_code_block line 11 at EXECUTE  

优化器选择2 (hash join):

LOG:  duration: 2444.648 ms  plan:  
Query Text: select * from t_in_test where id in ( select * from (values ( 95286),(76612),(56400),(99838),(2155),...省略部分...,(29527),(99252)) as t(id))  
Hash Join  (cost=7.50..220844.60 rows=5000040 width=37) (actual time=0.222..2444.573 rows=200 loops=1)  
  Output: t_in_test.id, t_in_test.info  
  Hash Cond: (t_in_test.id = "*VALUES*".column1)  
  Buffers: shared hit=83334  
  ->  Seq Scan on public.t_in_test  (cost=0.00..183334.80 rows=10000080 width=37) (actual time=0.009..1174.724 rows=10000000 loops=1)  
        Output: t_in_test.id, t_in_test.info  
        Buffers: shared hit=83334  
  ->  Hash  (cost=5.00..5.00 rows=200 width=4) (actual time=0.173..0.173 rows=200 loops=1)  
        Output: "*VALUES*".column1  
        Buckets: 1024  Batches: 1  Memory Usage: 16kB  
        ->  HashAggregate  (cost=3.00..5.00 rows=200 width=4) (actual time=0.101..0.135 rows=200 loops=1)  
              Output: "*VALUES*".column1  
              Group Key: "*VALUES*".column1  
              ->  Values Scan on "*VALUES*"  (cost=0.00..2.50 rows=200 width=4) (actual time=0.001..0.042 rows=200 loops=1)  
                    Output: "*VALUES*".column1  
CONTEXT:  SQL statement "select * from t_in_test where id in ( select * from (values ( 95286),(76612),(56400),...省略部分...,(29527),(99252)) as t(id))"  
PL/pgSQL function inline_code_block line 11 at EXECUTE  

优化器选择3 (merge join):

LOG:  duration: 32.296 ms  plan:  
Query Text: select * from t_in_test where id in ( select * from (values ( 18704),(70725),(55056),...省略部分...,(80068),(28737)) as t(id))  
Merge Semi Join  (cost=10.58..368035.98 rows=5000040 width=37) (actual time=0.560..32.212 rows=200 loops=1)  
  Output: t_in_test.id, t_in_test.info  
  Merge Cond: (t_in_test.id = "*VALUES*".column1)  
  Buffers: shared hit=1110  
  ->  Index Scan using t_in_test_pkey on public.t_in_test  (cost=0.43..343022.64 rows=10000080 width=37) (actual time=0.023..20.733 rows=99962 loops=1)  
        Output: t_in_test.id, t_in_test.info  
        Buffers: shared hit=1110  
  ->  Sort  (cost=10.14..10.64 rows=200 width=4) (actual time=0.105..0.134 rows=200 loops=1)  
        Output: "*VALUES*".column1  
        Sort Key: "*VALUES*".column1  
        Sort Method: quicksort  Memory: 34kB  
        ->  Values Scan on "*VALUES*"  (cost=0.00..2.50 rows=200 width=4) (actual time=0.002..0.035 rows=200 loops=1)  
              Output: "*VALUES*".column1  
CONTEXT:  SQL statement "select * from t_in_test where id in ( select * from (values ( 18704),(70725),(55056),...省略部分...,(28737)) as t(id))"  
PL/pgSQL function inline_code_block line 11 at EXECUTE  

测试SQL写法6:

-- select * from table t1 join (query or values query) t2 on t1.id=t2.id;  
  
do language plpgsql 
$$
  
declare  
begin  
  perform * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id);  
end;  

$$
;  

优化器选择1 (nestloop join):

LOG:  duration: 1.327 ms  plan:  
Query Text: SELECT * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id)  
Nested Loop  (cost=0.44..8404.50 rows=1000 width=41) (actual time=0.062..1.241 rows=200 loops=1)  
  Output: t1.id, t1.info, ((trunc((random() * '100000'::double precision)))::integer)  
  Buffers: shared hit=802  
  ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.034..0.131 rows=200 loops=1)  
        Output: (trunc((random() * '100000'::double precision)))::integer  
        Function Call: generate_series(1, 200)  
  ->  Index Scan using t_in_test_pkey on public.t_in_test t1  (cost=0.43..8.36 rows=1 width=37) (actual time=0.005..0.005 rows=1 loops=200)  
        Output: t1.id, t1.info  
        Index Cond: (t1.id = ((trunc((random() * '100000'::double precision)))::integer))  
        Buffers: shared hit=802  
CONTEXT:  SQL statement "SELECT * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id)"  
PL/pgSQL function inline_code_block line 4 at PERFORM  

优化器选择2 (hash join):

LOG:  duration: 4883.088 ms  plan:  
Query Text: SELECT * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id)  
Hash Join  (cost=308335.80..308390.80 rows=1000 width=41) (actual time=4882.749..4883.023 rows=200 loops=1)  
  Output: t1.id, t1.info, ((trunc((random() * '100000'::double precision)))::integer)  
  Hash Cond: (((trunc((random() * '100000'::double precision)))::integer) = t1.id)  
  Buffers: shared hit=83334  
  ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.033..0.125 rows=200 loops=1)  
        Output: (trunc((random() * '100000'::double precision)))::integer  
        Function Call: generate_series(1, 200)  
  ->  Hash  (cost=183334.80..183334.80 rows=10000080 width=37) (actual time=4767.895..4767.895 rows=10000000 loops=1)  
        Output: t1.id, t1.info  
        Buckets: 16777216  Batches: 1  Memory Usage: 804901kB  
        Buffers: shared hit=83334  
        ->  Seq Scan on public.t_in_test t1  (cost=0.00..183334.80 rows=10000080 width=37) (actual time=0.014..1325.338 rows=10000000 loops=1)  
              Output: t1.id, t1.info  
              Buffers: shared hit=83334  
CONTEXT:  SQL statement "SELECT * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id)"  
PL/pgSQL function inline_code_block line 4 at PERFORM  

优化器选择3 (merge join):

LOG:  duration: 32.505 ms  plan:  
Query Text: SELECT * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id)  
Merge Join  (cost=80.27..368117.67 rows=1000 width=41) (actual time=0.182..32.429 rows=200 loops=1)  
  Output: t1.id, t1.info, ((trunc((random() * '100000'::double precision)))::integer)  
  Merge Cond: (t1.id = ((trunc((random() * '100000'::double precision)))::integer))  
  Buffers: shared hit=1102  
  ->  Index Scan using t_in_test_pkey on public.t_in_test t1  (cost=0.43..343022.64 rows=10000080 width=37) (actual time=0.022..20.782 rows=99360 loops=1)  
        Output: t1.id, t1.info  
        Buffers: shared hit=1102  
  ->  Sort  (cost=79.83..82.33 rows=1000 width=4) (actual time=0.154..0.180 rows=200 loops=1)  
        Output: ((trunc((random() * '100000'::double precision)))::integer)  
        Sort Key: ((trunc((random() * '100000'::double precision)))::integer)  
        Sort Method: quicksort  Memory: 34kB  
        ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.036..0.099 rows=200 loops=1)  
              Output: (trunc((random() * '100000'::double precision)))::integer  
              Function Call: generate_series(1, 200)  
CONTEXT:  SQL statement "SELECT * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id)"  
PL/pgSQL function inline_code_block line 4 at PERFORM  

优化选择建议如下,可以根据需求来写SQL,最不建议写的是OR:
index scan:
离散扫描,适合小的扫描集。
bitmap scan:
比index scan多了Recheck的开销,以及按照ctid排序的开销。
适合大的扫描集,排序的目的是减少离散扫描,还可以用到块设备的prefetch。
seq scan:
适合非常庞大的扫描集。
bitmapindex scan + bitmapor|And:
使用or的写法,只能选择bitmap index scan,所以不如使用IN的写法。
nestloop join:
适合小的扫描集,并且其中有一个表的JOIN列是带有主键或唯一约束的。
hash join:
适合大的扫描集,同时两个表的JOIN列上面都没有索引的情况。
merge join:
适合大的扫描集,并且两个表的JOIN列都有索引。

如果你发现写好的SQL没有选择最优的执行计划,可以通过设置优化器开关,或者使用hint plan这个插件来指定优化器使用对应的scan或join method.

http://pghintplan.osdn.jp/pg_hint_plan.html

git clone git://git.osdn.jp/gitroot/pghintplan/pg_hint_plan.git  
mv pg_hint_plan postgresql-9.5.0/contrib/  
cd postgresql-9.5.0/contrib/pg_hint_plan  
export PATH=/home/digoal/pgsql9.5.0/bin:$PATH  
make  
make install  
  
psql  
postgres=# create extension pg_hint_plan;  
CREATE EXTENSION  
  
postgres=# LOAD 'pg_hint_plan';  
  
postgres=# /*+ NestLoop(t1 t2) */ explain select * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id);  
LOG:  duration: 0.000 ms  plan:  
Query Text: /*+ NestLoop(t1 t2) */ explain select * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id);  
Nested Loop  (cost=0.44..8404.50 rows=1000 width=41)  
  Output: t1.id, t1.info, ((trunc((random() * '100000'::double precision)))::integer)  
  ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0)  
        Output: (trunc((random() * '100000'::double precision)))::integer  
        Function Call: generate_series(1, 200)  
  ->  Index Scan using t_in_test_pkey on public.t_in_test t1  (cost=0.43..8.36 rows=1 width=37)  
        Output: t1.id, t1.info  
        Index Cond: (t1.id = ((trunc((random() * '100000'::double precision)))::integer))  
                                        QUERY PLAN                                          
------------------------------------------------------------------------------------------  
 Nested Loop  (cost=0.44..8404.50 rows=1000 width=41)  
   ->  Function Scan on generate_series t  (cost=0.00..20.00 rows=1000 width=0)  
   ->  Index Scan using t_in_test_pkey on t_in_test t1  (cost=0.43..8.36 rows=1 width=37)  
         Index Cond: (id = ((trunc((random() * '100000'::double precision)))::integer))  
(4 rows)  
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
19天前
|
SQL 关系型数据库 MySQL
MySQL慢查询优化、索引优化、以及表等优化详解
本文详细介绍了MySQL优化方案,包括索引优化、SQL慢查询优化和数据库表优化,帮助提升数据库性能。关注【mikechen的互联网架构】,10年+BAT架构经验倾囊相授。
MySQL慢查询优化、索引优化、以及表等优化详解
|
23天前
|
缓存 监控 关系型数据库
如何优化MySQL查询速度?
如何优化MySQL查询速度?【10月更文挑战第31天】
56 3
|
26天前
|
缓存 关系型数据库 MySQL
如何优化 MySQL 数据库的性能?
【10月更文挑战第28天】
51 1
|
2月前
|
NoSQL 关系型数据库 MySQL
MySQL与Redis协同作战:百万级数据统计优化实践
【10月更文挑战第21天】 在处理大规模数据集时,传统的单体数据库解决方案往往力不从心。MySQL和Redis的组合提供了一种高效的解决方案,通过将数据库操作与高速缓存相结合,可以显著提升数据处理的性能。本文将分享一次实际的优化案例,探讨如何利用MySQL和Redis共同实现百万级数据统计的优化。
79 9
|
27天前
|
监控 关系型数据库 MySQL
数据库优化:MySQL索引策略与查询性能调优实战
【10月更文挑战第27天】本文深入探讨了MySQL的索引策略和查询性能调优技巧。通过介绍B-Tree索引、哈希索引和全文索引等不同类型,以及如何创建和维护索引,结合实战案例分析查询执行计划,帮助读者掌握提升查询性能的方法。定期优化索引和调整查询语句是提高数据库性能的关键。
150 1
|
2月前
|
NoSQL 关系型数据库 MySQL
MySQL与Redis协同作战:优化百万数据查询的实战经验
【10月更文挑战第13天】 在处理大规模数据集时,传统的关系型数据库如MySQL可能会遇到性能瓶颈。为了提升数据处理的效率,我们可以结合使用MySQL和Redis,利用两者的优势来优化数据查询。本文将分享一次实战经验,探讨如何通过MySQL与Redis的协同工作来优化百万级数据统计。
63 5
|
2月前
|
存储 关系型数据库 MySQL
优化 MySQL 的锁机制以提高并发性能
【10月更文挑战第16天】优化 MySQL 锁机制需要综合考虑多个因素,根据具体的应用场景和需求进行针对性的调整。通过不断地优化和改进,可以提高数据库的并发性能,提升系统的整体效率。
80 1
|
2月前
|
缓存 关系型数据库 MySQL
一文彻底弄懂MySQL优化之深度分页
【10月更文挑战第24天】本文深入探讨了 MySQL 深度分页的原理、常见问题及优化策略。首先解释了深度分页的概念及其带来的性能和资源问题。接着介绍了基于偏移量(OFFSET)和限制(LIMIT)以及基于游标的分页方法,并分析了它们的优缺点。最后,提出了多种优化策略,包括合理创建索引、优化查询语句和使用数据缓存,帮助提升分页查询的性能和系统稳定性。
128 1
|
28天前
|
监控 关系型数据库 MySQL
数据库优化:MySQL索引策略与查询性能调优实战
【10月更文挑战第26天】数据库作为现代应用系统的核心组件,其性能优化至关重要。本文主要探讨MySQL的索引策略与查询性能调优。通过合理创建索引(如B-Tree、复合索引)和优化查询语句(如使用EXPLAIN、优化分页查询),可以显著提升数据库的响应速度和稳定性。实践中还需定期审查慢查询日志,持续优化性能。
63 0
|
2月前
|
存储 监控 关系型数据库
MySQL并发控制与管理:优化数据库性能的关键
【10月更文挑战第17天】MySQL并发控制与管理:优化数据库性能的关键
182 0

相关产品

  • 云原生数据库 PolarDB
  • 云数据库 RDS PostgreSQL 版