标签
PostgreSQL , adhoc查询 , 大宽表 , 任意字段组合查询 , 索引 , btree , gin , rum
背景
大宽表,任意字段组合查询,透视。是实时分析系统中的常见需求:
1、实时写入。
2、实时任意字段组合查询、透视。
PostgreSQL中,有多种方法支持这种场景:
1、方法1,每个字段加一个索引(普通类型用btree, 多值类型(数组,json,全文检索,枚举等类型)用gin, 空间、范围类型用gist,时序类型用brin)。
2、方法2,使用单个复合索引,把所有字段放到GIN倒排索引接口中。
3、方法3,使用单个复合索引,把所有字段放到RUM索引接口中。
那么这三种方法,应该选哪种更好呢?
或者说选哪种,可以在写入、查询性能上获得更好的平衡。
让单个复合索引支持超过32个字段
注意,PG默认情况下,仅支持一个索引中,最多放32个字段。
通过修改 src/include/pg_config_manual.h ,重新编译,可以支持更多的字段。但是注意,支持的字段越多,索引的头信息会越大。
vi src/include/pg_config_manual.h
/*
* Maximum number of columns in an index. There is little point in making
* this anything but a multiple of 32, because the main cost is associated
* with index tuple header size (see access/itup.h).
*
* Changing this requires an initdb.
*/
// #define INDEX_MAX_KEYS 32
#define INDEX_MAX_KEYS 128
src/include/access/itup.h
/*
* Index tuple header structure
*
* All index tuples start with IndexTupleData. If the HasNulls bit is set,
* this is followed by an IndexAttributeBitMapData. The index attribute
* values follow, beginning at a MAXALIGN boundary.
*
* Note that the space allocated for the bitmap does not vary with the number
* of attributes; that is because we don't have room to store the number of
* attributes in the header. Given the MAXALIGN constraint there's no space
* savings to be had anyway, for usual values of INDEX_MAX_KEYS.
*/
如果字段允许NULL,则有一个数据结构IndexAttributeBitMapData用来表示哪个字段的值是NULL(与TUPLE的头信息类似,也有NULL BITMAP的表示)。
因此如果字段有NULL,则索引条目的的头信息中,会多出若干BIT,是固定大小的。
typedef struct IndexAttributeBitMapData
{
bits8 bits[(INDEX_MAX_KEYS + 8 - 1) / 8];
}
INDEX_MAX_KEYS = 32 时,4字节。
对比每列独立索引、GIN单个全字段复合索引、RUM单个全字段复合索引。
安装测试PostgreSQL软件
1、编译软件
wget https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.bz2
tar -jxvf postgresql-snapshot.tar.bz2
cd postgresql-12devel
参考上面的章节修改源码 src/include/pg_config_manual.h ,支持单个索引放128个字段。
export USE_NAMED_POSIX_SEMAPHORES=1
LIBS=-lpthread CFLAGS="-O3" ./configure --prefix=/home/digoal/pgsql12
LIBS=-lpthread CFLAGS="-O3" make world -j 128
LIBS=-lpthread CFLAGS="-O3" make install-world
安装rum插件,略。
https://github.com/postgrespro/rum
2、设置环境变量
vi env12.sh
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=8000
export PGDATA=/data01/pg/pg_root$PGPORT
export LANG=en_US.utf8
export PGHOME=/home/digoal/pgsql12
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
. ./env12.sh
3、初始化数据库集群
initdb -D $PGDATA -U postgres -X /data02/pg/pg_wal_8000 -E SQL_ASCII --locale=C
4、修改数据库配置
vi $PGDATA/postgresql.conf
listen_addresses = '0.0.0.0'
port = 8000
max_connections = 1000
unix_socket_directories = '/tmp,.'
shared_buffers = 32GB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10.0
effective_io_concurrency = 0
max_worker_processes = 128
wal_level = replica
synchronous_commit = off
wal_buffers = 16MB
wal_writer_delay = 10ms
checkpoint_timeout = 35min
max_wal_size = 64GB
min_wal_size = 16GB
checkpoint_completion_target = 0.1
random_page_cost = 1.1
log_destination = 'csvlog'
logging_collector = on
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_timezone = 'PRC'
log_autovacuum_min_duration = 0
autovacuum_freeze_max_age = 1200000000
autovacuum_multixact_freeze_max_age = 1400000000
autovacuum_vacuum_cost_delay = 0ms
vacuum_freeze_table_age = 1150000000
vacuum_multixact_freeze_table_age = 1150000000
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
jit = off
5、启动数据库
pg_ctl start
构建测试表
1、分别创建a b c 3张表,结构一样,都是128个字段。按要求创建索引
每列一个BTREE索引
do language plpgsql $$
declare
sql text := 'create table a (';
begin
for i in 1..128 loop
sql := sql || 'c' || i || ' int default random()*2000000000,';
end loop;
sql := rtrim(sql,',');
sql := sql || ')';
execute sql;
for i in 1..128 loop
execute 'create index idx_a_'||i||' on a (c'||i||')';
end loop;
end;
$$;
所有列放到一个GIN索引中
(注意GIN索引有FASTUPDATE参数,开启的时候,写入性能很好,但是会延迟合并PENDING TUPLES,写入快了,查询就可能慢(如果在PENDING TUPLES合并慢,并且很大时,越慢),后面会有例子)
create extension btree_gin;
do language plpgsql $$
declare
sql text := 'create table b (';
begin
for i in 1..128 loop
sql := sql || 'c' || i || ' int default random()*2000000000,';
end loop;
sql := rtrim(sql,',');
sql := sql || ')';
execute sql;
sql := 'create index idx_b_1 on b using gin (';
for i in 1..128 loop
sql := sql || 'c' || i || ',';
end loop;
sql := rtrim(sql,',');
sql := sql || ')';
execute sql;
end;
$$;
所有列放到一个RUM索引中
create extension rum;
do language plpgsql $$
declare
sql text := 'create table c (';
begin
for i in 1..128 loop
sql := sql || 'c' || i || ' int default random()*2000000000,';
end loop;
sql := rtrim(sql,',');
sql := sql || ')';
execute sql;
sql := 'create index idx_c_1 on c using rum (';
for i in 1..128 loop
sql := sql || 'c' || i || ',';
end loop;
sql := rtrim(sql,',');
sql := sql || ')';
execute sql;
end;
$$;
性能测试
1、创建三个性能测试脚本,分别用于测试三张表的写入性能
vi test_btree.sql
\set c1 random(1,2000000000)
insert into a (c1) values (:c1);
vi test_gin.sql
\set c1 random(1,2000000000)
insert into b (c1) values (:c1);
vi test_rum.sql
\set c1 random(1,2000000000)
insert into c (c1) values (:c1);
2、独立128个btree索引的写入性能。
pgbench -M prepared -n -r -P 1 -f ./test_btree.sql -c 28 -j 28 -T 120
transaction type: ./test_btree.sql
scaling factor: 1
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 1906387
latency average = 1.762 ms
latency stddev = 0.693 ms
tps = 15886.268820 (including connections establishing)
tps = 15887.521828 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set c1 random(1,2000000000)
1.761 insert into a (c1) values (:c1);
1.59万行/s
Samples: 1M of event 'cpu-clock', Event count (approx.): 216077120900
Overhead Shared Object Symbol
12.07% postgres [.] _bt_compare
9.85% postgres [.] hash_search_with_hash_value
6.44% [kernel] [k] _raw_spin_unlock_irqrestore
5.33% postgres [.] LWLockAcquire
4.17% [kernel] [k] __do_softirq
3.90% [kernel] [k] run_timer_softirq
3.50% postgres [.] PinBuffer
3.20% postgres [.] _bt_relandgetbuf
2.24% libc-2.17.so [.] __memset_sse2
2.14% postgres [.] _bt_moveright
2.07% postgres [.] LWLockRelease
1.88% libc-2.17.so [.] __memmove_ssse3_back
1.59% [kernel] [k] finish_task_switch
1.25% postgres [.] LWLockReleaseClearVar
1.06% postgres [.] MarkBufferDirty
0.99% libc-2.17.so [.] __memcpy_ssse3_back
3、一个GIN复合索引(开启FASTUPDATE)的写入性能
pgbench -M prepared -n -r -P 1 -f ./test_gin.sql -c 28 -j 28 -T 120
transaction type: ./test_gin.sql
scaling factor: 1
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 4791575
latency average = 0.701 ms
latency stddev = 32.171 ms
tps = 39929.190873 (including connections establishing)
tps = 39932.416884 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set c1 random(1,2000000000)
0.700 insert into b (c1) values (:c1);
3.99万行/s
Samples: 698K of event 'cpu-clock', Event count (approx.): 119873063495
Overhead Shared Object Symbol
9.70% libc-2.17.so [.] __memset_sse2
7.79% [kernel] [k] _raw_spin_unlock_irqrestore
3.93% postgres [.] ExecInitExprRec
3.13% postgres [.] AllocSetAlloc
2.69% postgres [.] ExecInitFunc
2.12% postgres [.] TupleDescInitEntry
2.04% [kernel] [k] finish_task_switch
1.64% postgres [.] hash_search_with_hash_value
1.57% postgres [.] SearchCatCache1
1.56% libc-2.17.so [.] bsearch
1.54% libc-2.17.so [.] __memcpy_ssse3_back
1.40% postgres [.] expression_tree_walker
1.33% postgres [.] palloc0
1.29% [kernel] [k] run_timer_softirq
1.27% postgres [.] dispatch_compare_ptr
1.25% postgres [.] fmgr_info
0.96% postgres [.] LWLockAcquire
4、一个RUM复合索引的写入性能
pgbench -M prepared -n -r -P 1 -f ./test_rum.sql -c 28 -j 28 -T 120
transaction type: ./test_rum.sql
scaling factor: 1
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 453539
latency average = 7.408 ms
latency stddev = 11.713 ms
tps = 3779.393984 (including connections establishing)
tps = 3779.643084 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set c1 random(1,2000000000)
7.406 insert into c (c1) values (:c1);
0.378万行/s
Samples: 260K of event 'cpu-clock', Event count (approx.): 56451610586
Overhead Shared Object Symbol
18.58% postgres [.] computeRegionDelta
5.74% postgres [.] GenericXLogRegisterBuffer
5.68% [kernel] [k] _raw_spin_unlock_irqrestore
5.25% rum.so [.] rumtuple_get_attrnum
4.11% [kernel] [k] __do_softirq
3.51% postgres [.] hash_search_with_hash_value
3.44% rum.so [.] rumFindLeafPage
3.38% libc-2.17.so [.] __memcpy_ssse3_back
3.10% libc-2.17.so [.] __memset_sse2
2.77% [kernel] [k] run_timer_softirq
2.70% postgres [.] LWLockAcquire
2.07% postgres [.] PinBuffer
1.71% rum.so [.] entryLocateLeafEntry
1.49% postgres [.] LWLockReleaseClearVar
1.36% [kernel] [k] copy_user_enhanced_fast_string
1.07% postgres [.] LWLockRelease
0.97% rum.so [.] entryLocateEntry
0.89% postgres [.] UnpinBuffer.constprop.6
写入性能小结
很显然,性能最好的是GIN(开启FASTUPDATE)时的性能,其次是独立索引,最后是RUM索引。
查询性能
1、都测试了120秒,写入量如下
postgres=# \dt+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+-------+----------+---------+-------------
public | a | table | postgres | 993 MB |
public | b | table | postgres | 2497 MB |
public | c | table | postgres | 237 MB |
(2 rows)
postgres=# select count(*) from a;
count
---------
1906387
(1 row)
postgres=# select count(*) from b;
count
---------
4791575
(1 row)
postgres=# select count(*) from c;
count
--------
453539
(1 row)
2、索引大小,很显然GIN索引很大,超过了所有BTREE索引加起来的大小,并且还没有完全合并所有的pending tuples。
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-----------+-------+----------+-------+-------+-------------
public | idx_a_1 | index | postgres | a | 53 MB |
public | idx_a_10 | index | postgres | a | 53 MB |
public | idx_a_100 | index | postgres | a | 54 MB |
public | idx_a_101 | index | postgres | a | 54 MB |
public | idx_a_102 | index | postgres | a | 53 MB |
public | idx_a_103 | index | postgres | a | 54 MB |
public | idx_a_104 | index | postgres | a | 54 MB |
public | idx_a_105 | index | postgres | a | 53 MB |
public | idx_a_106 | index | postgres | a | 53 MB |
public | idx_a_107 | index | postgres | a | 54 MB |
public | idx_a_108 | index | postgres | a | 54 MB |
public | idx_a_109 | index | postgres | a | 54 MB |
public | idx_a_11 | index | postgres | a | 54 MB |
public | idx_a_110 | index | postgres | a | 53 MB |
public | idx_a_111 | index | postgres | a | 53 MB |
public | idx_a_112 | index | postgres | a | 54 MB |
public | idx_a_113 | index | postgres | a | 54 MB |
public | idx_a_114 | index | postgres | a | 53 MB |
public | idx_a_115 | index | postgres | a | 53 MB |
public | idx_a_116 | index | postgres | a | 54 MB |
public | idx_a_117 | index | postgres | a | 53 MB |
public | idx_a_118 | index | postgres | a | 54 MB |
public | idx_a_119 | index | postgres | a | 54 MB |
public | idx_a_12 | index | postgres | a | 54 MB |
public | idx_a_120 | index | postgres | a | 54 MB |
public | idx_a_121 | index | postgres | a | 53 MB |
public | idx_a_122 | index | postgres | a | 53 MB |
public | idx_a_123 | index | postgres | a | 53 MB |
public | idx_a_124 | index | postgres | a | 54 MB |
public | idx_a_125 | index | postgres | a | 54 MB |
public | idx_a_126 | index | postgres | a | 53 MB |
public | idx_a_127 | index | postgres | a | 53 MB |
public | idx_a_128 | index | postgres | a | 54 MB |
public | idx_a_13 | index | postgres | a | 54 MB |
public | idx_a_14 | index | postgres | a | 54 MB |
public | idx_a_15 | index | postgres | a | 54 MB |
public | idx_a_16 | index | postgres | a | 53 MB |
public | idx_a_17 | index | postgres | a | 53 MB |
public | idx_a_18 | index | postgres | a | 53 MB |
public | idx_a_19 | index | postgres | a | 53 MB |
public | idx_a_2 | index | postgres | a | 53 MB |
public | idx_a_20 | index | postgres | a | 53 MB |
public | idx_a_21 | index | postgres | a | 53 MB |
public | idx_a_22 | index | postgres | a | 53 MB |
public | idx_a_23 | index | postgres | a | 53 MB |
public | idx_a_24 | index | postgres | a | 54 MB |
public | idx_a_25 | index | postgres | a | 53 MB |
public | idx_a_26 | index | postgres | a | 54 MB |
public | idx_a_27 | index | postgres | a | 54 MB |
public | idx_a_28 | index | postgres | a | 53 MB |
public | idx_a_29 | index | postgres | a | 54 MB |
public | idx_a_3 | index | postgres | a | 54 MB |
public | idx_a_30 | index | postgres | a | 54 MB |
public | idx_a_31 | index | postgres | a | 53 MB |
public | idx_a_32 | index | postgres | a | 53 MB |
public | idx_a_33 | index | postgres | a | 53 MB |
public | idx_a_34 | index | postgres | a | 54 MB |
public | idx_a_35 | index | postgres | a | 53 MB |
public | idx_a_36 | index | postgres | a | 53 MB |
public | idx_a_37 | index | postgres | a | 53 MB |
public | idx_a_38 | index | postgres | a | 54 MB |
public | idx_a_39 | index | postgres | a | 54 MB |
public | idx_a_4 | index | postgres | a | 54 MB |
public | idx_a_40 | index | postgres | a | 53 MB |
public | idx_a_41 | index | postgres | a | 54 MB |
public | idx_a_42 | index | postgres | a | 53 MB |
public | idx_a_43 | index | postgres | a | 53 MB |
public | idx_a_44 | index | postgres | a | 53 MB |
public | idx_a_45 | index | postgres | a | 53 MB |
public | idx_a_46 | index | postgres | a | 54 MB |
public | idx_a_47 | index | postgres | a | 54 MB |
public | idx_a_48 | index | postgres | a | 54 MB |
public | idx_a_49 | index | postgres | a | 53 MB |
public | idx_a_5 | index | postgres | a | 54 MB |
public | idx_a_50 | index | postgres | a | 54 MB |
public | idx_a_51 | index | postgres | a | 53 MB |
public | idx_a_52 | index | postgres | a | 54 MB |
public | idx_a_53 | index | postgres | a | 54 MB |
public | idx_a_54 | index | postgres | a | 53 MB |
public | idx_a_55 | index | postgres | a | 54 MB |
public | idx_a_56 | index | postgres | a | 54 MB |
public | idx_a_57 | index | postgres | a | 53 MB |
public | idx_a_58 | index | postgres | a | 53 MB |
public | idx_a_59 | index | postgres | a | 53 MB |
public | idx_a_6 | index | postgres | a | 53 MB |
public | idx_a_60 | index | postgres | a | 54 MB |
public | idx_a_61 | index | postgres | a | 53 MB |
public | idx_a_62 | index | postgres | a | 54 MB |
public | idx_a_63 | index | postgres | a | 54 MB |
public | idx_a_64 | index | postgres | a | 54 MB |
public | idx_a_65 | index | postgres | a | 53 MB |
public | idx_a_66 | index | postgres | a | 54 MB |
public | idx_a_67 | index | postgres | a | 53 MB |
public | idx_a_68 | index | postgres | a | 54 MB |
public | idx_a_69 | index | postgres | a | 54 MB |
public | idx_a_7 | index | postgres | a | 54 MB |
public | idx_a_70 | index | postgres | a | 53 MB |
public | idx_a_71 | index | postgres | a | 54 MB |
public | idx_a_72 | index | postgres | a | 54 MB |
public | idx_a_73 | index | postgres | a | 54 MB |
public | idx_a_74 | index | postgres | a | 54 MB |
public | idx_a_75 | index | postgres | a | 54 MB |
public | idx_a_76 | index | postgres | a | 53 MB |
public | idx_a_77 | index | postgres | a | 54 MB |
public | idx_a_78 | index | postgres | a | 53 MB |
public | idx_a_79 | index | postgres | a | 53 MB |
public | idx_a_8 | index | postgres | a | 53 MB |
public | idx_a_80 | index | postgres | a | 53 MB |
public | idx_a_81 | index | postgres | a | 53 MB |
public | idx_a_82 | index | postgres | a | 54 MB |
public | idx_a_83 | index | postgres | a | 53 MB |
public | idx_a_84 | index | postgres | a | 53 MB |
public | idx_a_85 | index | postgres | a | 54 MB |
public | idx_a_86 | index | postgres | a | 53 MB |
public | idx_a_87 | index | postgres | a | 54 MB |
public | idx_a_88 | index | postgres | a | 53 MB |
public | idx_a_89 | index | postgres | a | 53 MB |
public | idx_a_9 | index | postgres | a | 54 MB |
public | idx_a_90 | index | postgres | a | 54 MB |
public | idx_a_91 | index | postgres | a | 54 MB |
public | idx_a_92 | index | postgres | a | 54 MB |
public | idx_a_93 | index | postgres | a | 53 MB |
public | idx_a_94 | index | postgres | a | 53 MB |
public | idx_a_95 | index | postgres | a | 54 MB |
public | idx_a_96 | index | postgres | a | 53 MB |
public | idx_a_97 | index | postgres | a | 53 MB |
public | idx_a_98 | index | postgres | a | 53 MB |
public | idx_a_99 | index | postgres | a | 54 MB |
public | idx_b_1 | index | postgres | b | 17 GB |
(129 rows)
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+---------+-------+----------+-------+---------+-------------
public | idx_c_1 | index | postgres | c | 2250 MB |
(1 row)
3、多列查询CASE 1,一个条件选择性好,一个条件选择性差
postgres=# explain analyze select count(*) from a where c1<100000 and c2<10000000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Aggregate (cost=104.57..104.58 rows=1 width=8) (actual time=0.332..0.332 rows=1 loops=1)
-> Index Scan using idx_a_1 on a (cost=0.43..104.56 rows=1 width=0) (actual time=0.327..0.327 rows=0 loops=1)
Index Cond: (c1 < 100000)
Filter: (c2 < 10000000)
Rows Removed by Filter: 92
Planning Time: 0.977 ms
Execution Time: 0.399 ms
(7 rows)
postgres=# explain analyze select count(*) from b where c1<100000 and c2<10000000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=351003.54..351003.55 rows=1 width=8) (actual time=752.842..752.842 rows=1 loops=1)
-> Gather (cost=351003.32..351003.53 rows=2 width=8) (actual time=752.829..752.834 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=350003.32..350003.33 rows=1 width=8) (actual time=546.163..546.164 rows=1 loops=3)
-> Parallel Seq Scan on b (cost=0.00..349448.64 rows=221871 width=0) (actual time=499.082..546.152 rows=0 loops=3)
Filter: ((c1 < 100000) AND (c2 < 10000000))
Rows Removed by Filter: 1597191
Planning Time: 0.669 ms
Execution Time: 792.562 ms
(10 rows)
postgres=# explain analyze select count(*) from c where c1<100000 and c2<10000000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Aggregate (cost=18.71..18.72 rows=1 width=8) (actual time=0.776..0.776 rows=1 loops=1)
-> Index Scan using idx_c_1 on c (cost=16.50..18.71 rows=1 width=0) (actual time=0.773..0.773 rows=0 loops=1)
Index Cond: ((c1 < 100000) AND (c2 < 10000000))
Planning Time: 0.077 ms
Execution Time: 0.815 ms
(5 rows)
显然,性能最好的是独立索引,因为选择性好的一个就好了,只是少量的FILTER
为什么GIN性能不行呢?理论上应该很好呀。
https://www.postgresql.org/docs/devel/static/pageinspect.html#id-1.11.7.31.7
postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_b_1', 0));
pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version
--------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+---------
230891 | 1842877 | 5600 | 1587579 | 4205010 | 2 | 1 | 0 | 0 | 2
(1 row)
原因,开启了FASTUPDATE的情况下,写入超快,但是合并PENDING TUPLES只有一个AUTOVACUUM WORKER,所以合并没有写入快,导致了pending tuples越来越大。如上,有几百万条还没有合并。
强制合并:
vacuum analyze b;
耗时需要很久很久,所以实际上开启gin索引的fastupdate,如果写入长时间持续太猛了,并不好。因为合并跟不上,查询性能跟不上。
修改GIN索引的fastupdate,关闭,重测
postgres=# alter index idx_b_1 set (fastupdate =off);
ALTER INDEX
重新测试
transaction type: ./test_gin.sql
scaling factor: 1
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 1638833
latency average = 1.985 ms
latency stddev = 0.572 ms
tps = 13656.742727 (including connections establishing)
tps = 13657.785073 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set c1 random(1,2000000000)
1.983 insert into b (c1) values (:c1);
写入速度从3.99万行/s下降到了1.36万行/s
Samples: 963K of event 'cpu-clock', Event count (approx.): 191435286728
Overhead Shared Object Symbol
9.26% postgres [.] PinBuffer
8.74% postgres [.] LWLockAcquire
7.85% postgres [.] gintuple_get_attrnum
6.01% postgres [.] UnpinBuffer.constprop.6
5.48% postgres [.] hash_search_with_hash_value
5.46% postgres [.] ginFindLeafPage
4.85% postgres [.] LWLockRelease
3.89% [kernel] [k] _raw_spin_unlock_irqrestore
3.19% [kernel] [k] __do_softirq
3.03% [kernel] [k] run_timer_softirq
2.89% postgres [.] ReadBuffer_common
2.51% postgres [.] entryLocateLeafEntry
1.93% postgres [.] entryLocateEntry
1.67% libc-2.17.so [.] __memset_sse2
1.48% postgres [.] gintuple_get_key
1.29% postgres [.] LWLockReleaseClearVar
1.21% libc-2.17.so [.] __memmove_ssse3_back
1.07% libc-2.17.so [.] __memcpy_ssse3_back
0.89% postgres [.] MarkBufferDirty
0.89% postgres [.] AllocSetAlloc
0.76% [kernel] [k] finish_task_switch
0.72% postgres [.] XLogInsertRecord
写入性能下降,比128列独立索引还要慢,可以看到现在没有pending tuples了。
postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_b_1', 0));
pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version
--------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+---------
4294967295 | 4294967295 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 2
(1 row)
多列查询CASE 1,一个条件选择性好,一个条件选择性差.
现在GIN索引的查询性能还好,但是还是不如128独立列索引。
postgres=# explain analyze select count(*) from b where c1<100000 and c2<10000000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=23.13..23.14 rows=1 width=8) (actual time=3.062..3.062 rows=1 loops=1)
-> Bitmap Heap Scan on b (cost=22.01..23.12 rows=1 width=0) (actual time=3.060..3.060 rows=0 loops=1)
Recheck Cond: ((c1 < 100000) AND (c2 < 10000000))
-> Bitmap Index Scan on idx_b_1 (cost=0.00..22.01 rows=1 width=0) (actual time=3.058..3.058 rows=0 loops=1)
Index Cond: ((c1 < 100000) AND (c2 < 10000000))
Planning Time: 0.053 ms
Execution Time: 3.187 ms
(7 rows)
4、多列查询CASE 2,两个条件都是大范围,合并后结果集较小
postgres=# explain analyze select count(*) from c where c1<100000000 and c2<100000000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1290.52..1290.53 rows=1 width=8) (actual time=20.199..20.200 rows=1 loops=1)
-> Bitmap Heap Scan on c (cost=29.59..1287.60 rows=1170 width=0) (actual time=18.299..20.091 rows=1078 loops=1)
Recheck Cond: ((c1 < 100000000) AND (c2 < 100000000))
Heap Blocks: exact=1061
-> Bitmap Index Scan on idx_c_1 (cost=0.00..29.30 rows=1170 width=0) (actual time=18.179..18.179 rows=1078 loops=1)
Index Cond: ((c1 < 100000000) AND (c2 < 100000000))
Planning Time: 0.183 ms
Execution Time: 20.474 ms
(8 rows)
postgres=# explain analyze select count(*) from b where c1<100000000 and c2<100000000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4567.42..4567.43 rows=1 width=8) (actual time=105.917..105.918 rows=1 loops=1)
-> Bitmap Heap Scan on b (cost=65.90..4556.98 rows=4176 width=0) (actual time=81.593..105.503 rows=4098 loops=1)
Recheck Cond: ((c1 < 100000000) AND (c2 < 100000000))
Heap Blocks: exact=4020
-> Bitmap Index Scan on idx_b_1 (cost=0.00..64.86 rows=4176 width=0) (actual time=81.067..81.067 rows=4098 loops=1)
Index Cond: ((c1 < 100000000) AND (c2 < 100000000))
Planning Time: 10.918 ms
Execution Time: 107.486 ms
(8 rows)
postgres=# explain analyze select count(*) from a where c1<100000000 and c2<100000000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=7574.38..7574.39 rows=1 width=8) (actual time=181.375..181.375 rows=1 loops=1)
-> Bitmap Heap Scan on a (cost=2231.28..7561.98 rows=4959 width=0) (actual time=62.215..180.837 rows=4860 loops=1)
Recheck Cond: ((c1 < 100000000) AND (c2 < 100000000))
Rows Removed by Index Recheck: 266961
Heap Blocks: exact=3166 lossy=17904
-> BitmapAnd (cost=2231.28..2231.28 rows=4959 width=0) (actual time=61.461..61.461 rows=0 loops=1)
-> Bitmap Index Scan on idx_a_1 (cost=0.00..1090.57 rows=95192 width=0) (actual time=20.603..20.603 rows=95198 loops=1)
Index Cond: (c1 < 100000000)
-> Bitmap Index Scan on idx_a_2 (cost=0.00..1137.98 rows=99314 width=0) (actual time=37.628..37.628 rows=95665 loops=1)
Index Cond: (c2 < 100000000)
Planning Time: 3.797 ms
Execution Time: 183.723 ms
(12 rows)
现在,性能最好的是GIN索引,其次是RUM和独立列索引(性能差不多)。
5、多列查询CASE 3,单个字段大范围,结果集大
postgres=# explain analyze select count(*) from c where c1<100000000 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=18671.14..18671.15 rows=1 width=8) (actual time=25.041..25.041 rows=1 loops=1)
-> Index Scan using idx_c_1 on c (cost=8.80..18614.25 rows=22757 width=0) (actual time=10.142..22.987 rows=22238 loops=1)
Index Cond: (c1 < 100000000)
Planning Time: 0.099 ms
Execution Time: 25.276 ms
(5 rows)
postgres=# explain analyze select count(*) from b where c1<100000000 ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=63372.32..63372.33 rows=1 width=8) (actual time=139.236..139.236 rows=1 loops=1)
-> Bitmap Heap Scan on b (cost=684.51..63166.06 rows=82505 width=0) (actual time=57.762..131.941 rows=81741 loops=1)
Recheck Cond: (c1 < 100000000)
Heap Blocks: exact=58458
-> Bitmap Index Scan on idx_b_1 (cost=0.00..663.89 rows=82505 width=0) (actual time=48.058..48.058 rows=81741 loops=1)
Index Cond: (c1 < 100000000)
Planning Time: 0.123 ms
Execution Time: 141.904 ms
(8 rows)
postgres=# explain analyze select count(*) from a where c1<100000000 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=78459.78..78459.79 rows=1 width=8) (actual time=109.022..109.023 rows=1 loops=1)
-> Index Only Scan using idx_a_1 on a (cost=0.43..78221.80 rows=95192 width=0) (actual time=0.028..100.343 rows=95198 loops=1)
Index Cond: (c1 < 100000000)
Heap Fetches: 95198
Planning Time: 0.614 ms
Execution Time: 109.053 ms
(6 rows)
现在,性能最好的是RUM和独立列索引(性能差不多),其次是GIN索引。
小结
索引模式 | 写入速度 | 索引大小 | 单列查询性能(选择性好) | 单列查询性能(选择性不好) | 多列查询性能(单列选择性都不好,但是合并后选择性很好) | 多列查询性能(有些列选择性很好) |
---|---|---|---|---|---|---|
128列单个gin(fastupdate=on)索引 | 3.99万行/s | 最大 | 不好(pending tuples很大时) | 不好(选择性不好本身就不适合用索引过滤) | 不好(pending tuples很大时) | 不好(pending tuples很大时) |
128列单个gin(fastupdate=off)索引 | 1.36万行/s | 最大 | 好 | 不好(选择性不好本身就不适合用索引过滤) | 好 | 一般 |
128列单个rum索引 | 0.378万行/s | 较小 | 很好 | 不好(选择性不好本身就不适合用索引过滤) | 一般 | 好 |
128列128个btree索引 | 1.59万行/s | 较小 | 很好 | 不好(选择性不好本身就不适合用索引过滤) | 一般 | 好 |
所以,建议使用每列独立的索引,来支撑任意列的组合查询,可以获得写入、读取较好的均衡。
选择性不好的列,建议不要使用索引,选择性通过pg_stats查看
analyze c;
postgres=# select schemaname,tablename,attname,n_distinct from pg_stats where tablename='c';
schemaname | tablename | attname | n_distinct
------------+-----------+---------+------------
public | c | c1 | -1
public | c | c2 | -0.999025
public | c | c3 | -1
...
《PostgreSQL SQL自动优化案例 - 极简,自动推荐索引》
《自动选择正确索引访问接口(btree,hash,gin,gist,sp-gist,brin,bitmap...)的方法》
为什么写入性能会是这样的结果?
同步模式BUILD索引,索引越多,写入的影响越大,本文对单表的测试可以看出,128个索引,写入只能达到1.59万行/s。如果硬件资源足够的话,可以通过使用分区表(多个表)来解决写入的瓶颈问题。
异步模式BUILD索引,(GIN开启fastupdate),写入吞吐可以提高,但是如果长时间处于高吞吐的写入下,由于一张表只有一个VACUUM WORKER进程合并pending tuples,所以可能导致PENDING TUPLES会越来越多,导致查询性能下降。解决办法是分区表,这样每个分区可以有1个vacuum worker参与合并,降低延迟。
为什么查询性能在几种情况下会是这样的结果?
选择性好的情况下,三种索引,性能都很好。
多列查询,如果所有条件的选择性都不好,但是最后所有条件的合并选择性如果很好时,GIN内置的BITMAP过滤,使得性能在RUM,GIN,BTREE多列三种情况下达到最好。
多列查询,如果所有条件的选择性都不好,并且最后所有条件合并后的选择性也不好时,所有索引的性能都一般。