标签
PostgreSQL , 分区表 , 优化器 , 分区过滤 , hash 分区
背景
PostgreSQL 10开始内置分区表语法,当时只支持了range,list两种分区,实际上可以通过LIST实现HASH分区。
PostgreSQL 10 hash 分区表
使用list支持hash分区
postgres=# create table p (id int , info text, crt_time timestamp) partition by list (abs(mod(id,4)));
CREATE TABLE
postgres=# create table p0 partition of p for values in (0);
CREATE TABLE
postgres=# create table p1 partition of p for values in (1);
CREATE TABLE
postgres=# create table p2 partition of p for values in (2);
CREATE TABLE
postgres=# create table p3 partition of p for values in (3);
CREATE TABLE
分区表如下
postgres=# \d+ p
Table "public.p"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
info | text | | | | extended | |
crt_time | timestamp without time zone | | | | plain | |
Partition key: LIST (abs(mod(id, 4)))
Partitions: p0 FOR VALUES IN (0),
p1 FOR VALUES IN (1),
p2 FOR VALUES IN (2),
p3 FOR VALUES IN (3)
写入数据
postgres=# insert into p select generate_series(1,1000),md5(random()::text),now();
INSERT 0 1000
postgres=# select tableoid::regclass,id from p limit 10;
tableoid | id
----------+----
p0 | 4
p0 | 8
p0 | 12
p0 | 16
p0 | 20
p0 | 24
p0 | 28
p0 | 32
p0 | 36
p0 | 40
(10 rows)
普通的查询,无法做到分区的过滤
postgres=# explain select * from p where id=1 ;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..96.50 rows=24 width=44)
-> Seq Scan on p0 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 1)
-> Seq Scan on p1 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 1)
-> Seq Scan on p2 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 1)
-> Seq Scan on p3 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 1)
(9 rows)
一定要带上分区条件,才可以做到分区过滤
postgres=# explain select * from p where id=1 and abs(mod(id, 4))=abs(mod(1, 4));
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..32.60 rows=1 width=44)
-> Seq Scan on p1 (cost=0.00..32.60 rows=1 width=44)
Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))
(3 rows)
PostgreSQL 11 hash 分区表
PostgreSQL 11同样可以使用与10一样的方法,LIST来实现HASH分区,但是有一个更加优雅的方法,直接使用HASH分区。
postgres=# create table p (id int , info text, crt_time timestamp) partition by hash (id);
CREATE TABLE
postgres=# create table p0 partition of p for values WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE
postgres=# create table p1 partition of p for values WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE
postgres=# create table p2 partition of p for values WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE
postgres=# create table p3 partition of p for values WITH (MODULUS 4, REMAINDER 3);
CREATE TABLE
表结构如下
postgres=# \d+ p
Table "public.p"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
info | text | | | | extended | |
crt_time | timestamp without time zone | | | | plain | |
Partition key: HASH (id)
Partitions: p0 FOR VALUES WITH (modulus 4, remainder 0),
p1 FOR VALUES WITH (modulus 4, remainder 1),
p2 FOR VALUES WITH (modulus 4, remainder 2),
p3 FOR VALUES WITH (modulus 4, remainder 3)
表分区定义,内置的约束是一个HASH函数的返回值
postgres=# \d+ p0
Table "public.p0"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
info | text | | | | extended | |
crt_time | timestamp without time zone | | | | plain | |
Partition of: p FOR VALUES WITH (modulus 4, remainder 0)
Partition constraint: satisfies_hash_partition('180289'::oid, 4, 0, id)
postgres=# \d+ p1
Table "public.p1"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
info | text | | | | extended | |
crt_time | timestamp without time zone | | | | plain | |
Partition of: p FOR VALUES WITH (modulus 4, remainder 1)
Partition constraint: satisfies_hash_partition('180289'::oid, 4, 1, id)
这个hash函数的定义如下,他一定是一个immutable 函数,所以可以用于分区过滤
postgres=# \x
Expanded display is on.
postgres=# \df+ satisfies_hash_partition
List of functions
-[ RECORD 1 ]-------+--------------------------------------
Schema | pg_catalog
Name | satisfies_hash_partition
Result data type | boolean
Argument data types | oid, integer, integer, VARIADIC "any"
Type | func
Volatility | immutable
Parallel | safe
Owner | postgres
Security | invoker
Access privileges |
Language | internal
Source code | satisfies_hash_partition
Description | hash partition CHECK constraint
PostgreSQL 11终于可以只输入分区字段值就可以做到分区过滤了
postgres=# explain select * from p where id=1;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..24.16 rows=6 width=44)
-> Seq Scan on p0 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 1)
(3 rows)
postgres=# explain select * from p where id=0;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..24.16 rows=6 width=44)
-> Seq Scan on p0 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 0)
(3 rows)
postgres=# explain select * from p where id=2;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..24.16 rows=6 width=44)
-> Seq Scan on p2 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 2)
(3 rows)
postgres=# explain select * from p where id=3;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..24.16 rows=6 width=44)
-> Seq Scan on p1 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 3)
(3 rows)
它受控于一个开关,当关闭后,就无法只通过分区值来过滤分区。
postgres=# set enable_partition_pruning =off;
SET
postgres=# explain select * from p where id=0;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..96.62 rows=24 width=44)
-> Seq Scan on p0 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 0)
-> Seq Scan on p1 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 0)
-> Seq Scan on p2 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 0)
-> Seq Scan on p3 (cost=0.00..24.12 rows=6 width=44)
Filter: (id = 0)
(9 rows)
PostgreSQL 继承表 hash 分区表实现
postgres=# explain select * from tbl where abs(mod(id,4)) = abs(mod(1,4)) and id=1;
QUERY PLAN
--------------------------------------------------------------------------
Append (cost=0.00..979127.84 rows=3 width=45)
-> Seq Scan on tbl (cost=0.00..840377.67 rows=2 width=45)
Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))
-> Seq Scan on tbl1 (cost=0.00..138750.17 rows=1 width=45)
Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))
(5 rows)
pg_pathman分区方法
支持9.5以上的版本
《PostgreSQL 9.5+ 高效分区表实现 - pg_pathman》
小结
1、PostgreSQL 10内置分区表,为了HASH分区,可以使用LIST分区的方法,但是为了让数据库可以自动过滤分区,一定要带上HASH分区条件表达式到SQL中。
2、PostgreSQL 11内置分区表,内置了HASH分区,并且支持只按照HASH分区条件,自动过滤分区。
3、继承表的方法,同样可以实现HASH分区,需要创建触发器,同时主表在查询时依旧会被查询到。
以上三种方法,必须保证constraint_exclusion
参数设置为partition或者on, 否则无法做到分区自动过滤。
对于PostgreSQL 11,为了实现只输入分区字段的值就能够满足分区自动过滤,还需要设置enable_partition_pruning
为on.
索性这些参数默认都是OK的。
4、pg_pathman是通过custom scan接口实现的分区,是目前为止,性能最好的,锁粒度最低的方法。
参考
《PostgreSQL 11 preview - 分区表 增强 汇总》
《PostgreSQL 11 preview - 分区过滤控制参数 - enable_partition_pruning》
《Greenplum 计算能力估算 - 暨多大表需要分区,单个分区多大适宜》
《PostgreSQL 11 preview - 分区表智能并行聚合、分组计算(已类似MPP架构,性能暴增)》
《PostgreSQL 并行vacuum patch - 暨为什么需要并行vacuum或分区表》
《分区表锁粒度差异 - pg_pathman VS native partition table》
《PostgreSQL 11 preview - 分区表用法及增强 - 增加HASH分区支持 (hash, range, list)》
《PostgreSQL 11 preview - Parallel Append(包括 union all\分区查询) (多表并行计算) sharding架构并行计算核心功能之一》
《PostgreSQL 11 preview - 分区表智能并行JOIN (已类似MPP架构,性能暴增)》
《PostgreSQL 查询涉及分区表过多导致的性能问题 - 性能诊断与优化(大量BIND, spin lock, SLEEP进程)》
《PostgreSQL 商用版本EPAS(阿里云ppas(Oracle 兼容版)) - 分区表性能优化 (堪比pg_pathman)》
《HTAP数据库 PostgreSQL 场景与性能测试之 45 - (OLTP) 数据量与性能的线性关系(10亿+无衰减), 暨单表多大需要分区》
《PostgreSQL 10 内置分区 vs pg_pathman perf profiling》