标签
PostgreSQL , hash , range , list , 分区表
背景
PostgreSQL 10开始内置分区表的功能,11对分区表进行了增强,包括:
1、增加HASH分区的支持,支持了RANGE, HASH, LIST三种分区。
https://www.postgresql.org/docs/devel/static/ddl-partitioning.html
2、支持分区表的INSERT ON CONFLICT
https://www.postgresql.org/docs/devel/static/sql-insert.html
3、支持分区表的全局索引管理
《PostgreSQL 11 preview - 新功能, 分区表全局索引管理》
4、支持分区表的并行分区JOIN
《PostgreSQL 11 preview - 分区表智能并行JOIN (已类似MPP架构,性能暴增)》
分区表定义例子
https://www.postgresql.org/docs/devel/static/sql-createtable.html
range 分区表
1、单列range
CREATE TABLE measurement (
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2016m07
PARTITION OF measurement (
unitsales DEFAULT 0
) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
2、多列RANGE
CREATE TABLE measurement_year_month (
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (EXTRACT(YEAR FROM logdate), EXTRACT(MONTH FROM logdate));
CREATE TABLE measurement_ym_older
PARTITION OF measurement_year_month
FOR VALUES FROM (MINVALUE, MINVALUE) TO (2016, 11);
CREATE TABLE measurement_ym_y2016m11
PARTITION OF measurement_year_month
FOR VALUES FROM (2016, 11) TO (2016, 12);
CREATE TABLE measurement_ym_y2016m12
PARTITION OF measurement_year_month
FOR VALUES FROM (2016, 12) TO (2017, 01);
CREATE TABLE measurement_ym_y2017m01
PARTITION OF measurement_year_month
FOR VALUES FROM (2017, 01) TO (2017, 02);
list 分区表
CREATE TABLE cities (
city_id bigserial not null,
name text not null,
population bigint
) PARTITION BY LIST (left(lower(name), 1));
CREATE TABLE cities_ab
PARTITION OF cities (
CONSTRAINT city_id_nonzero CHECK (city_id != 0)
) FOR VALUES IN ('a', 'b');
如果想定义多级分区,那么在创建分区表时,也使用partition by即可
CREATE TABLE cities_ab
PARTITION OF cities (
CONSTRAINT city_id_nonzero CHECK (city_id != 0)
) FOR VALUES IN ('a', 'b') PARTITION BY RANGE (population);
CREATE TABLE cities_ab_10000_to_100000
PARTITION OF cities_ab FOR VALUES FROM (10000) TO (100000);
hash 分区表
CREATE TABLE orders (
order_id bigint not null,
cust_id bigint not null,
status text
) PARTITION BY HASH (order_id);
CREATE TABLE orders_p1 PARTITION OF orders
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p2 PARTITION OF orders
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p3 PARTITION OF orders
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p4 PARTITION OF orders
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
默认分区
CREATE TABLE cities_partdef
PARTITION OF cities DEFAULT;
小结
PostgreSQL的分区表非常灵活,
1、可以按单列或多列分区
2、可以按表达式分区
3、可以创建多级分区(不限级数)
多级分区定义例子
1、创建主表
create table t_range_list(id int, info text, crt_time timestamp) partition by list ( mod(hashtext(info), 4) );
2、创建一级分区,主表
create table t_range_list_0 partition of t_range_list (id , info , crt_time ) for values in (0) partition by range (crt_time);
create table t_range_list_1 partition of t_range_list (id , info , crt_time ) for values in (1) partition by range (crt_time);
create table t_range_list_2 partition of t_range_list (id , info , crt_time ) for values in (2) partition by range (crt_time);
create table t_range_list_3 partition of t_range_list (id , info , crt_time ) for values in (3) partition by range (crt_time);
3、创建2级分区表
create table t_range_list_0_201611 partition of t_range_list_0 (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01');
create table t_range_list_0_201612 partition of t_range_list_0 (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01');
create table t_range_list_1_201611 partition of t_range_list_1 (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01');
create table t_range_list_1_201612 partition of t_range_list_1 (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01');
create table t_range_list_2_201611 partition of t_range_list_2 (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01');
create table t_range_list_2_201612 partition of t_range_list_2 (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01');
create table t_range_list_3_201611 partition of t_range_list_3 (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01');
create table t_range_list_3_201612 partition of t_range_list_3 (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01');
绑定和解绑分区
1、绑定分区,指将已有的普通表,绑定到某个分区表下面,作为它的一个分区,绑定分区时,需要指定分区的区间。
ALTER TABLE [ IF EXISTS ] name
ATTACH PARTITION partition_name { FOR VALUES partition_bound_spec | DEFAULT }
2、解绑分区,指将某个分区从现有的分区表中摘除。
ALTER TABLE [ IF EXISTS ] name
DETACH PARTITION partition_name
例如某个分区已经不怎么查到了,那么可以解绑,并使用DDL把它MOVE一个冷存储的表空间里面。(如果是阿里云RDS PG,那么可以MOVE到OSS里面)
绑定
1、将measurement_y2016m07普通表绑定到measurement。指定它的分区定义取值范围。
Attach a partition to range partitioned table:
ALTER TABLE measurement
ATTACH PARTITION measurement_y2016m07 FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
2、将cities_ab普通表绑定到cities。指定它的分区定义取值范围。
Attach a partition to list partitioned table:
ALTER TABLE cities
ATTACH PARTITION cities_ab FOR VALUES IN ('a', 'b');
3、将cities_partdef普通表绑定到cities。并设置为默认分区。
Attach a default partition to a partitioned table:
ALTER TABLE cities
ATTACH PARTITION cities_partdef DEFAULT;
3、将orders_p4普通表绑定到orders。指定它的分区定义取值范围。
Attach a partition to hash partitioned table:
ALTER TABLE orders
ATTACH PARTITION orders_p4 FOR VALUES WITH (MODULUS 4, REMAINDER 3);
解绑
1、将分区measurement_y2015m12从measurement中摘除。
Detach a partition from partitioned table:
ALTER TABLE measurement
DETACH PARTITION measurement_y2015m12;
分区表 insert on conflict 例子
CREATE TABLE orders (
order_id bigint not null,
cust_id bigint not null,
status text
) PARTITION BY HASH (order_id);
CREATE TABLE orders_p1 PARTITION OF orders (order_id primary key)
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p2 PARTITION OF orders (order_id primary key)
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p3 PARTITION OF orders (order_id primary key)
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p4 PARTITION OF orders (order_id primary key)
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
postgres=# insert into orders values (1,1,'test') on conflict (order_id) do nothing;
ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification
语法层面已支持分区表的insert on conflict,接下来就等提交分区表UK索引的PATCH。
https://commitfest.postgresql.org/16/1452/
分区表全局索引
《PostgreSQL 11 preview - 新功能, 分区表全局索引管理》
分区的优势
1、如果被访问的数据集中在某个分区内,那么使用分区的话,分区和索引都比元表更小,更容易加载到内存,提升性能。(虽然是BLOCK管理的,但是使用分区的情况下数据分布更紧凑,所以依旧是节约内存的)
2、当需要对整个分区进行更新时,操作单个分区,比操作单个大表更快(即使使用索引,也没有直接全表扫描分区快)。
3、维护数据更方便,例如需要删除某些分区的数据,可以直接DROP 分区,不需要DELETE。
4、加载数据也更方便,例如可以并行加载到不同的分区,速度比写入单表快,因为单表的话索引、BLOCK在大量数据导入时会产生一定的LOCK冲突,没有并行加载到各个分区快。
5、不经常访问的数据分区,可以移动到廉价存储。易于管理数据。
参考
https://www.postgresql.org/docs/devel/static/sql-createtable.html
https://www.postgresql.org/docs/devel/static/ddl-partitioning.html
《PostgreSQL 11 preview - 新功能, 分区表全局索引管理》
《PostgreSQL 11 preview - 分区表智能并行JOIN (已类似MPP架构,性能暴增)》
《PostgreSQL 查询涉及分区表过多导致的性能问题 - 性能诊断与优化(大量BIND, spin lock, SLEEP进程)》
《PostgreSQL 商用版本EPAS(阿里云ppas) - 分区表性能优化 (堪比pg_pathman)》
《PostgreSQL 10 内置分区 vs pg_pathman perf profiling》