A PostgreSQL feature on the way - leave triggers behind, embrace built-in partitioning

I previously shared an article about partitioned table support in Alibaba Cloud RDS for PostgreSQL:
https://yq.aliyun.com/articles/113
Today there is another piece of good news:
the PostgreSQL community version is finally getting built-in table partitioning, so there is no more need to agonize over writing partition triggers.
(Many people find the current way of doing partitioning in PostgreSQL downright painful: besides being awkward to manage, it is also a performance problem. It is the proverbial rat dropping that spoils the whole pot of soup, and the community is finally about to show that rat dropping the door.)

If you don't care much about performance, you can take a look at a generic partition trigger function I wrote earlier, one function to handle every partitioned table:
http://blog.163.com/digoal@126/blog/static/16387704020128772037884/
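
For reference, here is a minimal sketch of such a "one function for everything" trigger (not the exact function from that post): it routes rows with dynamic SQL, taking the partition key column and the range width as trigger arguments. The function and trigger names are made up for illustration.

create or replace function generic_range_ins() returns trigger as
$$
declare
  v_key   numeric;
  v_child text;
begin
  -- TG_ARGV[0] = partition key column name, TG_ARGV[1] = width of each range partition
  execute format('select ($1).%I', TG_ARGV[0]) into v_key using new;
  v_child := TG_TABLE_NAME || floor(v_key / TG_ARGV[1]::numeric)::int::text;
  execute format('insert into %I select ($1).*', v_child) using new;
  return null;
end;
$$ language plpgsql strict;

-- hypothetical usage: child tables t0, t1, ... each holding 1000 ids
-- create trigger tg_generic before insert on t for each row execute procedure generic_range_ins('id', '1000');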

The discussion and patches for the built-in partitioning feature are here:
https://www.postgresql.org/message-id/flat/55D3093C.5010800@lab.ntt.co.jp#55D3093C.5010800@lab.ntt.co.jp

https://wiki.postgresql.org/wiki/Table_partitioning
The design has largely taken shape, and there is a chance it will be merged into 9.6.

Compared with the trigger-based approach, built-in partitioning routes rows directly through the insert/update/delete interfaces, skipping the parse and optimize steps, which makes it more efficient than a trigger that turns every row into another SQL statement.

src/backend/access/heap/heapam.c

 *              heap_insert             - insert tuple into a relation
 *              heap_delete             - delete a tuple from a relation
 *              heap_update             - replace a tuple in a relation with another tuple


Syntax

Creating the partitioned parent table

-- create partitioned table and child partitions at once.
CREATE TABLE parent (...)
PARTITION BY [ RANGE | LIST ]( key ) [ opclass ]  -- defaults to the partition column's default btree opclass; a custom opclass may also be specified
[ (
     PARTITION child
       {
           VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
         | VALUES [ IN ]( { ... | DEFAULT } ) -- for LIST
       }
       [ WITH ( ... ) ] [ TABLESPACE tbs ]
     [, ...]
  ) ] ;

-- add a partition key to a table.
ALTER TABLE parent PARTITION BY  [ RANGE | LIST ]( key ) [ opclass ] [ (...) ] ;
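
As an illustration of the draft syntax above, a partitioned table declared together with its children might look like the following. This follows the wiki sketch; the table and partition names and bounds are invented, and the patch actually tested later in this article spells child creation differently (create table ... partition of ... for values start () end ()).

CREATE TABLE measurement (city_id int, logdate date, peaktemp int)
PARTITION BY RANGE (logdate)
(
  PARTITION measurement_y2016m01 VALUES LESS THAN ('2016-02-01'),
  PARTITION measurement_y2016m02 VALUES LESS THAN ('2016-03-01'),
  PARTITION measurement_overflow VALUES LESS THAN (MAXVALUE)
);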

Adding child partitions

-- create a new partition on a partitioned table.
CREATE PARTITION child ON parent VALUES ... ;

-- add a table as a partition.
ALTER TABLE parent ATTACH PARTITION child VALUES ... ;

-- Remove a partition as a normal table.
ALTER TABLE parent DETACH PARTITION child ;
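
Again following the sketch, attaching a separately loaded table as a new partition and detaching an old one might be written as below. The table name test14 is invented, and the exact VALUES clause is a guess that combines this sketch with the START/END wording used by the patch later in this article.

-- a table holding data for ids 14000 to 14999, created and loaded separately
CREATE TABLE test14 (LIKE test INCLUDING ALL);
-- attach it as a new partition of test
ALTER TABLE test ATTACH PARTITION test14 VALUES START (14000) END (15000);
-- later, detach the oldest partition; test0 becomes an ordinary table again
ALTER TABLE test DETACH PARTITION test0;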

Partitioned table descriptions

The parent table
postgres=# \d+ test
                          Unlogged partitioned table "public.test"
  Column  |            Type             | Modifiers | Storage  | Stats target | Description 
----------+-----------------------------+-----------+----------+--------------+-------------
 id       | integer                     |           | plain    |              | 
 info     | text                        |           | extended |              | 
 crt_time | timestamp without time zone |           | plain    |              | 
Partition Key: PARTITION BY RANGE (id)

A child partition
postgres=# \d+ test0
                               Unlogged table "public.test0"
  Column  |            Type             | Modifiers | Storage  | Stats target | Description 
----------+-----------------------------+-----------+----------+--------------+-------------
 id       | integer                     |           | plain    |              | 
 info     | text                        |           | extended |              | 
 crt_time | timestamp without time zone |           | plain    |              | 
Partition Of: test FOR VALUES START (0) END (1000)

Partition metadata

The parent table

postgres=# select * from pg_partitioned;
 partedrelid | partstrat | partnatts | partattrs | partclass | partexprs 
-------------+-----------+-----------+-----------+-----------+-----------
       16405 | r         |         1 | 1         | 1978      | 
(1 row)

postgres=# \x
Expanded display is on.
postgres=# select * from pg_class where relkind ='P';
-[ RECORD 1 ]-------+------
relname             | test
relnamespace        | 2200
reltype             | 16407
reloftype           | 0
relowner            | 10
relam               | 0
relfilenode         | 16405
reltablespace       | 0
relpages            | 0
reltuples           | 0
relallvisible       | 0
reltoastrelid       | 16408
relhasindex         | f
relisshared         | f
relpersistence      | u
relkind             | P
relnatts            | 3
relchecks           | 0
relhasoids          | f
relhaspkey          | f
relhasrules         | f
relhastriggers      | f
relhassubclass      | t
relrowsecurity      | f
relforcerowsecurity | f
relispopulated      | t
relreplident        | d
relfrozenxid        | 0
relminmxid          | 0
relacl              | 
reloptions          | 

Child partitions
Each child's partition bound is recorded here. Doesn't this look a lot like pg_rewrite?

postgres=# select * from pg_partition;
 partrelid |                                                                                                                                                                                    partbound                                    
                                                                                                                                                 
-----------+----------------------------------------------------------------------------------------------------------------------
     16411 | {PARTITIONRANGE :lowerinc true :lower ({CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 64 :constvalue 4 [ 0 0 0 0 0 0 0 0 ]}) :upperinc false :upper ({CONST :cons
ttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 72 :constvalue 4 [ -24 3 0 0 0 0 0 0 ]})}
......
     16567 | {PARTITIONRANGE :lowerinc true :lower ({CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 65 :constvalue 4 [ -56 50 0 0 0 0 0 0 ]}) :upperinc false :upper ({CONST :c
onsttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 77 :constvalue 4 [ -80 54 0 0 0 0 0 0 ]})}
(14 rows)
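
Using only the columns shown above, the new catalogs can be queried directly to see which tables are partitioned and what bounds each child carries. The catalog and column names here are those of this patch version and may well change before the feature is committed.

-- partitioned parents and their strategy (the test table above shows partstrat = 'r' for range)
select partedrelid::regclass, partstrat, partnatts from pg_partitioned;

-- every child partition and its stored bound
select p.partrelid::regclass, p.partbound from pg_partition p order by p.partrelid;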

Here is the code that finds the matching range partition for a record. Note that it uses binary search to make partition lookup on range-partitioned tables more efficient.

+/*
+ * range_partition_for_tuple
+ *             Search the range partition for a range key ('values')
+ *
+ * Returns -1 if none found.
+ */
+static int
+range_partition_for_tuple(PartitionKey key, PartitionDesc pdesc, Datum *values)
+{
+       Assert(pdesc->nparts > 0);
+
+       return range_partition_bsearch(key, pdesc, values);
+}
+
+/*
+ * range_partition_bsearch
+ *             Workhorse of range_partition_for_tuple
+ */
+static int
+range_partition_bsearch(PartitionKey key, PartitionDesc pdesc,
+                                               Datum *values)
+{
+       int             low, high;
+
+       /* Good ol' bsearch */
+       low = 0;
+       high = pdesc->nparts - 1;
+       while (low <= high)
+       {
+               int             idx = (low + high) / 2;
+
+               if (pdesc->rangeuppers[idx]->infinite)
+               {
+                       if (rightof(key, values, pdesc->rangelowers[idx]))
+                               return idx;
+
+                       break;
+               }
+               else if (leftof(key, values, pdesc->rangeuppers[idx]))
+               {
+                       if (pdesc->rangelowers[idx]->infinite)
+                               return idx;
+
+                       if (rightof(key, values, pdesc->rangelowers[idx]))
+                               return idx;
+
+                       high = idx - 1;
+                       continue;
+               }
+
+               low = idx + 1;
+       }
+
+       return -1;
+}

Looking up a LIST partition

+/*
+ * list_partition_for_tuple
+ *             Find the list partition for a tuple
+ *
+ * Returns -1 if none found.
+ */
+static int
+list_partition_for_tuple(PartitionKey key, PartitionDesc pdesc,
+                                                Datum value, bool isnull)
+{
+       int                     i;
+
+       Assert(pdesc->nparts > 0);
+
+       for (i = 0; i < pdesc->nparts; i++)
+       {
+               int             j;
+
+               if (isnull)
+               {
+                       if (pdesc->lists[i]->contains_null)
+                               return i;
+
+                       continue;
+               }
+
+               for (j = 0; j < pdesc->lists[i]->nvalues; j++)
+               {
+                       int32   cmpval;
+
+                       cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+                                                                  key->tcinfo->typcoll[0],
+                                                                  pdesc->lists[i]->values[j],
+                                                                  value));
+                       if (!cmpval)
+                               return i;
+               }
+       }
+
+       return -1;
+}

Inserts go directly through the heap_insert interface instead of being rewritten into SQL.

+                               heap_insert(resultRelInfo->ri_RelationDesc,
+                                                       tuple, mycid, hi_options, bistate);


Built-in partitioning vs. trigger-based partitioning

(P.S. The trigger's partition lookup has room for optimization: it uses a linear if/elsif chain rather than binary search, so the performance gap on the last partition does not have to be this large. A binary-search sketch is given after the trigger function below.)
Performance comparison: COPY of 5,000,000 records.
[Figure: COPY 5,000,000 rows, built-in partitioning vs. trigger-based partitioning; see the timed COPY runs at the end of this article]

Test details

Built from the PostgreSQL 9.6 beta1 source with the latest patches applied.

wget https://www.postgresql.org/message-id/attachment/44614/0001-Add-syntax-to-specify-partition-key-v6.patch
wget https://www.postgresql.org/message-id/attachment/44615/0002-Add-a-IGNORE-dependency-type-v6.patch
wget https://www.postgresql.org/message-id/attachment/44616/0003-Infrastructure-for-creation-of-partitioned-tables-v6.patch
wget https://www.postgresql.org/message-id/attachment/44617/0004-Add-syntax-to-create-partitions-v6.patch
wget https://www.postgresql.org/message-id/attachment/44618/0005-Infrastructure-for-partition-metadata-storage-and-ma-v6.patch
wget https://www.postgresql.org/message-id/attachment/44619/0006-Introduce-tuple-routing-for-partitioned-tables-v6.patch

wget https://ftp.postgresql.org/pub/source/v9.6beta1/postgresql-9.6beta1.tar.bz2
tar -jxvf postgresql-9.6beta1.tar.bz2
cd postgresql-9.6beta1/
patch -p1 < ../0001-Add-syntax-to-specify-partition-key-v6.patch 
patch -p1 < ../0002-Add-a-IGNORE-dependency-type-v6.patch 
patch -p1 < ../0003-Infrastructure-for-creation-of-partitioned-tables-v6.patch
patch -p1 < ../0004-Add-syntax-to-create-partitions-v6.patch 
patch -p1 < ../0005-Infrastructure-for-partition-metadata-storage-and-ma-v6.patch 
patch -p1 < ../0006-Introduce-tuple-routing-for-partitioned-tables-v6.patch 

Installation and testing

./configure --prefix=/home/digoal/pgsql9.6
gmake world -j 32
gmake install-world -j 32

vi ~/env9.6.sh

export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=1922
export PGDATA=/home/digoal/pgdata9.6
export LANG=en_US.utf8
export PGHOME=/home/digoal/pgsql9.6
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi

. ~/env9.6.sh
initdb -D $PGDATA -E SQL_ASCII -U postgres --locale=C
vi $PGDATA/postgresql.conf
listen_addresses = '0.0.0.0'            # what IP address(es) to listen on;
port = 1922                             # (change requires restart)
max_connections = 100                   # (change requires restart)
unix_socket_directories = '.'   # comma-separated list of directories
unix_socket_permissions = 0700          # begin with 0 to use octal notation
shared_buffers = 128MB                  # min 128kB
dynamic_shared_memory_type = posix      # the default is the first option
log_destination = 'csvlog'              # Valid values are combinations of
logging_collector = on          # Enable capturing of stderr and csvlog
log_directory = 'pg_log'                # directory where log files are written,
log_truncate_on_rotation = on           # If on, an existing log file with the
log_timezone = 'PRC'
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'                       # locale for system error message
lc_monetary = 'C'                       # locale for monetary formatting
lc_numeric = 'C'                        # locale for number formatting
lc_time = 'C'                           # locale for time formatting
default_text_search_config = 'pg_catalog.english'

pg_ctl start

Create the built-in partitioned table
psql
CREATE unlogged TABLE test(id int, info text, crt_time timestamp) PARTITION BY RANGE (id);
create unlogged table test0 partition of test for values start (0) end (1000);
create unlogged table test1 partition of test for values start (1000) end (2000);
create unlogged table test2 partition of test for values start (2000) end (3000);
create unlogged table test3 partition of test for values start (3000) end (4000);
create unlogged table test4 partition of test for values start (4000) end (5000);
create unlogged table test5 partition of test for values start (5000) end (6000);
create unlogged table test6 partition of test for values start (6000) end (7000);
create unlogged table test7 partition of test for values start (7000) end (8000);
create unlogged table test8 partition of test for values start (8000) end (9000);
create unlogged table test9 partition of test for values start (9000) end (10000);
create unlogged table test10 partition of test for values start (10000) end (11000);
create unlogged table test11 partition of test for values start (11000) end (12000);
create unlogged table test12 partition of test for values start (12000) end (13000);
create unlogged table test13 partition of test for values start (13000) end (14000);
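
To check that rows really land in the right child, one can insert through the parent and look at the tableoid system column, assuming queries through the parent expand to its children as with ordinary inheritance in this patch. The id value 1500 is just an example.

insert into test values (1500, 'test', now());
-- expected to report test1 as the table that actually stores this row
select tableoid::regclass, * from test where id = 1500;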

Create an equivalent partitioned table using the trigger-based method
Here comes a big pile of code
create unlogged table t(id int, info text, crt_time timestamp);
create unlogged table t0 (like t including all) inherits(t);
create unlogged table t1 (like t including all) inherits(t);
create unlogged table t2 (like t including all) inherits(t);
create unlogged table t3 (like t including all) inherits(t);
create unlogged table t4 (like t including all) inherits(t);
create unlogged table t5 (like t including all) inherits(t);
create unlogged table t6 (like t including all) inherits(t);
create unlogged table t7 (like t including all) inherits(t);
create unlogged table t8 (like t including all) inherits(t);
create unlogged table t9 (like t including all) inherits(t);
create unlogged table t10 (like t including all) inherits(t);
create unlogged table t11 (like t including all) inherits(t);
create unlogged table t12 (like t including all) inherits(t);
create unlogged table t13 (like t including all) inherits(t);
alter table t0 add constraint ck1 check (id>=0 and id<1000);
alter table t1 add constraint ck1 check (id>=1000 and id<2000);
alter table t2 add constraint ck1 check (id>=2000 and id<3000);
alter table t3 add constraint ck1 check (id>=3000 and id<4000);
alter table t4 add constraint ck1 check (id>=4000 and id<5000);
alter table t5 add constraint ck1 check (id>=5000 and id<6000);
alter table t6 add constraint ck1 check (id>=6000 and id<7000);
alter table t7 add constraint ck1 check (id>=7000 and id<8000);
alter table t8 add constraint ck1 check (id>=8000 and id<9000);
alter table t9 add constraint ck1 check (id>=9000 and id<10000);
alter table t10 add constraint ck1 check (id>=10000 and id<11000);
alter table t11 add constraint ck1 check (id>=11000 and id<12000);
alter table t12 add constraint ck1 check (id>=12000 and id<13000);
alter table t13 add constraint ck1 check (id>=13000 and id<14000);

postgres=# create or replace function ins() returns trigger as 
$$

declare
begin
  if 0 <= new.id and new.id < 1000 then insert into t0 values (new.*);
  elsif 1000 <= new.id and new.id < 2000 then insert into t1 values (new.*);
  elsif 2000 <= new.id and new.id < 3000 then insert into t2 values (new.*);
  elsif 3000 <= new.id and new.id < 4000 then insert into t3 values (new.*);
  elsif 4000 <= new.id and new.id < 5000 then insert into t4 values (new.*);
  elsif 5000 <= new.id and new.id < 6000 then insert into t5 values (new.*);
  elsif 6000 <= new.id and new.id < 7000 then insert into t6 values (new.*);
  elsif 7000 <= new.id and new.id < 8000 then insert into t7 values (new.*);
  elsif 8000 <= new.id and new.id < 9000 then insert into t8 values (new.*);
  elsif 9000 <= new.id and new.id < 10000 then insert into t9 values (new.*);
  elsif 10000 <= new.id and new.id < 11000 then insert into t10 values (new.*);
  elsif 11000 <= new.id and new.id < 12000 then insert into t11 values (new.*);
  elsif 12000 <= new.id and new.id < 13000 then insert into t12 values (new.*);
  elsif 13000 <= new.id and new.id < 14000 then insert into t13 values (new.*);
  else raise 'partition key value overflow';
  end if;
  return null;
end;

$$
 language plpgsql strict;

postgres=# create trigger tg1 before insert on t for each row execute procedure ins();
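
As noted earlier, the lookup inside ins() is a linear if/elsif chain, so routing cost grows with the partition's position. Below is a hedged sketch of a binary-search variant for the same 14 equal-width ranges (t0 to t13, 1000 ids each); for non-uniform ranges the boundaries would be kept in an array instead. The function name ins_bsearch is made up and it is not part of the benchmark that follows.

create or replace function ins_bsearch() returns trigger as
$$
declare
  lo int := 0;
  hi int := 13;   -- partitions t0 .. t13, each covering 1000 ids
  mid int;
begin
  if new.id < 0 or new.id >= 14000 then
    raise 'partition key value overflow';
  end if;
  -- binary search for the first partition whose upper bound exceeds new.id
  while lo < hi loop
    mid := (lo + hi) / 2;
    if new.id >= (mid + 1) * 1000 then
      lo := mid + 1;
    else
      hi := mid;
    end if;
  end loop;
  execute format('insert into %I select ($1).*', 't' || lo) using new;
  return null;
end;
$$ language plpgsql strict;

-- create trigger tg1 before insert on t for each row execute procedure ins_bsearch();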

Benchmark the traditional (trigger-based) method against the built-in partitioning method
cat t.sql
insert into t values (0,'t',now());
insert into t values (1000,'t',now());
insert into t values (2000,'t',now());
insert into t values (3000,'t',now());
insert into t values (4000,'t',now());
insert into t values (5000,'t',now());
insert into t values (6000,'t',now());
insert into t values (7000,'t',now());
insert into t values (8000,'t',now());
insert into t values (9000,'t',now());
insert into t values (10000,'t',now());
insert into t values (11000,'t',now());
insert into t values (12000,'t',now());
insert into t values (13000,'t',now());

cat test.sql
insert into test values (0,'test',now());
insert into test values (1000,'test',now());
insert into test values (2000,'test',now());
insert into test values (3000,'test',now());
insert into test values (4000,'test',now());
insert into test values (5000,'test',now());
insert into test values (6000,'test',now());
insert into test values (7000,'test',now());
insert into test values (8000,'test',now());
insert into test values (9000,'test',now());
insert into test values (10000,'test',now());
insert into test values (11000,'test',now());
insert into test values (12000,'test',now());
insert into test values (13000,'test',now());

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 16 -j 16 -T 120
tps = 5619.075687 (including connections establishing)
tps = 5619.473688 (excluding connections establishing)

pgbench -M prepared -n -r -P 1 -f ./t.sql -c 16 -j 16 -T 120
tps = 3781.152246 (including connections establishing)
tps = 3781.456820 (excluding connections establishing)

cat t.sql
insert into t values (13000,'t',now());

cat test.sql
insert into test values (13000,'test',now());

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 16 -j 16 -T 120
tps = 75777.169561 (including connections establishing)
tps = 75781.869815 (excluding connections establishing)

pgbench -M prepared -n -r -P 1 -f ./t.sql -c 16 -j 16 -T 120
tps = 50596.853557 (including connections establishing)
tps = 50599.913130 (excluding connections establishing)

In the tests above, soft interrupts were heavy, so the results do not really reflect the advantage of the built-in partitioning code path.

Testing with COPY
Here the difference is immediately obvious: loading 5,000,000 rows through the built-in partitioned table takes roughly 12 to 14 seconds, versus roughly 1.5 to 2.5 minutes through the trigger-based table. (The trigger runs report "COPY 0" because the BEFORE trigger returns NULL, so no rows are stored in the parent table t itself.)

date;psql -c "copy (select 0,'test',now() from generate_series(1,5000000)) to stdout"|psql -c "copy test from stdin";date
Sat Jun 11 19:12:50 CST 2016
COPY 5000000
Sat Jun 11 19:13:02 CST 2016

date;psql -c "copy (select 0,'test',now() from generate_series(1,5000000)) to stdout"|psql -c "copy t from stdin";date
Sat Jun 11 19:13:05 CST 2016
COPY 0
Sat Jun 11 19:14:35 CST 2016

date;psql -c "copy (select 13000,'test',now() from generate_series(1,5000000)) to stdout"|psql -c "copy test from stdin";date
Sat Jun 11 19:05:08 CST 2016
COPY 5000000
Sat Jun 11 19:05:22 CST 2016
date;psql -c "copy (select 13000,'test',now() from generate_series(1,5000000)) to stdout"|psql -c "copy t from stdin";date
Sat Jun 11 19:05:26 CST 2016
COPY 0
Sat Jun 11 19:07:55 CST 2016