PostgreSQL 9.6 sharding based on FDW & pg_pathman

本文涉及的产品
云原生数据库 PolarDB 分布式版,标准版 2核8GB
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
云数据库 RDS MySQL,集群版 2核4GB 100GB
推荐场景:
搭建个人博客
简介:

PostgreSQL 9.6 sharding based on FDW & pg_pathman

作者

digoal

日期

2016-10-26

标签

PostgreSQL , 分区表 , pg_pathman , custom scan api , sharding , FDW


背景

可以阅读以下几篇文章先回顾一下FDW,基于FDW的shared以及高效的分区插件pg_pathman。

《PostgreSQL 9.6 单元化,sharding (based on postgres_fdw) - 内核层支持前传》

《PostgreSQL 9.6 sharding + 单元化 (based on postgres_fdw) 最佳实践 - 通用水平分库场景设计与实践》

《PostgreSQL 9.5+ 高效分区表实现 - pg_pathman》

由于pg_pathman支持foreign table,所以拿它来做sharding也是理所当然的事情,同时它支持的HASH partition不需要在QUERY中带入constraint对应的clause,也能走分区查询。

例子

依旧使用《PostgreSQL 9.6 sharding + 单元化 (based on postgres_fdw) 最佳实践 - 通用水平分库场景设计与实践》文章中的例子。

例子使用1个路由节点,4个数据节点。

1. 配置目标库的pg_hba.conf

使用md5认证

2. 创建用户

create role role0 nosuperuser login encrypted password 'pwd' ;
create role role1 nosuperuser login encrypted password 'pwd' ;
create role role2 nosuperuser login encrypted password 'pwd' ;
create role role3 nosuperuser login encrypted password 'pwd' ;
create role digoal nosuperuser login encrypted password 'pwd' ;

3. 创建数据库

create database db0 with template template0 ;
create database db1 with template template0 ;
create database db2 with template template0 ;
create database db3 with template template0 ;
create database mas1 with template template0 ;

4. 配置数据库权限

grant all on database db0 to role0 ;
grant all on database db1 to role1 ;
grant all on database db2 to role2 ;
grant all on database db3 to role3 ;
grant all on database mas1 to digoal ;

5. 创建schema, 建议与USER同名

\c db0 role0
create schema role0;
\c db1 role1
create schema role1;
\c db2 role2
create schema role2;
\c db3 role3
create schema role3;
\c mas1 digoal
create schema digoal;

6. 配置路由节点,需要用到postgres_fdw与pg_pathman插件

\c mas1 postgres
create extension postgres_fdw;
create extension pg_pathman;
grant usage on foreign data wrapper postgres_fdw to digoal;

7. ddl建表语句 , ddl-1.sql

create table userinfo(uid int8 primary key, info text, crt_time timestamp);

create table user_log(uid int8 not null, logintime timestamp, logouttime timestamp, client_addr inet, client_port int);

create table user_membership(uid1 int8 not null, uid2 int8, unique (uid1,uid2) );

create table user_membership_rev(uid2 int8 not null, uid1 int8, unique (uid2,uid1) );

8. 初始化数据节点,创建数据表,注意表名最好全局唯一,方便使用import foreign schema的方式导入。

\c db0 role0

ddl-1.sql

然后执行 :   
do language plpgsql $$
declare
  i int;
  mod int := 1;
  nod int := 0;
  sql text;
begin
  for i in (mod*nod)..(mod*(nod+1)-1) loop
      sql := 'create table userinfo_'||i||'(like userinfo including all)';
      execute sql;
      sql := 'create table user_log_'||i||'(like user_log including all)';
      execute sql;
      sql := 'create table user_membership_'||i||'(like user_membership including all)';
      execute sql;
      sql := 'create table user_membership_rev_'||i||'(like user_membership_rev including all)';
      execute sql;
  end loop;
end;
$$;
\c db1 role1

ddl-1.sql

然后执行 :   
do language plpgsql $$
declare
  i int;
  mod int := 1;
  nod int := 1;
  sql text;
begin
  for i in (mod*nod)..(mod*(nod+1)-1) loop
      sql := 'create table userinfo_'||i||'(like userinfo including all)';
      execute sql;
      sql := 'create table user_log_'||i||'(like user_log including all)';
      execute sql;
      sql := 'create table user_membership_'||i||'(like user_membership including all)';
      execute sql;
      sql := 'create table user_membership_rev_'||i||'(like user_membership_rev including all)';
      execute sql;
  end loop;
end;
$$;
\c db2 role2

ddl-1.sql

然后执行 :   
do language plpgsql $$
declare
  i int;
  mod int := 1;
  nod int := 2;
  sql text;
begin
  for i in (mod*nod)..(mod*(nod+1)-1) loop
      sql := 'create table userinfo_'||i||'(like userinfo including all)';
      execute sql;
      sql := 'create table user_log_'||i||'(like user_log including all)';
      execute sql;
      sql := 'create table user_membership_'||i||'(like user_membership including all)';
      execute sql;
      sql := 'create table user_membership_rev_'||i||'(like user_membership_rev including all)';
      execute sql;
  end loop;
end;
$$;
\c db3 role3

ddl-1.sql

然后执行 :   
do language plpgsql $$
declare
  i int;
  mod int := 1;
  nod int := 3;
  sql text;
begin
  for i in (mod*nod)..(mod*(nod+1)-1) loop
      sql := 'create table userinfo_'||i||'(like userinfo including all)';
      execute sql;
      sql := 'create table user_log_'||i||'(like user_log including all)';
      execute sql;
      sql := 'create table user_membership_'||i||'(like user_membership including all)';
      execute sql;
      sql := 'create table user_membership_rev_'||i||'(like user_membership_rev including all)';
      execute sql;
  end loop;
end;
$$;

9. 初始化master

\c mas1 digoal


-- 初始化foreign server
-- or user host (本例使用unix socket)
create server db0 foreign data wrapper postgres_fdw OPTIONS (host '/u02/digoal/pg_root5281', port '5281', dbname 'db0', use_remote_estimate 'false', fetch_size '5000000');
create server db1 foreign data wrapper postgres_fdw OPTIONS (host '/u02/digoal/pg_root5281', port '5281', dbname 'db1', use_remote_estimate 'false', fetch_size '5000000');
create server db2 foreign data wrapper postgres_fdw OPTIONS (host '/u02/digoal/pg_root5281', port '5281', dbname 'db2', use_remote_estimate 'false', fetch_size '5000000');
create server db3 foreign data wrapper postgres_fdw OPTIONS (host '/u02/digoal/pg_root5281', port '5281', dbname 'db3', use_remote_estimate 'false', fetch_size '5000000');
-- 设置use_remote_estimate 'false' 需收集外部表的统计信息, 但是可以节约explain的操作,对于高并发的小事务,建议使用FALSE


-- 创建user mapping
create user mapping FOR digoal SERVER db0 options (user 'role0', password 'pwd');
create user mapping FOR digoal SERVER db1 options (user 'role1', password 'pwd');
create user mapping FOR digoal SERVER db2 options (user 'role2', password 'pwd');
create user mapping FOR digoal SERVER db3 options (user 'role3', password 'pwd');


-- 导入FOREIGN TABLE,排除主表
import foreign schema role0 EXCEPT (userinfo,user_log,user_membership,mv_user_membership,user_membership_rev,mv_user_membership_rev) from server db0 into digoal;
import foreign schema role1 EXCEPT (userinfo,user_log,user_membership,mv_user_membership,user_membership_rev,mv_user_membership_rev) from server db1 into digoal;
import foreign schema role2 EXCEPT (userinfo,user_log,user_membership,mv_user_membership,user_membership_rev,mv_user_membership_rev) from server db2 into digoal;
import foreign schema role3 EXCEPT (userinfo,user_log,user_membership,mv_user_membership,user_membership_rev,mv_user_membership_rev) from server db3 into digoal;


-- 创建主表
create table userinfo(uid int8 primary key, info text, crt_time timestamp);
create table user_log(uid int8 not null, logintime timestamp, logouttime timestamp, client_addr inet, client_port int);
create table user_membership(uid1 int8 not null, uid2 int8, unique (uid1,uid2) );
create table user_membership_rev(uid2 int8 not null, uid1 int8, unique (uid2,uid1) );


-- 生成哈希分区
select create_hash_partitions('userinfo'::regclass, 'uid', 0, true) ;
select create_hash_partitions('user_log'::regclass, 'uid', 0, true) ;
select create_hash_partitions('user_membership'::regclass, 'uid1', 0, true) ;
select create_hash_partitions('user_membership_rev'::regclass, 'uid2', 0, true) ;


-- 设置foreign table 约束,目前pg_pathman不支持attach hash 分区表,所以我把步骤分解了,先构建约束
-- 注意约束名有固定的格式
do language plpgsql $$
declare
  i int;
  x int := 0;
  y int := 3;
  sql text;
begin
  for i in x..y loop
      sql := 'alter foreign table userinfo_'||i||' add constraint pathman_userinfo_'||i||'_1_check check(get_hash_part_idx(hashint8(uid), 4) = '||i||' )';
      execute sql;
      sql := 'alter foreign table user_log_'||i||' add constraint pathman_user_log_'||i||'_1_check check(get_hash_part_idx(hashint8(uid), 4) = '||i||' )';
      execute sql;
      sql := 'alter foreign table user_membership_'||i||' add constraint pathman_user_membership_'||i||'_1_check check(get_hash_part_idx(hashint8(uid1), 4) = '||i||' )';
      execute sql;
      sql := 'alter foreign table user_membership_rev_'||i||' add constraint pathman_user_membership_rev_'||i||'_1_check check(get_hash_part_idx(hashint8(uid2), 4) = '||i||' )';
      execute sql;
  end loop;
end;
$$;


-- 设置继承关系
do language plpgsql $$
declare
  i int;
  x int := 0;
  y int := 3;
  sql text;
begin
  for i in x..y loop
      sql := 'alter foreign table userinfo_'||i||' inherit userinfo';
      execute sql;
      sql := 'alter foreign table user_log_'||i||' inherit user_log';
      execute sql;
      sql := 'alter foreign table user_membership_'||i||' inherit user_membership';
      execute sql;
      sql := 'alter foreign table user_membership_rev_'||i||' inherit user_membership_rev';
      execute sql;
  end loop;
end;
$$;


-- 前面都准备好之后,可以设置pg_pathman的内存,刷新。
do language plpgsql $$
declare
  i int;
  x int := 0;
  y int := 3;
  tbls text[] := array['userinfo', 'user_log', 'user_membership', 'user_membership_rev'];
  tbl text;
  child text;
  sql text;
begin
  foreach tbl in array tbls loop 
    for i in x..y loop
      child := tbl||'_'||i;
      -- 将子表添加到pg_pathman hash memory中
      perform public.invoke_on_partition_created_callback(tbl::regclass, child::REGCLASS, 0);
    end loop;
    -- 刷新
    perform public.on_create_partitions(tbl);
  end loop;
end;
$$;

10. 插入压测

插入数据测试的函数

create or replace function f(int) returns void as $$
declare
begin 
  insert into userinfo values ($1, md5(random()::Text), now());
 exception when others then return;
end;
$$ language plpgsql strict;

压测

$ vi test1.sql
\set id random(1,50000000)
select f(:id);

$ pgbench -M simple -n -r -P 1 -f ./test1.sql -c 64 -j 64 -T 1200 -U digoal mas1
...
progress: 102.0 s, 42769.1 tps, lat 1.495 ms stddev 0.512
progress: 103.0 s, 42633.0 tps, lat 1.500 ms stddev 0.519
progress: 104.0 s, 42864.0 tps, lat 1.492 ms stddev 0.525
progress: 105.0 s, 42724.0 tps, lat 1.497 ms stddev 0.537
progress: 106.0 s, 42801.3 tps, lat 1.494 ms stddev 0.507
progress: 107.0 s, 42547.7 tps, lat 1.503 ms stddev 0.524
...
   PerfTop:   54084 irqs/sec  kernel:40.6%  exact:  0.0% [1000Hz cycles],  (all, 64 CPUs)
---------------------------------------------------------------------------------------------
             samples  pcnt function                      DSO
             _______ _____ _____________________________ ____________________________________

            18991.00  2.5% AllocSetAlloc                 /home/digoal/pgsql9.6/bin/postgres
            17321.00  2.3% _spin_lock                    [kernel.kallsyms]                   
            15946.00  2.1% GetSnapshotData               /home/digoal/pgsql9.6/bin/postgres
            15049.00  2.0% hash_search_with_hash_value   /home/digoal/pgsql9.6/bin/postgres
            12864.00  1.7% base_yyparse                  /home/digoal/pgsql9.6/bin/postgres
            10796.00  1.4% SearchCatCache                /home/digoal/pgsql9.6/bin/postgres
             9826.00  1.3% memcpy                        /lib64/libc-2.12.so                 
             8840.00  1.2% _int_malloc                   /lib64/libc-2.12.so                 
             8261.00  1.1% __strlen_sse42                /lib64/libc-2.12.so                 
             8245.00  1.1% schedule                      [kernel.kallsyms]                   
             7441.00  1.0% __GI_vfprintf                 /lib64/libc-2.12.so                 
             7348.00  1.0% AtEOXact_GUC                  /home/digoal/pgsql9.6/bin/postgres
             6679.00  0.9% __strcmp_sse42                /lib64/libc-2.12.so                 
             6531.00  0.9% mutex_spin_on_owner           [kernel.kallsyms]                   
             5899.00  0.8% reschedule_interrupt          [kernel.kallsyms]                   

直连测试

\c db0 role0

create or replace function f(int) returns void as $$
declare
begin 
  insert into userinfo values ($1, md5(random()::Text), now());
 exception when others then return;
end;
$$ language plpgsql strict;
$ pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 64 -j 64 -T 1200 -U role0 db0

progress: 6.0 s, 149615.0 tps, lat 0.427 ms stddev 0.389
progress: 7.0 s, 149022.1 tps, lat 0.428 ms stddev 0.398
progress: 8.0 s, 148586.6 tps, lat 0.430 ms stddev 0.430

11. query测试

压测

$ vi test1.sql
\set id random(1,50000000)
select * from userinfo where uid=:id;

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 1200 -U digoal mas1
...
progress: 158.0 s, 46046.8 tps, lat 1.388 ms stddev 0.960
progress: 159.0 s, 46201.2 tps, lat 1.384 ms stddev 0.945
progress: 160.0 s, 46176.9 tps, lat 1.385 ms stddev 0.957
progress: 161.0 s, 46143.0 tps, lat 1.386 ms stddev 0.943
progress: 162.0 s, 46316.0 tps, lat 1.381 ms stddev 0.955
progress: 163.0 s, 46067.1 tps, lat 1.388 ms stddev 0.954
...
   PerfTop:   46994 irqs/sec  kernel:40.7%  exact:  0.0% [1000Hz cycles],  (all, 64 CPUs)
-------------------------------------------------------------------------------------------------
             samples  pcnt function                      DSO
             _______ _____ _____________________________ ________________________________________

            17838.00  2.4% GetSnapshotData               /home/digoal/pgsql9.6/bin/postgres    
            17711.00  2.4% _spin_lock                    [kernel.kallsyms]                       
            17372.00  2.4% AllocSetAlloc                 /home/digoal/pgsql9.6/bin/postgres    
            15205.00  2.1% base_yyparse                  /home/digoal/pgsql9.6/bin/postgres    
            12210.00  1.7% SearchCatCache                /home/digoal/pgsql9.6/bin/postgres    
            11821.00  1.6% hash_search_with_hash_value   /home/digoal/pgsql9.6/bin/postgres    
            10729.00  1.5% __mutex_lock_slowpath         [kernel.kallsyms]                       
            10694.00  1.5% mutex_spin_on_owner           [kernel.kallsyms]                       
            10168.00  1.4% _int_malloc                   /lib64/libc-2.12.so                     
             8217.00  1.1% schedule                      [kernel.kallsyms]                       
             6977.00  1.0% __strlen_sse42                /lib64/libc-2.12.so                     
             6918.00  0.9% __strcmp_sse42                /lib64/libc-2.12.so                     
             6711.00  0.9% _spin_lock_irqsave            [kernel.kallsyms]                       
             6629.00  0.9% reschedule_interrupt          [kernel.kallsyms]                       
             6599.00  0.9% find_busiest_group            [kernel.kallsyms]                       
             5984.00  0.8% core_yylex                    /home/digoal/pgsql9.6/bin/postgres                                   

直连测试

$ vi test.sql
\set id random(1,50000000)
select * from userinfo_0 where uid=:id;

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 1200 -U role0 db0
progress: 3.0 s, 1065981.5 tps, lat 0.059 ms stddev 0.008
progress: 4.0 s, 1084914.1 tps, lat 0.058 ms stddev 0.006
progress: 5.0 s, 1093083.4 tps, lat 0.057 ms stddev 0.006

小结

1. 基于fdw的shard,从功能上来讲还有哪些值得改进的点:

例如, 支持聚合下推(10.0已经支持)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f5d6bce63ceb3c59a964814bb0df5a0648e750e5

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7012b132d07c2b4ea15b0b3cb1ea9f3278801d98

append的并行化,目前如果要扫描多个分区,是串行的,并不是并行扫描,非常影响效率。

2. 从性能方面来讲,还有哪些值得改进的点:

从profile的结果来看,目前的锁较重,有非常大的性能提升空间,需要花点时间看看代码。

当然如果当前的性能能满足你,或者你通过构建多个对等的master加上负载均衡,也能解决这个问题。

现在与直连单节点的性能差距还是较大的。

Count

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
SQL Oracle 关系型数据库
xDB Replication Server - PostgreSQL, Oracle, SQL Server, PPAS 全量、增量(redo log based, or trigger based)同步(支持single-master, mult-master同步, 支持DDL)
xDB Replication Server - PostgreSQL, Oracle, SQL Server, PPAS 全量、增量(redo log based, or trigger based)同步(支持single-master, mult-master同步, 支持DDL)
827 0
|
SQL Oracle 关系型数据库
xDB Replication Server - PostgreSQL, Oracle, SQL Server, PPAS 全量、增量(redo log based, or trigger based)同步(支持single-master, mult-master同步, 支持DDL)
标签 PostgreSQL , xDB replication server , sql server , oracle , ppas , 同步 , ddl , dml , 全量 , 增量 , log based , trigger based , smr(single-master replication) , mmr(multi-master replication) 背景 Post
536 0
|
关系型数据库 C++ PostgreSQL
|
弹性计算 关系型数据库 测试技术
PostgreSQL 11 1000亿 tpcb、1000W tpcc 性能测试 - on 阿里云ECS + ESSD (含quorum based 0丢失多副本配置与性能测试)
标签 PostgreSQL , pgbench , tpcb , tpcc , tpch 背景 https://help.aliyun.com/knowledge_detail/64950.html 阿里云ESSD提供了单盘32TB容量,100万IOPS,4GB/s读写吞吐的能力,单台ECS可以挂载16块ESSD盘,组成512 TB的大容量存储。
3067 0
|
关系型数据库 PostgreSQL
PostgreSQL sharding : citus 系列7 - topn 加速(count(*) group by order by count(*) desc limit x) (use 估值插件 topn)
标签 PostgreSQL , topn , topn.number_of_counters , count(*) group by order by count(*) desc limit x 背景 count(*) group by order by count(*) desc limit x 用来统计 topn。
1415 0
|
关系型数据库 PostgreSQL
PostgreSQL sharding : citus 系列6 - count(distinct xx) 加速 (use 估值插件 hll|hyperloglog)
标签 PostgreSQL , hll , hyperloglog , distinct , 加速 , citus.count_distinct_error_rate 背景 在分布式数据库中,计算count(distinct xxx),需要对distinct 的字段, 1、去重, 2、重分布去重后的数据,(这一步,如果distinct值特别多,那么就会比较耗时) 3、然后再去重, 4、最后count (xxx), 5、求所有节点的count SUM。
1803 0
|
SQL 关系型数据库 OLAP
PostgreSQL sharding : citus 系列5 - worker节点网络优化
标签 PostgreSQL , citus , pgbouncer , 网络优化 , worker节点 背景 citus 节点间的网络需求: 1、cn节点访问所有worker节点。oltp业务的访问较频繁。
1502 0
|
SQL 关系型数据库 数据库
PostgreSQL sharding : citus 系列4 - DDL 操作规范 (新增DB,TABLE,SCHEMA,UDF,OP,用户等)
标签 PostgreSQL , citus , 新增对象 , 新增数据库 , 新增用户 背景 citus是PG的一个插件,插件主要针对普通SQL(非UTILITY)加HOOK进行了一些ROUTE处理,同时使用UDF对表进行新建分区的操作。
1815 0
|
SQL 移动开发 关系型数据库
PostgreSQL sharding : citus 系列3 - 窗口函数调用限制 与 破解之法(套用gpdb执行树,分步执行)
标签 PostgreSQL , citus , 窗口函数 背景 窗口函数是分析场景常用的,目前(citus 7.5)仅支持两种场景使用window函数, 1、partition by 必须是分布键。
1488 0

相关产品

  • 云原生数据库 PolarDB