PostgreSQL 9.6 sharding based on FDW & pg_pathman

简介:
+关注继续查看

PostgreSQL 9.6 sharding based on FDW & pg_pathman

作者

digoal

日期

2016-10-26

标签

PostgreSQL , 分区表 , pg_pathman , custom scan api , sharding , FDW


背景

可以阅读以下几篇文章先回顾一下FDW,基于FDW的shared以及高效的分区插件pg_pathman。

《PostgreSQL 9.6 单元化,sharding (based on postgres_fdw) - 内核层支持前传》

《PostgreSQL 9.6 sharding + 单元化 (based on postgres_fdw) 最佳实践 - 通用水平分库场景设计与实践》

《PostgreSQL 9.5+ 高效分区表实现 - pg_pathman》

由于pg_pathman支持foreign table,所以拿它来做sharding也是理所当然的事情,同时它支持的HASH partition不需要在QUERY中带入constraint对应的clause,也能走分区查询。

例子

依旧使用《PostgreSQL 9.6 sharding + 单元化 (based on postgres_fdw) 最佳实践 - 通用水平分库场景设计与实践》文章中的例子。

例子使用1个路由节点,4个数据节点。

1. 配置目标库的pg_hba.conf

使用md5认证

2. 创建用户

create role role0 nosuperuser login encrypted password 'pwd' ;
create role role1 nosuperuser login encrypted password 'pwd' ;
create role role2 nosuperuser login encrypted password 'pwd' ;
create role role3 nosuperuser login encrypted password 'pwd' ;
create role digoal nosuperuser login encrypted password 'pwd' ;

3. 创建数据库

create database db0 with template template0 ;
create database db1 with template template0 ;
create database db2 with template template0 ;
create database db3 with template template0 ;
create database mas1 with template template0 ;

4. 配置数据库权限

grant all on database db0 to role0 ;
grant all on database db1 to role1 ;
grant all on database db2 to role2 ;
grant all on database db3 to role3 ;
grant all on database mas1 to digoal ;

5. 创建schema, 建议与USER同名

\c db0 role0
create schema role0;
\c db1 role1
create schema role1;
\c db2 role2
create schema role2;
\c db3 role3
create schema role3;
\c mas1 digoal
create schema digoal;

6. 配置路由节点,需要用到postgres_fdw与pg_pathman插件

\c mas1 postgres
create extension postgres_fdw;
create extension pg_pathman;
grant usage on foreign data wrapper postgres_fdw to digoal;

7. ddl建表语句 , ddl-1.sql

create table userinfo(uid int8 primary key, info text, crt_time timestamp);

create table user_log(uid int8 not null, logintime timestamp, logouttime timestamp, client_addr inet, client_port int);

create table user_membership(uid1 int8 not null, uid2 int8, unique (uid1,uid2) );

create table user_membership_rev(uid2 int8 not null, uid1 int8, unique (uid2,uid1) );

8. 初始化数据节点,创建数据表,注意表名最好全局唯一,方便使用import foreign schema的方式导入。

\c db0 role0

ddl-1.sql

然后执行 :   
do language plpgsql $$
declare
  i int;
  mod int := 1;
  nod int := 0;
  sql text;
begin
  for i in (mod*nod)..(mod*(nod+1)-1) loop
      sql := 'create table userinfo_'||i||'(like userinfo including all)';
      execute sql;
      sql := 'create table user_log_'||i||'(like user_log including all)';
      execute sql;
      sql := 'create table user_membership_'||i||'(like user_membership including all)';
      execute sql;
      sql := 'create table user_membership_rev_'||i||'(like user_membership_rev including all)';
      execute sql;
  end loop;
end;
$$;
\c db1 role1

ddl-1.sql

然后执行 :   
do language plpgsql $$
declare
  i int;
  mod int := 1;
  nod int := 1;
  sql text;
begin
  for i in (mod*nod)..(mod*(nod+1)-1) loop
      sql := 'create table userinfo_'||i||'(like userinfo including all)';
      execute sql;
      sql := 'create table user_log_'||i||'(like user_log including all)';
      execute sql;
      sql := 'create table user_membership_'||i||'(like user_membership including all)';
      execute sql;
      sql := 'create table user_membership_rev_'||i||'(like user_membership_rev including all)';
      execute sql;
  end loop;
end;
$$;
\c db2 role2

ddl-1.sql

然后执行 :   
do language plpgsql $$
declare
  i int;
  mod int := 1;
  nod int := 2;
  sql text;
begin
  for i in (mod*nod)..(mod*(nod+1)-1) loop
      sql := 'create table userinfo_'||i||'(like userinfo including all)';
      execute sql;
      sql := 'create table user_log_'||i||'(like user_log including all)';
      execute sql;
      sql := 'create table user_membership_'||i||'(like user_membership including all)';
      execute sql;
      sql := 'create table user_membership_rev_'||i||'(like user_membership_rev including all)';
      execute sql;
  end loop;
end;
$$;
\c db3 role3

ddl-1.sql

然后执行 :   
do language plpgsql $$
declare
  i int;
  mod int := 1;
  nod int := 3;
  sql text;
begin
  for i in (mod*nod)..(mod*(nod+1)-1) loop
      sql := 'create table userinfo_'||i||'(like userinfo including all)';
      execute sql;
      sql := 'create table user_log_'||i||'(like user_log including all)';
      execute sql;
      sql := 'create table user_membership_'||i||'(like user_membership including all)';
      execute sql;
      sql := 'create table user_membership_rev_'||i||'(like user_membership_rev including all)';
      execute sql;
  end loop;
end;
$$;

9. 初始化master

\c mas1 digoal


-- 初始化foreign server
-- or user host (本例使用unix socket)
create server db0 foreign data wrapper postgres_fdw OPTIONS (host '/u02/digoal/pg_root5281', port '5281', dbname 'db0', use_remote_estimate 'false', fetch_size '5000000');
create server db1 foreign data wrapper postgres_fdw OPTIONS (host '/u02/digoal/pg_root5281', port '5281', dbname 'db1', use_remote_estimate 'false', fetch_size '5000000');
create server db2 foreign data wrapper postgres_fdw OPTIONS (host '/u02/digoal/pg_root5281', port '5281', dbname 'db2', use_remote_estimate 'false', fetch_size '5000000');
create server db3 foreign data wrapper postgres_fdw OPTIONS (host '/u02/digoal/pg_root5281', port '5281', dbname 'db3', use_remote_estimate 'false', fetch_size '5000000');
-- 设置use_remote_estimate 'false' 需收集外部表的统计信息, 但是可以节约explain的操作,对于高并发的小事务,建议使用FALSE


-- 创建user mapping
create user mapping FOR digoal SERVER db0 options (user 'role0', password 'pwd');
create user mapping FOR digoal SERVER db1 options (user 'role1', password 'pwd');
create user mapping FOR digoal SERVER db2 options (user 'role2', password 'pwd');
create user mapping FOR digoal SERVER db3 options (user 'role3', password 'pwd');


-- 导入FOREIGN TABLE,排除主表
import foreign schema role0 EXCEPT (userinfo,user_log,user_membership,mv_user_membership,user_membership_rev,mv_user_membership_rev) from server db0 into digoal;
import foreign schema role1 EXCEPT (userinfo,user_log,user_membership,mv_user_membership,user_membership_rev,mv_user_membership_rev) from server db1 into digoal;
import foreign schema role2 EXCEPT (userinfo,user_log,user_membership,mv_user_membership,user_membership_rev,mv_user_membership_rev) from server db2 into digoal;
import foreign schema role3 EXCEPT (userinfo,user_log,user_membership,mv_user_membership,user_membership_rev,mv_user_membership_rev) from server db3 into digoal;


-- 创建主表
create table userinfo(uid int8 primary key, info text, crt_time timestamp);
create table user_log(uid int8 not null, logintime timestamp, logouttime timestamp, client_addr inet, client_port int);
create table user_membership(uid1 int8 not null, uid2 int8, unique (uid1,uid2) );
create table user_membership_rev(uid2 int8 not null, uid1 int8, unique (uid2,uid1) );


-- 生成哈希分区
select create_hash_partitions('userinfo'::regclass, 'uid', 0, true) ;
select create_hash_partitions('user_log'::regclass, 'uid', 0, true) ;
select create_hash_partitions('user_membership'::regclass, 'uid1', 0, true) ;
select create_hash_partitions('user_membership_rev'::regclass, 'uid2', 0, true) ;


-- 设置foreign table 约束,目前pg_pathman不支持attach hash 分区表,所以我把步骤分解了,先构建约束
-- 注意约束名有固定的格式
do language plpgsql $$
declare
  i int;
  x int := 0;
  y int := 3;
  sql text;
begin
  for i in x..y loop
      sql := 'alter foreign table userinfo_'||i||' add constraint pathman_userinfo_'||i||'_1_check check(get_hash_part_idx(hashint8(uid), 4) = '||i||' )';
      execute sql;
      sql := 'alter foreign table user_log_'||i||' add constraint pathman_user_log_'||i||'_1_check check(get_hash_part_idx(hashint8(uid), 4) = '||i||' )';
      execute sql;
      sql := 'alter foreign table user_membership_'||i||' add constraint pathman_user_membership_'||i||'_1_check check(get_hash_part_idx(hashint8(uid1), 4) = '||i||' )';
      execute sql;
      sql := 'alter foreign table user_membership_rev_'||i||' add constraint pathman_user_membership_rev_'||i||'_1_check check(get_hash_part_idx(hashint8(uid2), 4) = '||i||' )';
      execute sql;
  end loop;
end;
$$;


-- 设置继承关系
do language plpgsql $$
declare
  i int;
  x int := 0;
  y int := 3;
  sql text;
begin
  for i in x..y loop
      sql := 'alter foreign table userinfo_'||i||' inherit userinfo';
      execute sql;
      sql := 'alter foreign table user_log_'||i||' inherit user_log';
      execute sql;
      sql := 'alter foreign table user_membership_'||i||' inherit user_membership';
      execute sql;
      sql := 'alter foreign table user_membership_rev_'||i||' inherit user_membership_rev';
      execute sql;
  end loop;
end;
$$;


-- 前面都准备好之后,可以设置pg_pathman的内存,刷新。
do language plpgsql $$
declare
  i int;
  x int := 0;
  y int := 3;
  tbls text[] := array['userinfo', 'user_log', 'user_membership', 'user_membership_rev'];
  tbl text;
  child text;
  sql text;
begin
  foreach tbl in array tbls loop 
    for i in x..y loop
      child := tbl||'_'||i;
      -- 将子表添加到pg_pathman hash memory中
      perform public.invoke_on_partition_created_callback(tbl::regclass, child::REGCLASS, 0);
    end loop;
    -- 刷新
    perform public.on_create_partitions(tbl);
  end loop;
end;
$$;

10. 插入压测

插入数据测试的函数

create or replace function f(int) returns void as $$
declare
begin 
  insert into userinfo values ($1, md5(random()::Text), now());
 exception when others then return;
end;
$$ language plpgsql strict;

压测

$ vi test1.sql
\set id random(1,50000000)
select f(:id);

$ pgbench -M simple -n -r -P 1 -f ./test1.sql -c 64 -j 64 -T 1200 -U digoal mas1
...
progress: 102.0 s, 42769.1 tps, lat 1.495 ms stddev 0.512
progress: 103.0 s, 42633.0 tps, lat 1.500 ms stddev 0.519
progress: 104.0 s, 42864.0 tps, lat 1.492 ms stddev 0.525
progress: 105.0 s, 42724.0 tps, lat 1.497 ms stddev 0.537
progress: 106.0 s, 42801.3 tps, lat 1.494 ms stddev 0.507
progress: 107.0 s, 42547.7 tps, lat 1.503 ms stddev 0.524
...
   PerfTop:   54084 irqs/sec  kernel:40.6%  exact:  0.0% [1000Hz cycles],  (all, 64 CPUs)
---------------------------------------------------------------------------------------------
             samples  pcnt function                      DSO
             _______ _____ _____________________________ ____________________________________

            18991.00  2.5% AllocSetAlloc                 /home/digoal/pgsql9.6/bin/postgres
            17321.00  2.3% _spin_lock                    [kernel.kallsyms]                   
            15946.00  2.1% GetSnapshotData               /home/digoal/pgsql9.6/bin/postgres
            15049.00  2.0% hash_search_with_hash_value   /home/digoal/pgsql9.6/bin/postgres
            12864.00  1.7% base_yyparse                  /home/digoal/pgsql9.6/bin/postgres
            10796.00  1.4% SearchCatCache                /home/digoal/pgsql9.6/bin/postgres
             9826.00  1.3% memcpy                        /lib64/libc-2.12.so                 
             8840.00  1.2% _int_malloc                   /lib64/libc-2.12.so                 
             8261.00  1.1% __strlen_sse42                /lib64/libc-2.12.so                 
             8245.00  1.1% schedule                      [kernel.kallsyms]                   
             7441.00  1.0% __GI_vfprintf                 /lib64/libc-2.12.so                 
             7348.00  1.0% AtEOXact_GUC                  /home/digoal/pgsql9.6/bin/postgres
             6679.00  0.9% __strcmp_sse42                /lib64/libc-2.12.so                 
             6531.00  0.9% mutex_spin_on_owner           [kernel.kallsyms]                   
             5899.00  0.8% reschedule_interrupt          [kernel.kallsyms]                   

直连测试

\c db0 role0

create or replace function f(int) returns void as $$
declare
begin 
  insert into userinfo values ($1, md5(random()::Text), now());
 exception when others then return;
end;
$$ language plpgsql strict;
$ pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 64 -j 64 -T 1200 -U role0 db0

progress: 6.0 s, 149615.0 tps, lat 0.427 ms stddev 0.389
progress: 7.0 s, 149022.1 tps, lat 0.428 ms stddev 0.398
progress: 8.0 s, 148586.6 tps, lat 0.430 ms stddev 0.430

11. query测试

压测

$ vi test1.sql
\set id random(1,50000000)
select * from userinfo where uid=:id;

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 1200 -U digoal mas1
...
progress: 158.0 s, 46046.8 tps, lat 1.388 ms stddev 0.960
progress: 159.0 s, 46201.2 tps, lat 1.384 ms stddev 0.945
progress: 160.0 s, 46176.9 tps, lat 1.385 ms stddev 0.957
progress: 161.0 s, 46143.0 tps, lat 1.386 ms stddev 0.943
progress: 162.0 s, 46316.0 tps, lat 1.381 ms stddev 0.955
progress: 163.0 s, 46067.1 tps, lat 1.388 ms stddev 0.954
...
   PerfTop:   46994 irqs/sec  kernel:40.7%  exact:  0.0% [1000Hz cycles],  (all, 64 CPUs)
-------------------------------------------------------------------------------------------------
             samples  pcnt function                      DSO
             _______ _____ _____________________________ ________________________________________

            17838.00  2.4% GetSnapshotData               /home/digoal/pgsql9.6/bin/postgres    
            17711.00  2.4% _spin_lock                    [kernel.kallsyms]                       
            17372.00  2.4% AllocSetAlloc                 /home/digoal/pgsql9.6/bin/postgres    
            15205.00  2.1% base_yyparse                  /home/digoal/pgsql9.6/bin/postgres    
            12210.00  1.7% SearchCatCache                /home/digoal/pgsql9.6/bin/postgres    
            11821.00  1.6% hash_search_with_hash_value   /home/digoal/pgsql9.6/bin/postgres    
            10729.00  1.5% __mutex_lock_slowpath         [kernel.kallsyms]                       
            10694.00  1.5% mutex_spin_on_owner           [kernel.kallsyms]                       
            10168.00  1.4% _int_malloc                   /lib64/libc-2.12.so                     
             8217.00  1.1% schedule                      [kernel.kallsyms]                       
             6977.00  1.0% __strlen_sse42                /lib64/libc-2.12.so                     
             6918.00  0.9% __strcmp_sse42                /lib64/libc-2.12.so                     
             6711.00  0.9% _spin_lock_irqsave            [kernel.kallsyms]                       
             6629.00  0.9% reschedule_interrupt          [kernel.kallsyms]                       
             6599.00  0.9% find_busiest_group            [kernel.kallsyms]                       
             5984.00  0.8% core_yylex                    /home/digoal/pgsql9.6/bin/postgres                                   

直连测试

$ vi test.sql
\set id random(1,50000000)
select * from userinfo_0 where uid=:id;

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 1200 -U role0 db0
progress: 3.0 s, 1065981.5 tps, lat 0.059 ms stddev 0.008
progress: 4.0 s, 1084914.1 tps, lat 0.058 ms stddev 0.006
progress: 5.0 s, 1093083.4 tps, lat 0.057 ms stddev 0.006

小结

1. 基于fdw的shard,从功能上来讲还有哪些值得改进的点:

例如, 支持聚合下推(10.0已经支持)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f5d6bce63ceb3c59a964814bb0df5a0648e750e5

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7012b132d07c2b4ea15b0b3cb1ea9f3278801d98

append的并行化,目前如果要扫描多个分区,是串行的,并不是并行扫描,非常影响效率。

2. 从性能方面来讲,还有哪些值得改进的点:

从profile的结果来看,目前的锁较重,有非常大的性能提升空间,需要花点时间看看代码。

当然如果当前的性能能满足你,或者你通过构建多个对等的master加上负载均衡,也能解决这个问题。

现在与直连单节点的性能差距还是较大的。

Count

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
2月前
|
存储 关系型数据库 数据库
How to Optimize PostgreSQL Logical Replication
How to Optimize PostgreSQL Logical Replication
25 0
How to Optimize PostgreSQL Logical Replication
|
4月前
|
SQL 关系型数据库 API
【PostgreSQL】PostgreSQL扩展:pg_stat_statements 优化SQL
【PostgreSQL】PostgreSQL扩展:pg_stat_statements 优化SQL
LXJ
|
SQL 关系型数据库 开发工具
pg_pathman安装
pg_pathman安装
LXJ
418 0
|
关系型数据库 数据库 PostgreSQL
PostgreSQL pg_basebackup
本文探讨 pg_basebackup工具,本地备份、远程备份、单一表空间本地数据库的备份、表空间重定向
1675 0
PostgreSQL pg_basebackup
|
SQL 分布式计算 并行计算
PostgreSQL 并行计算解说 之13 - parallel OLAP : 中间结果 parallel with unlogged table
标签 PostgreSQL , cpu 并行 , smp 并行 , 并行计算 , gpu 并行 , 并行过程支持 背景 PostgreSQL 11 优化器已经支持了非常多场合的并行。简单估计,已支持27余种场景的并行计算。 parallel seq scan parallel
540 0
|
SQL 关系型数据库 OLAP
AnalyticDB for PostgreSQL 6.0 新特性解析:Recursive CTE (Common Table Expressions)
Recursive CTE (Common Table Expressions) 能够实现SQL的递归查询功能,一般用于处理逻辑上为层次化或树状结构的数据(如查询组织结构、物料清单等),方便对该类数据进行多级递归查询。
|
SQL 关系型数据库 测试技术
PostgreSQL sharding : citus 系列2 - TPC-H
标签 PostgreSQL , citus , tpc-h 背景 紧接着上一篇文档,本文测试citus的tpc-h能力(包括兼容性). 《PostgreSQL sharding : citus 系列1 - 多机部署(含OLTP(TPC-B)测试)》 《(TPC-H测试 SF=10,SF=200) PostgreSQL 11 vs 10 vs Deepgreen》 https://github.com/digoal/gp_tpch 实际测试过程中,发现CITUS对TPC-H的SQL支持并不完整。
1981 0
|
关系型数据库 数据库 SQL
PostgreSQL Logical Replication
限制及特性 1、只支持普通表生效,不支持序列、视图、物化视图、外部表、分区表和大对象 2、只支持普通表的DML(INSERT、UPDATE、DELETE)操作,不支持truncate、DDL操作 3、需要同步的表必须设置REPLICA IDENTITY 不能为noting(默认值是default).
8626 0
|
关系型数据库 数据库 PostgreSQL
PostgreSQL分区表(Table Partitioning)应用
一、简介   在数据库日渐庞大的今天,为了方便对数据库数据的管理,比如按时间,按地区去统计一些数据时,基数过于庞大,多有不便。
1361 0
相关产品
云数据库 Redis 版
云数据库 MongoDB 版
云数据库 RDS
推荐文章
更多