开源数据库PostgreSQL攻克并行计算难题

本文涉及的产品
云原生数据库 PolarDB 分布式版,标准版 2核8GB
RDS SQL Server Serverless,2-4RCU 50GB 3个月
推荐场景:
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
简介: PostgreSQL 9.6的并行复制一发,相信已经有很多小伙伴已经开始测试了,我昨晚测试了一个场景是标签系统类应用的比特位运算,昨天测试发现性能相比非并行已经提升了7倍。昨天没有仔细研究代码,发现怎么测都只能用8个并行,今天看了一下代码,终于找到端倪了,其实并行度是由几个方面决定d , 决定并行.

经过多年的酝酿(从支持work process到支持动态fork共享内存,再到内核层面支持并行计算),PostgreSQL 的并行计算功能终于来了,为PG的scale up能力再次拔高一个台阶,标志着开源数据库已经攻克了并行计算的难题。


相信有很多小伙伴已经开始测试了,我也测试了一个场景是标签系统类应用的比特位运算,昨天测试发现性能相比非并行已经提升了7倍。

调整并行度,在32个核的虚拟机上测试,性能提升了约10多倍。
但是实际上并没有到32倍,不考虑内存和IO的瓶颈,是有优化空间。
注意不同的并行度,效果不一样,目前来看并不是最大并行度就能发挥最好的性能,还需要考虑锁竞争的问题。
把测试表的数据量加载到16亿,共90GB。

postgres=# \dt+
                    List of relations
 Schema |  Name  | Type  |  Owner   | Size  | Description 
--------+--------+-------+----------+-------+-------------
 public | t_bit2 | table | postgres | 90 GB | 
(1 row)
AI 代码解读

不使用并行的性能如下,耗时 141377.100 毫秒。

postgres=# alter table t_bit2 set (parallel_degree=0);
ALTER TABLE
Time: 0.335 ms
postgres=# select count(*) from t_bit2 ;
   count    
------------
 1600000000
(1 row)
Time: 141377.100 ms
AI 代码解读

使用17个并行,获得了最好的性能, 耗时9423.257 毫秒。

postgres=# alter table t_bit2 set (parallel_degree=17);
ALTER TABLE
Time: 0.287 ms
postgres=# select count(*) from t_bit2 ;
   count    
------------
 1600000000
(1 row)

Time: 9423.257 ms
AI 代码解读

并行度为17时,每秒处理的数据量已经达到9.55GB。
与非并行相比,性能达到了15倍,基本上是线性的。
但是可能由于NUMA的原因(并行度增加时, 读数据操作可能会引入较多的__mutex_lock_slowpath, _spin_lock),并行度再加上来性能并不能再线性提升,而是会往下走。
_
另一组测试数据,加入了BIT计算。
32个并行度时,可以获得最好的性能提升,同样也和NUMA有关,为什么并行度能更高呢,因为计算量更大了,扫描冲突可以分担掉。
同样性能比达到了30.9倍,也基本上是线性的。
_

postgres=# alter table t_bit2 set (parallel_degree=32);
ALTER TABLE
Time: 0.341 ms
postgres=# select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
   count    
------------
 1600000000
(1 row)

Time: 15836.064 ms
postgres=# alter table t_bit2 set (parallel_degree=0);
ALTER TABLE
Time: 0.368 ms
postgres=# select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
   count    
------------
 1600000000
(1 row)

Time: 488459.158 ms
postgres=# select 488459.158 /15826.358;
      ?column?       
---------------------
 30.8636489835501004
(1 row)

Time: 2.919 ms
AI 代码解读

后面会再提供tpc-h的测试数据。


那么如何设置并行度呢?决定并行度的几个参数如下
.1. 最大允许的并行度
max_parallel_degree


.2. 表设置的并行度(create table或alter table设置)
parallel_degree
如果设置了表的并行度,则最终并行度取min(max_parallel_degree , parallel_degree )

                /*
                 * Use the table parallel_degree, but don't go further than
                 * max_parallel_degree.
                 */
                parallel_degree = Min(rel->rel_parallel_degree, max_parallel_degree);
AI 代码解读


.3. 如果表没有设置并行度parallel_degree ,则根据表的大小 和 parallel_threshold 这个硬编码值决定,计算得出(见函数create_plain_partial_paths)
然后依旧受到max_parallel_degree 参数的限制,不能大于它。
代码如下

src/backend/optimizer/util/plancat.c
void
get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
                                  RelOptInfo *rel)
{
...
        /* Retrive the parallel_degree reloption, if set. */
        rel->rel_parallel_degree = RelationGetParallelDegree(relation, -1);
...


src/include/utils/rel.h
/*
 * RelationGetParallelDegree
 *              Returns the relation's parallel_degree.  Note multiple eval of argument!
 */
#define RelationGetParallelDegree(relation, defaultpd) \
        ((relation)->rd_options ? \
         ((StdRdOptions *) (relation)->rd_options)->parallel_degree : (defaultpd))


src/backend/optimizer/path/allpaths.c
/*
 * create_plain_partial_paths
 *        Build partial access paths for parallel scan of a plain relation
 */
static void
create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel)
{
        int                     parallel_degree = 1;

        /*
         * If the user has set the parallel_degree reloption, we decide what to do
         * based on the value of that option.  Otherwise, we estimate a value.
         */
        if (rel->rel_parallel_degree != -1)
        {
                /*
                 * If parallel_degree = 0 is set for this relation, bail out.  The
                 * user does not want a parallel path for this relation.
                 */
                if (rel->rel_parallel_degree == 0)
                        return;

                /*
                 * Use the table parallel_degree, but don't go further than
                 * max_parallel_degree.
                 */
                parallel_degree = Min(rel->rel_parallel_degree, max_parallel_degree);
        }
        else
        {
                int                     parallel_threshold = 1000;

                /*
                 * If this relation is too small to be worth a parallel scan, just
                 * return without doing anything ... unless it's an inheritance child.
                 * In that case, we want to generate a parallel path here anyway.  It
                 * might not be worthwhile just for this relation, but when combined
                 * with all of its inheritance siblings it may well pay off.
                 */
                if (rel->pages < parallel_threshold &&
                        rel->reloptkind == RELOPT_BASEREL)
                        return;
// 表级并行度没有设置时,通过表的大小和parallel_threshold 计算并行度  
                /*
                 * Limit the degree of parallelism logarithmically based on the size
                 * of the relation.  This probably needs to be a good deal more
                 * sophisticated, but we need something here for now.
                 */
                while (rel->pages > parallel_threshold * 3 &&
                           parallel_degree < max_parallel_degree)
                {
                        parallel_degree++;
                        parallel_threshold *= 3;
                        if (parallel_threshold >= PG_INT32_MAX / 3)
                                break;
                }
        }

        /* Add an unordered partial path based on a parallel sequential scan. */
        add_partial_path(rel, create_seqscan_path(root, rel, NULL, parallel_degree));
}
AI 代码解读


其他测试数据:

增加到32个并行,和硬件有关,并不一定是并行度最高时,性能就最好,前面已经分析了,一定要找到每个查询的拐点。  
postgres=# alter table t_bit2 set (parallel_degree =32);

postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
                                                                                                                                                                                                                                        QUERY
 PLAN                                                                                                                                                                                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1551053.25..1551053.26 rows=1 width=8) (actual time=31092.551..31092.552 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=1473213
   ->  Gather  (cost=1551049.96..1551053.17 rows=32 width=8) (actual time=31060.939..31092.469 rows=33 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 32
         Workers Launched: 32
         Buffers: shared hit=1473213
         ->  Partial Aggregate  (cost=1550049.96..1550049.97 rows=1 width=8) (actual time=31047.074..31047.075 rows=1 loops=33)
               Output: PARTIAL count(*)
               Buffers: shared hit=1470589
               Worker 0: actual time=31037.287..31037.288 rows=1 loops=1
                 Buffers: shared hit=43483
               Worker 1: actual time=31035.803..31035.804 rows=1 loops=1
                 Buffers: shared hit=45112
......
               Worker 31: actual time=31055.871..31055.876 rows=1 loops=1
                 Buffers: shared hit=46439
               ->  Parallel Seq Scan on public.t_bit2  (cost=0.00..1549983.80 rows=26465 width=0) (actual time=0.040..17244.827 rows=6060606 loops=33)
                     Output: id
                     Filter: (bitand(t_bit2.id, B'1010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101
0101010101010'::"bit") = B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010'::"bit")
                     Buffers: shared hit=1470589
                     Worker 0: actual time=0.035..17314.296 rows=5913688 loops=1
                       Buffers: shared hit=43483
                     Worker 1: actual time=0.030..16965.158 rows=6135232 loops=1
                       Buffers: shared hit=45112
......
                     Worker 31: actual time=0.031..17580.908 rows=6315704 loops=1
                       Buffers: shared hit=46439
 Planning time: 0.354 ms
 Execution time: 31157.006 ms
(145 rows)

比特位运算  
postgres=# select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
   count   
-----------
 200000000
(1 row)
Time: 4320.931 ms

COUNT  
postgres=# select count(*) from t_bit2;
   count   
-----------
 200000000
(1 row)
Time: 1896.647 ms

关闭并行的查询效率    
postgres=# set force_parallel_mode =off;
SET
postgres=# alter table t_bit2 set (parallel_degree =0);
ALTER TABLE
postgres=# \timing
Timing is on.
postgres=# select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
   count   
-----------
 200000000
(1 row)
Time: 53098.480 ms
postgres=# select count(*) from t_bit2;
   count   
-----------
 200000000
(1 row)
Time: 18504.679 ms

表大小  
postgres=# \dt+ t_bit2
                    List of relations
 Schema |  Name  | Type  |  Owner   | Size  | Description 
--------+--------+-------+----------+-------+-------------
 public | t_bit2 | table | postgres | 11 GB | 
(1 row)
AI 代码解读


参考信息
http://www.postgresql.org/docs/9.6/static/sql-createtable.html

parallel_degree (integer)
The parallel degree for a table is the number of workers that should be used to assist a parallel scan of that table. If not set, the system will determine a value based on the relation size. The actual number of workers chosen by the planner may be less, for example due to the setting of max_parallel_degree.
AI 代码解读

http://www.postgresql.org/docs/9.6/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER

force_parallel_mode (enum)
Allows the use of parallel queries for testing purposes even in cases where no performance benefit is expected. The allowed values of force_parallel_mode are off (use parallel mode only when it is expected to improve performance), on (force parallel query for all queries for which it is thought to be safe), and regress (like on, but with additional behavior changes as explained below).

More specifically, setting this value to on will add a Gather node to the top of any query plan for which this appears to be safe, so that the query runs inside of a parallel worker. Even when a parallel worker is not available or cannot be used, operations such as starting a subtransaction that would be prohibited in a parallel query context will be prohibited unless the planner believes that this will cause the query to fail. If failures or unexpected results occur when this option is set, some functions used by the query may need to be marked PARALLEL UNSAFE (or, possibly, PARALLEL RESTRICTED).

Setting this value to regress has all of the same effects as setting it to on plus some additional effects that are intended to facilitate automated regression testing. Normally, messages from a parallel worker include a context line indicating that, but a setting of regress suppresses this line so that the output is the same as in non-parallel execution. Also, the Gather nodes added to plans by this setting are hidden in EXPLAIN output so that the output matches what would be obtained if this setting were turned off.
AI 代码解读

http://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR

max_parallel_degree (integer)
Sets the maximum number of workers that can be started for an individual parallel operation. Parallel workers are taken from the pool of processes established by max_worker_processes. Note that the requested number of workers may not actually be available at runtime. If this occurs, the plan will run with fewer workers than expected, which may be inefficient. The default value is 2. Setting this value to 0 disables parallel query execution.
AI 代码解读

http://www.postgresql.org/docs/9.6/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

parallel_setup_cost (floating point)
Sets the planner's estimate of the cost of launching parallel worker processes. The default is 1000.
parallel_tuple_cost (floating point)
Sets the planner's estimate of the cost of transferring one tuple from a parallel worker process to another process. The default is 0.1.
AI 代码解读
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
打赏
0
0
0
0
20702
分享
相关文章
MyEMS开源系统安装之数据库
本文详细讲解MyEMS的安装步骤,重点介绍数据库架构与脚本部署。MyEMS支持MySQL 8.0、MariaDB 10.5及SingleStore 7.0等数据库服务器。通过命令行或客户端工具执行SQL脚本完成安装,包括多个数据库(如myems_billing_db、myems_energy_db等)。此外,提供解决常见问题的方法,如“用户拒绝访问”、“COLLATE设置”和“MAX_ALLOWED_PACKET错误”。注意,不建议在生产环境中将数据库安装于Docker容器内。
61 1
Go语言数据库编程:使用 `database/sql` 与 MySQL/PostgreSQL
Go语言通过`database/sql`标准库提供统一数据库操作接口,支持MySQL、PostgreSQL等多种数据库。本文介绍了驱动安装、连接数据库、基本增删改查操作、预处理语句、事务处理及错误管理等内容,涵盖实际开发中常用的技巧与注意事项,适合快速掌握Go语言数据库编程基础。
113 62
【赵渝强老师】基于PostgreSQL的分布式数据库:Citus
Citus 是基于 PostgreSQL 的开源分布式数据库,采用 shared nothing 架构,具备良好的扩展性。它以插件形式集成,部署简单,适用于处理大规模数据和高并发场景。本文介绍了 Citus 的基础概念、安装配置步骤及其在单机环境下的集群搭建方法。
16.1k star! 只需要DDL就能一键生成数据库关系图!开源神器ChartDB让你的数据结构"看得见"
ChartDB是一款开源的数据库可视化神器,通过一句智能查询就能自动生成专业的数据库关系图。无需安装客户端、不用暴露数据库密码,打开网页就能完成从数据建模到迁移的全流程操作,堪称开发者的"数据库透视镜"。
569 67
PolarDB开源:云原生数据库的架构革命
本文围绕开源核心价值、社区运营实践和技术演进路线展开。首先解读存算分离架构的三大突破,包括基于RDMA的分布式存储、计算节点扩展及存储池扩容机制,并强调与MySQL的高兼容性。其次分享阿里巴巴开源治理模式,涵盖技术决策、版本发布和贡献者成长体系,同时展示企业应用案例。最后展望技术路线图,如3.0版本的多写多读架构、智能调优引擎等特性,以及开发者生态建设举措,推荐使用PolarDB-Operator实现高效部署。
187 3
PolarDB开源数据库入门教程
PolarDB是阿里云推出的云原生数据库,基于PostgreSQL、MySQL和Oracle引擎构建,具备高性能、高扩展性和高可用性。其开源版采用计算与存储分离架构,支持快速弹性扩展和100%兼容PostgreSQL/MySQL。本文介绍了PolarDB的安装方法(Docker部署或源码编译)、基本使用(连接数据库、创建表等)及高级特性(计算节点扩展、存储自动扩容、并行查询等)。同时提供了性能优化建议和监控维护方法,帮助用户在生产环境中高效使用PolarDB。
824 21
PolarDB开源:云原生数据库的新篇章
阿里云自研的云原生数据库PolarDB于2023年5月正式开源,采用“存储计算分离”架构,具备高性能、高可用及全面兼容性。其开源版本提供企业级数据库解决方案,支持MySQL、PostgreSQL和Oracle语法,适用于高并发OLTP、核心业务系统等场景。PolarDB通过开放治理与开发者工具构建完整生态,并展望更丰富的插件功能与AI集成,为中国云原生数据库技术发展贡献重要力量。
330 17
【赵渝强老师】创建PostgreSQL的数据库
本文介绍了在PostgreSQL中通过SQL命令“create database”创建数据库的方法。首先查询系统目录pg_database以查看现有数据库集合,然后使用“create database”命令创建新数据库,并了解其在$PDATA/base目录下对应的文件夹生成。最后重新查询数据库集合确认创建结果,附带视频讲解便于理解操作步骤及注意事项。
101 1
【赵渝强老师】PostgreSQL的模板数据库
在PostgreSQL中,创建新数据库时,默认通过拷贝`template1`实现。`template1`包含标准系统对象,可自定义以影响新数据库内容;而`template0`是纯净模板,仅含预定义对象且不应修改。视频讲解和代码示例展示了如何查看现有数据库信息及标识字段的作用。 ![图示](https://ucc.alicdn.com/pic/developer-ecology/yub6x2mlkqwck_398ed06397a44c2d9bfbb5ae5c90bbc0.png) [视频链接](https://www.bilibili.com/video/BV1szyfY4EQn)
【赵渝强老师】PostgreSQL的模板数据库

相关产品

  • 云原生数据库 PolarDB
  • 云数据库 RDS PostgreSQL 版
  • 推荐镜像

    更多
    AI助理
    登录插画

    登录以查看您的控制台资源

    管理云资源
    状态一览
    快捷访问

    你好,我是AI助理

    可以解答问题、推荐解决方案等