PostgreSQL 10.0 preview 性能增强 - libpq支持pipeline batch模式减少网络交互提升性能

本文涉及的产品
云原生数据库 PolarDB 分布式版,标准版 2核8GB
RDS PostgreSQL Serverless,0.5-4RCU 50GB 3个月
推荐场景:
对影评进行热评分析
云数据库 RDS MySQL,集群系列 2核4GB
推荐场景:
搭建个人博客
简介:

标签

PostgreSQL , 10.0 , libpq , pipeline , batch


背景

PostgreSQL 10.0 libpq支持pipeline batch两种模式,batch模式意味着客户端可以将多个QUERY塞入pipeline,作为一个batch提交给server段,从而减少客户端和服务端的网络交互次数。

在网络环境不太好的环境中,特别是云环境,大幅提升性能。

+   <application>libpq</application> supports queueing up mulitiple queries into  
+   a pipeline to be executed as a batch on the server. Batching queries allows  
+   applications to avoid a client/server round-trip after each query to get  
+   the results before issuing the next query.  

详见

Hi all  

Following on from the foreign table batch inserts thread[1], here's a patch  
to add support for pipelining queries into asynchronous batches in libpq.  

Attached, and also available at  
https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch (subject  
to rebasing and force pushes).  

It's cleaned up over the draft I posted on that thread and has error  
recovery implemented. I've written and included the SGML docs for it. The  
test program is now pretty comprehensive, more so than for anything else in  
libpq anyway. I'll submit it to the next CF as a 9.7/10.0 candidate.  

I'm measuring 300x (not %) performance improvements doing batches on  
servers over the Internet, so this seems pretty worthwhile. It turned out  
to be way less invasive than I expected too.  

(I intentionally didn't add any way for clients to annotate each work-item  
in a batch with their own private data. I think that'd be really useful and  
would make implementing clients easier, but should be a separate patch).  

This should be very useful for optimising FDWs, Postgres-XC, etc.  


[1]  
http://www.postgresql.org/message-id/CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=wU3g@mail.gmail.com  

--   
 Craig Ringer                   http://www.2ndQuadrant.com/  
 PostgreSQL Development, 24x7 Support, Training & Services  

一些例子

+  <para>  
+   An example of batch use may be found in the source distribution in  
+   <filename>src/test/examples/libpqbatch.c</filename>.  
+  </para>  
+  
+  <sect2>  
+   <title>When to use batching</title>  
+  
+   <para>  
+    Much like asynchronous query mode, there is no performance disadvantage to  
+    using batching and pipelining. It somewhat increased client application  
+    complexity and extra caution is required to prevent client/server network  
+    deadlocks, but can offer considerable performance improvements.  
+   </para>  
+  
+   <para>  
+    Batching is most useful when the server is distant, i.e. network latency  
+    ("ping time") is high, and when many small operations are being performed in  
+    rapid sequence. There is usually less benefit in using batches when each  
+    query takes many multiples of the client/server round-trip time to execute.  
+    A 100-statement operation run on a server 300ms round-trip-time away would take  
+    30 seconds in network latency alone without batching; with batching it may spend  
+    as little as 0.3s waiting for results from the server.  
+   </para>  
+  
+   <para>  
+    Use batches when your application does lots of small  
+    <literal>INSERT</literal>, <literal>UPDATE</literal> and  
+    <literal>DELETE</literal> operations that can't easily be transformed into  
+    operations on sets or into a  
+    <link linkend="libpq-copy"><literal>COPY</literal></link> operation.  
+   </para>  
+  
+   <para>  
+    Batching less useful when information from one operation is required by the  
+    client before it knows enough to send the next operation. The client must  
+    introduce a synchronisation point and wait for a full client/server  
+    round-trip to get the results it needs. However, it's often possible to  
+    adjust the client design to exchange the required information server-side.  
+    Read-modify-write cycles are especially good candidates; for example:  
+    <programlisting>  
+     BEGIN;  
+     SELECT x FROM mytable WHERE id = 42 FOR UPDATE;  
+     -- result: x=2  
+     -- client adds 1 to x:  
+     UPDATE mytable SET x = 3 WHERE id = 42;  
+     COMMIT;  
+    </programlisting>  
+    could be much more efficiently done with:  
+    <programlisting>  
+     UPDATE mytable SET x = x + 1;  
+    </programlisting>  
+   </para>  
+  
+   <note>  
+    <para>  
+     The batch API was introduced in PostgreSQL 9.6, but clients using it can  
+     use batches on server versions 8.4 and newer. Batching works on any server  
+     that supports the v3 extended query protocol.  
+    </para>  
+   </note>  
+  
+  </sect2>  
+  
+  <sect2 id="libpq-batch-using">  
+   <title>Using batch mode</title>  
+  
+   <para>  
+    To issue batches the application must switch  
+    <application>libpq</application> into batch mode. Enter batch mode with <link  
+    linkend="libpq-pqbeginbatchmode"><function>PQbeginBatchMode(conn)</function></link> or test  
+    whether batch mode is active with <link  
+    linkend="libpq-pqisinbatchmode"><function>PQisInBatchMode(conn)</function></link>. In batch mode only <link  
+    linkend="libpq-async">asynchronous operations</link> are permitted, and  
+    <literal>COPY</literal> is not allowed. (The restriction on <literal>COPY</literal> is an implementation  
+    limit; the PostgreSQL protocol and server can support batched <literal>COPY</literal>).  
+   </para>  
+  
+   <para>  
+    The client uses libpq's asynchronous query functions to dispatch work,  
+    marking the end of each batch with <function>PQsendEndBatch</function>.  
+    Concurrently, it uses <function>PQgetResult</function> and  
+    <function>PQgetNextQuery</function> to get results. It may eventually exit  
+    batch mode with <function>PQendBatchMode</function> once all results are  
+    processed.  
+   </para>  
+  
+   <note>  
+    <para>  
+     It is best to use batch mode with <application>libpq</application> in  
+     <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in  
+     blocking mode it is possible for a client/server deadlock to occur. The  
+     client will block trying to send queries to the server, but the server will  
+     block trying to send results from queries it's already processed to the  
+     client. This only occurs when the client sends enough queries to fill its  
+     output buffer and the server's receive buffer before switching to  
+     processing input from the server, but it's hard to predict exactly when  
+     that'll happen so it's best to always use non-blocking mode.  
+    </para>  
+   </note>  
+  
+   <sect3 id="libpq-batch-sending">  
+    <title>Issuing queries</title>  
+  
+    <para>  
+     After entering batch mode the application dispatches requests  
+     using normal asynchronous <application>libpq</application> functions like  
+     <function>PQsendQueryParams</function>, <function>PQsendPrepare</function>,  
+     etc. The asynchronous requests are followed by a <link  
+     linkend="libpq-pqsendendbatch"><function>PQsendEndBatch(conn)</function></link> call to mark  
+     the end of the batch. The client <emphasis>does not</emphasis> need to call  
+     <function>PQgetResult</function> immediately after dispatching each  
+     operation. <link linkend="libpq-batch-results">Result processing</link>  
+     is handled separately.  
+    </para>  
+      
+    <para>  
+     Batched operations will be executed by the server in the order the client  
+     sends them. The server will send the results in the order the statements  
+     executed. The server usually begins executing the batch before all commands  
+     in the batch are queued and the end of batch command is sent. If any  
+     statement encounters an error the server aborts the current transaction and  
+     skips processing the rest of the batch. Query processing resumes after the  
+     end of the failed batch.  
+    </para>  
+  
+    <para>  
+     It's fine for one operation to depend on the results of a  
+     prior one. One query may define a table that the next query in the same  
+     batch uses; similarly, an application may create a named prepared statement  
+     then execute it with later statements in the same batch.  
+    </para>  
+  
+   </sect3>  
+  
+   <sect3 id="libpq-batch-results">  
+    <title>Processing results</title>  
+  
+    <para>  
+     The client <link linkend="libpq-batch-interleave">interleaves result  
+     processing with sending batch queries</link>, or for small batches may  
+     process all results after sending the whole batch.  
+    </para>  
+  
+    <para>  
+     To get the result of the first batch entry the client must call <link  
+     linkend="libpq-pqgetnextquery"><function>PQgetNextQuery</function></link>. It must then call  
+     <function>PQgetResult</function> and handle the results until  
+     <function>PQgetResult</function> returns null (or would return null if  
+     called). The result from the next batch entry may then be retrieved using  
+     <function>PQgetNextQuery</function> and the cycle repeated.  The  
+     application handles individual statement results as normal.  
+    </para>  
+  
+    <para>  
+     <function>PQgetResult</function> behaves the same as for normal asynchronous  
+     processing except that it may contain the new <type>PGresult</type> types  
+     <literal>PGRES_BATCH_END</literal> and <literal>PGRES_BATCH_ABORTED</literal>.  
+     <literal>PGRES_BATCH_END</literal> is reported exactly once for each  
+     <function>PQsendEndBatch</function> call at the corresponding point in  
+     the result stream and at no other time. <literal>PGRES_BATCH_ABORTED</literal>  
+     is emitted during error handling; see <link linkend="libpq-batch-errors">  
+     error handling</link>.  
+    </para>  
+  
+    <para>  
+     <function>PQisBusy</function>, <function>PQconsumeInput</function>, etc  
+     operate as normal when processing batch results.  
+    </para>  
+  
+    <para>  
+     <application>libpq</application> does not provide any information to the  
+     application about the query currently being processed. The application  
+     must keep track of the order in which it sent queries and the expected  
+     results. Applications will typically use a state machine or a FIFO queue  
+     for this.  
+    </para>  

这个patch的讨论,详见邮件组,本文末尾URL。

PostgreSQL社区的作风非常严谨,一个patch可能在邮件组中讨论几个月甚至几年,根据大家的意见反复的修正,patch合并到master已经非常成熟,所以PostgreSQL的稳定性也是远近闻名的。

参考

https://commitfest.postgresql.org/13/1024/

https://www.postgresql.org/message-id/attachment/44303/0001-Pipelining-batch-support-for-libpq.patch

https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch

https://www.postgresql.org/message-id/flat/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com#CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
7月前
|
存储 关系型数据库 数据库
postgresql|数据库|提升查询性能的物化视图解析
postgresql|数据库|提升查询性能的物化视图解析
745 0
|
27天前
|
Oracle 关系型数据库 数据库
【赵渝强老师】PostgreSQL中的模式
在PostgreSQL中,所有数据库对象均隶属于模式,包括表、索引、视图等,每个对象有唯一的oid标识。创建数据库时,默认生成名为“public”的Schema。用户可自定义模式,如通过SQL语句创建名为demo的模式及其下的表。与Oracle不同,PostgreSQL中用户和模式不是一一对应关系。
58 12
【赵渝强老师】PostgreSQL中的模式
|
4月前
|
缓存 关系型数据库 数据库
PostgreSQL性能
【8月更文挑战第26天】PostgreSQL性能
76 1
|
3月前
|
缓存 关系型数据库 数据库
如何优化 PostgreSQL 数据库性能?
如何优化 PostgreSQL 数据库性能?
160 2
|
2月前
|
存储 关系型数据库 MySQL
四种数据库对比MySQL、PostgreSQL、ClickHouse、MongoDB——特点、性能、扩展性、安全性、适用场景
四种数据库对比 MySQL、PostgreSQL、ClickHouse、MongoDB——特点、性能、扩展性、安全性、适用场景
|
3月前
|
缓存 关系型数据库 数据库
PostgreSQL的性能
PostgreSQL的性能
196 2
|
4月前
|
缓存 关系型数据库 数据库
PostgreSQL 查询性能
【8月更文挑战第5天】PostgreSQL 查询性能
88 8
|
4月前
|
关系型数据库 Java 数据库
PostgreSQL性能
【8月更文挑战第5天】PostgreSQL性能
136 7
|
4月前
|
监控 关系型数据库 数据库
如何优化PostgreSQL的性能?
【8月更文挑战第4天】如何优化PostgreSQL的性能?
282 7
|
5月前
|
分布式计算 DataWorks 关系型数据库
DataWorks操作报错合集之使用连接串模式新增PostgreSQL数据源时遇到了报错"not support data sync channel, error code: 0001",该怎么办
DataWorks是阿里云提供的一站式大数据开发与治理平台,支持数据集成、数据开发、数据服务、数据质量管理、数据安全管理等全流程数据处理。在使用DataWorks过程中,可能会遇到各种操作报错。以下是一些常见的报错情况及其可能的原因和解决方法。

相关产品

  • 云原生数据库 PolarDB
  • 云数据库 RDS PostgreSQL 版