PostgreSQL 大表扫描策略 - BAS_BULKREAD , synchronize_seqscans

本文涉及的产品
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
云数据库 Redis 版,社区版 2GB
推荐场景:
搭建游戏排行榜
简介:

标签

PostgreSQL , 大表扫描 , synchronize_seqscans


背景

数据库BUFFER是一个对物理IO对冲非常有效的组件,所以BUFFER的稳态对IO波动影响很大。

在一些操作时,可能会导致一些IO的波动,比如刷脏页,检查点。但是PostgreSQL在这方面都有很好的调度策略,避免IO的波动。具体见PG文档的介绍,或如下

《PostgreSQL 9.6 平滑fsync, write原理浅析》

还有一些操作也可能打破BUFFER的稳态,例如全表扫描访问大表,不过还好PG也有对应的AM策略,当表大小超过了四分之一的shared_buffers时,并且对其进行全表扫描时,会使用大表访问策略:

包括每个ring访问多少个BLOCK,以及这些BUFFER的标记,是否优先清除。(从而避免大表全表扫描时,导致一些较热的BUFFER被清出。而是优先清出全表扫描的大表BUFFER)

另一个大表全表扫访问可能导致IO紊乱的原因,如果全表扫描大表的并发比较多,比如有5个并发,如果每个会话都需要扫描一遍全表,实际IO会是5乘以表的大小。

PostgreSQL对此也做了优化。

PostgreSQL 并行会话大表同步扫描,共享IO。

设置synchronize_seqscans=true时,如果有并行的会话全表扫描大表,那么PG会使用共享IO,协调多个会话的扫描步调,使得并行的会话尽量的同时访问同一个BLOCK,最后大家都扫描完整表。

这样的话从磁盘读取到SHARED BUFFER的IO可能就不是表大小 乘以 会话数的N备,而是更小,甚至1.

synchronize_seqscans (boolean)

This allows sequential scans of large tables to synchronize with each other, so that concurrent scans read the same block at about the same time and hence share the I/O workload. When this is enabled, a scan might start in the middle of the table and then “wrap around” the end to cover all rows, so as to synchronize with the activity of scans already in progress. This can result in unpredictable changes in the row ordering returned by queries that have no ORDER BY clause. Setting this parameter to off ensures the pre-8.3 behavior in which a sequential scan always starts from the beginning of the table. The default is on.

如果将其设置为OFF,那么全表扫描都是从0号数据块开始扫。那么全表扫描的记录返回顺序完全会与行号一致。

而设置为ON时,由于要协调并行访问的步调,全表扫描可能从中间开始扫,所以返回数据的顺序可能会乱序。

社区全表扫描的大表访问策略

/* ----------------  
 *              initscan - scan code common to heap_beginscan and heap_rescan  
 * ----------------  
 */  
static void  
initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)  
{  
  
  
        /*  
         * If the table is large relative to NBuffers, use a bulk-read access  
         * strategy and enable synchronized scanning (see syncscan.c).  Although  
         * the thresholds for these features could be different, we make them the  
         * same so that there are only two behaviors to tune rather than four.  
         * (However, some callers need to be able to disable one or both of these  
         * behaviors, independently of the size of the table; also there is a GUC  
         * variable that can disable synchronized scanning.)  
         *  
         * Note that heap_parallelscan_initialize has a very similar test; if you  
         * change this, consider changing that one, too.  
         */  
        if (!RelationUsesLocalBuffers(scan->rs_rd) &&  
                scan->rs_nblocks > NBuffers / 4)  
        {  
                allow_strat = scan->rs_allow_strat;  
                allow_sync = scan->rs_allow_sync;  
        }  
        else  
                allow_strat = allow_sync = false;  
  
        if (allow_strat)  
        {  
                /* During a rescan, keep the previous strategy object. */  
                if (scan->rs_strategy == NULL)  
                        scan->rs_strategy = GetAccessStrategy(BAS_BULKREAD);  
        }  
  
....  
  
  
        else if (keep_startblock)  
        {  
                /*  
                 * When rescanning, we want to keep the previous startblock setting,  
                 * so that rewinding a cursor doesn't generate surprising results.  
                 * Reset the active syncscan setting, though.  
                 */  
                scan->rs_syncscan = (allow_sync && synchronize_seqscans);  
        }  
        else if (allow_sync && synchronize_seqscans)  
        {  
                scan->rs_syncscan = true;  
                scan->rs_startblock = ss_get_location(scan->rs_rd, scan->rs_nblocks);  
        }  

NBuffers变量指的是SHARED BUFFER

        /*  
         * We sometimes multiply the number of shared buffers by two without  
         * checking for overflow, so we mustn't allow more than INT_MAX / 2.  
         */  
        {  
                {"shared_buffers", PGC_POSTMASTER, RESOURCES_MEM,  
                        gettext_noop("Sets the number of shared memory buffers used by the server."),  
                        NULL,  
                        GUC_UNIT_BLOCKS  
                },  
                &NBuffers,  
                1024, 16, INT_MAX / 2,  
                NULL, NULL, NULL  
        },  

目前PG设计的几种访问策略

/* Possible arguments for GetAccessStrategy() */  
typedef enum BufferAccessStrategyType  
{  
        BAS_NORMAL,                                     /* Normal random access */  
        BAS_BULKREAD,                           /* Large read-only scan (hint bit updates are  
                                                                 * ok) */  
        BAS_BULKWRITE,                          /* Large multi-block write (e.g. COPY IN) */  
        BAS_VACUUM                                      /* VACUUM */  
} BufferAccessStrategyType;  

获取访问策略

/*  
 * GetAccessStrategy -- create a BufferAccessStrategy object  
 *  
 * The object is allocated in the current memory context.  
 */  
BufferAccessStrategy  
GetAccessStrategy(BufferAccessStrategyType btype)  
{  
        BufferAccessStrategy strategy;  
        int                     ring_size;  
  
        /*  
         * Select ring size to use.  See buffer/README for rationales.  
         *  
         * Note: if you change the ring size for BAS_BULKREAD, see also  
         * SYNC_SCAN_REPORT_INTERVAL in access/heap/syncscan.c.  
         */  
        switch (btype)  
        {  
                case BAS_NORMAL:  
                        /* if someone asks for NORMAL, just give 'em a "default" object */  
                        return NULL;  
  
                case BAS_BULKREAD:  
                        ring_size = 256 * 1024 / BLCKSZ;  
                        break;  
                case BAS_BULKWRITE:  
                        ring_size = 16 * 1024 * 1024 / BLCKSZ;  
                        break;  
                case BAS_VACUUM:  
                        ring_size = 256 * 1024 / BLCKSZ;  
                        break;  
  
                default:  
                        elog(ERROR, "unrecognized buffer access strategy: %d",  
                                 (int) btype);  
                        return NULL;            /* keep compiler quiet */  
        }  
  
        /* Make sure ring isn't an undue fraction of shared buffers */  
        ring_size = Min(NBuffers / 8, ring_size);  
  
        /* Allocate the object and initialize all elements to zeroes */  
        strategy = (BufferAccessStrategy)  
                palloc0(offsetof(BufferAccessStrategyData, buffers) +  
                                ring_size * sizeof(Buffer));  
  
        /* Set fields that don't start out zero */  
        strategy->btype = btype;  
        strategy->ring_size = ring_size;  
  
        return strategy;  
}  

检查BAS_BULKREAD标记

/*  
 * StrategyRejectBuffer -- consider rejecting a dirty buffer  
 *  
 * When a nondefault strategy is used, the buffer manager calls this function  
 * when it turns out that the buffer selected by StrategyGetBuffer needs to  
 * be written out and doing so would require flushing WAL too.  This gives us  
 * a chance to choose a different victim.  
 *  
 * Returns true if buffer manager should ask for a new victim, and false  
 * if this buffer should be written and re-used.  
 */  
bool  
StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)  
{  
        /* We only do this in bulkread mode */  
        if (strategy->btype != BAS_BULKREAD)  
                return false;  
  
        /* Don't muck with behavior of normal buffer-replacement strategy */  
        if (!strategy->current_was_in_ring ||  
                strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))  
                return false;  
  
        /*  
         * Remove the dirty buffer from the ring; necessary to prevent infinite  
         * loop if all ring members are dirty.  
         */  
        strategy->buffers[strategy->current] = InvalidBuffer;  
  
        return true;  
}  

优先干掉BAS_BULKREAD标记的BUFFER

                                        if (XLogNeedsFlush(lsn) &&  
                                                StrategyRejectBuffer(strategy, buf))  
                                        {  
                                                /* Drop lock/pin and loop around for another buffer */  
                                                LWLockRelease(BufferDescriptorGetContentLock(buf));  
                                                UnpinBuffer(buf, true);  
                                                continue;  
                                        }  

参考

src/backend/access/heap/heapam.c

src/backend/access/heap/syncscan.c

src/include/storage/bufmgr.h

src/backend/storage/buffer/freelist.c

src/backend/utils/misc/guc.c

src/backend/storage/buffer/bufmgr.c

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
10月前
|
存储 关系型数据库 PostgreSQL
PostgreSQL表扫描方法解析
PostgreSQL表扫描方法解析
77 0
|
10月前
|
存储 SQL Oracle
PostgreSQL扫描方法综述
PostgreSQL扫描方法综述
82 0
|
存储 关系型数据库 PostgreSQL
PostgreSQL优化器之从一个关于扫描方式选择引发的思考
# 一个关于PostgreSQL使用组合索引的问题 近期阅读了《数据库查询优化器的艺术》这本书,对PG和Mysql优化器技术的轮廓有了一定了解。在阅读的过程中,因为知识背景和书本身的表述问题产生了许多困惑,这里就分享对其中一个困惑的探索过程作为看完书的总结。 在这本书的第十八章,关于PG和Mysql的优化器对于索引的优化能力对比中的一段让我困惑不已。如图一所示,单独使用组合索引的后半部分作为查
4648 0
|
SQL 存储 关系型数据库
PostgreSQL 10.1 手册_部分 II. SQL 语言_第 11 章 索引_11.11. 只用索引的扫描
11.11. 只用索引的扫描 PostgreSQL中的所有索引都是二级索引,表示每一个索引都被存储在表的主数据区域(在PostgreSQL术语中被称为该表的堆)之外。这意味着在一次普通索引扫描中,每次取一行需要从索引和堆中取得数据。
1057 0

相关产品

  • 云原生数据库 PolarDB