PostgreSQL 大表扫描策略 - BAS_BULKREAD , synchronize_seqscans

本文涉及的产品
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
Redis 开源版,标准版 2GB
推荐场景:
搭建游戏排行榜
云数据库 Tair(兼容Redis),内存型 2GB
简介:

标签

PostgreSQL , 大表扫描 , synchronize_seqscans


背景

数据库BUFFER是一个对物理IO对冲非常有效的组件,所以BUFFER的稳态对IO波动影响很大。

在一些操作时,可能会导致一些IO的波动,比如刷脏页,检查点。但是PostgreSQL在这方面都有很好的调度策略,避免IO的波动。具体见PG文档的介绍,或如下

《PostgreSQL 9.6 平滑fsync, write原理浅析》

还有一些操作也可能打破BUFFER的稳态,例如全表扫描访问大表,不过还好PG也有对应的AM策略,当表大小超过了四分之一的shared_buffers时,并且对其进行全表扫描时,会使用大表访问策略:

包括每个ring访问多少个BLOCK,以及这些BUFFER的标记,是否优先清除。(从而避免大表全表扫描时,导致一些较热的BUFFER被清出。而是优先清出全表扫描的大表BUFFER)

另一个大表全表扫访问可能导致IO紊乱的原因,如果全表扫描大表的并发比较多,比如有5个并发,如果每个会话都需要扫描一遍全表,实际IO会是5乘以表的大小。

PostgreSQL对此也做了优化。

PostgreSQL 并行会话大表同步扫描,共享IO。

设置synchronize_seqscans=true时,如果有并行的会话全表扫描大表,那么PG会使用共享IO,协调多个会话的扫描步调,使得并行的会话尽量的同时访问同一个BLOCK,最后大家都扫描完整表。

这样的话从磁盘读取到SHARED BUFFER的IO可能就不是表大小 乘以 会话数的N备,而是更小,甚至1.

synchronize_seqscans (boolean)

This allows sequential scans of large tables to synchronize with each other, so that concurrent scans read the same block at about the same time and hence share the I/O workload. When this is enabled, a scan might start in the middle of the table and then “wrap around” the end to cover all rows, so as to synchronize with the activity of scans already in progress. This can result in unpredictable changes in the row ordering returned by queries that have no ORDER BY clause. Setting this parameter to off ensures the pre-8.3 behavior in which a sequential scan always starts from the beginning of the table. The default is on.

如果将其设置为OFF,那么全表扫描都是从0号数据块开始扫。那么全表扫描的记录返回顺序完全会与行号一致。

而设置为ON时,由于要协调并行访问的步调,全表扫描可能从中间开始扫,所以返回数据的顺序可能会乱序。

社区全表扫描的大表访问策略

/* ----------------  
 *              initscan - scan code common to heap_beginscan and heap_rescan  
 * ----------------  
 */  
static void  
initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)  
{  
  
  
        /*  
         * If the table is large relative to NBuffers, use a bulk-read access  
         * strategy and enable synchronized scanning (see syncscan.c).  Although  
         * the thresholds for these features could be different, we make them the  
         * same so that there are only two behaviors to tune rather than four.  
         * (However, some callers need to be able to disable one or both of these  
         * behaviors, independently of the size of the table; also there is a GUC  
         * variable that can disable synchronized scanning.)  
         *  
         * Note that heap_parallelscan_initialize has a very similar test; if you  
         * change this, consider changing that one, too.  
         */  
        if (!RelationUsesLocalBuffers(scan->rs_rd) &&  
                scan->rs_nblocks > NBuffers / 4)  
        {  
                allow_strat = scan->rs_allow_strat;  
                allow_sync = scan->rs_allow_sync;  
        }  
        else  
                allow_strat = allow_sync = false;  
  
        if (allow_strat)  
        {  
                /* During a rescan, keep the previous strategy object. */  
                if (scan->rs_strategy == NULL)  
                        scan->rs_strategy = GetAccessStrategy(BAS_BULKREAD);  
        }  
  
....  
  
  
        else if (keep_startblock)  
        {  
                /*  
                 * When rescanning, we want to keep the previous startblock setting,  
                 * so that rewinding a cursor doesn't generate surprising results.  
                 * Reset the active syncscan setting, though.  
                 */  
                scan->rs_syncscan = (allow_sync && synchronize_seqscans);  
        }  
        else if (allow_sync && synchronize_seqscans)  
        {  
                scan->rs_syncscan = true;  
                scan->rs_startblock = ss_get_location(scan->rs_rd, scan->rs_nblocks);  
        }  

NBuffers变量指的是SHARED BUFFER

        /*  
         * We sometimes multiply the number of shared buffers by two without  
         * checking for overflow, so we mustn't allow more than INT_MAX / 2.  
         */  
        {  
                {"shared_buffers", PGC_POSTMASTER, RESOURCES_MEM,  
                        gettext_noop("Sets the number of shared memory buffers used by the server."),  
                        NULL,  
                        GUC_UNIT_BLOCKS  
                },  
                &NBuffers,  
                1024, 16, INT_MAX / 2,  
                NULL, NULL, NULL  
        },  

目前PG设计的几种访问策略

/* Possible arguments for GetAccessStrategy() */  
typedef enum BufferAccessStrategyType  
{  
        BAS_NORMAL,                                     /* Normal random access */  
        BAS_BULKREAD,                           /* Large read-only scan (hint bit updates are  
                                                                 * ok) */  
        BAS_BULKWRITE,                          /* Large multi-block write (e.g. COPY IN) */  
        BAS_VACUUM                                      /* VACUUM */  
} BufferAccessStrategyType;  

获取访问策略

/*  
 * GetAccessStrategy -- create a BufferAccessStrategy object  
 *  
 * The object is allocated in the current memory context.  
 */  
BufferAccessStrategy  
GetAccessStrategy(BufferAccessStrategyType btype)  
{  
        BufferAccessStrategy strategy;  
        int                     ring_size;  
  
        /*  
         * Select ring size to use.  See buffer/README for rationales.  
         *  
         * Note: if you change the ring size for BAS_BULKREAD, see also  
         * SYNC_SCAN_REPORT_INTERVAL in access/heap/syncscan.c.  
         */  
        switch (btype)  
        {  
                case BAS_NORMAL:  
                        /* if someone asks for NORMAL, just give 'em a "default" object */  
                        return NULL;  
  
                case BAS_BULKREAD:  
                        ring_size = 256 * 1024 / BLCKSZ;  
                        break;  
                case BAS_BULKWRITE:  
                        ring_size = 16 * 1024 * 1024 / BLCKSZ;  
                        break;  
                case BAS_VACUUM:  
                        ring_size = 256 * 1024 / BLCKSZ;  
                        break;  
  
                default:  
                        elog(ERROR, "unrecognized buffer access strategy: %d",  
                                 (int) btype);  
                        return NULL;            /* keep compiler quiet */  
        }  
  
        /* Make sure ring isn't an undue fraction of shared buffers */  
        ring_size = Min(NBuffers / 8, ring_size);  
  
        /* Allocate the object and initialize all elements to zeroes */  
        strategy = (BufferAccessStrategy)  
                palloc0(offsetof(BufferAccessStrategyData, buffers) +  
                                ring_size * sizeof(Buffer));  
  
        /* Set fields that don't start out zero */  
        strategy->btype = btype;  
        strategy->ring_size = ring_size;  
  
        return strategy;  
}  

检查BAS_BULKREAD标记

/*  
 * StrategyRejectBuffer -- consider rejecting a dirty buffer  
 *  
 * When a nondefault strategy is used, the buffer manager calls this function  
 * when it turns out that the buffer selected by StrategyGetBuffer needs to  
 * be written out and doing so would require flushing WAL too.  This gives us  
 * a chance to choose a different victim.  
 *  
 * Returns true if buffer manager should ask for a new victim, and false  
 * if this buffer should be written and re-used.  
 */  
bool  
StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)  
{  
        /* We only do this in bulkread mode */  
        if (strategy->btype != BAS_BULKREAD)  
                return false;  
  
        /* Don't muck with behavior of normal buffer-replacement strategy */  
        if (!strategy->current_was_in_ring ||  
                strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))  
                return false;  
  
        /*  
         * Remove the dirty buffer from the ring; necessary to prevent infinite  
         * loop if all ring members are dirty.  
         */  
        strategy->buffers[strategy->current] = InvalidBuffer;  
  
        return true;  
}  

优先干掉BAS_BULKREAD标记的BUFFER

                                        if (XLogNeedsFlush(lsn) &&  
                                                StrategyRejectBuffer(strategy, buf))  
                                        {  
                                                /* Drop lock/pin and loop around for another buffer */  
                                                LWLockRelease(BufferDescriptorGetContentLock(buf));  
                                                UnpinBuffer(buf, true);  
                                                continue;  
                                        }  

参考

src/backend/access/heap/heapam.c

src/backend/access/heap/syncscan.c

src/include/storage/bufmgr.h

src/backend/storage/buffer/freelist.c

src/backend/utils/misc/guc.c

src/backend/storage/buffer/bufmgr.c

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍如何基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
SQL 弹性计算 关系型数据库
HTAP数据库 PostgreSQL 场景与性能测试之 3.1 - (OLAP) 大表JOIN统计查询-10亿 join 1亿 agg
标签 PostgreSQL , HTAP , OLTP , OLAP , 场景与性能测试 背景 PostgreSQL是一个历史悠久的数据库,历史可以追溯到1973年,最早由2014计算机图灵奖得主,关系数据库的鼻祖Michael_Stonebraker 操刀设计,PostgreSQL具备与Oracle类似的功能、性能、架构以及稳定性。 PostgreSQL社区的贡献者众多
2205 0
|
监控 关系型数据库 数据库
PostgreSQL的索引优化策略?
【8月更文挑战第26天】PostgreSQL的索引优化策略?
394 1
|
存储 关系型数据库 数据库
PostgreSQL的备份策略
【8月更文挑战第4天】PostgreSQL的备份策略
349 4
|
存储 监控 关系型数据库
PostgreSQL的备份策略是什么?
【8月更文挑战第4天】PostgreSQL的备份策略是什么?
192 7
|
关系型数据库 MySQL 测试技术
【专栏】PostgreSQL数据库向MySQL迁移的过程、挑战及策略
【4月更文挑战第29天】本文探讨了PostgreSQL数据库向MySQL迁移的过程、挑战及策略。迁移步骤包括评估规划、数据导出与转换、创建MySQL数据库、数据导入。挑战包括数据类型不匹配、函数和语法差异、数据完整性和性能问题。应对策略涉及数据类型映射、代码调整、数据校验和性能优化。迁移后需进行数据验证、性能测试和业务验证,确保顺利过渡。在数字化时代,掌握数据库迁移技能对技术人员至关重要。
823 5
|
安全 关系型数据库 数据库
postgresql|数据库|基于本地备份的远程备份策略
postgresql|数据库|基于本地备份的远程备份策略
297 0
|
SQL 存储 Oracle
PostgreSQL 分页, offset, 返回顺序, 扫描方法原理(seqscan, index scan, index only scan, bitmap scan, parallel xx scan),游标
PostgreSQL 分页, offset, 返回顺序, 扫描方法原理(seqscan, index scan, index only scan, bitmap scan, parallel xx scan),游标
3987 0
|
存储 关系型数据库 PostgreSQL
PostgreSQL表扫描方法解析
PostgreSQL表扫描方法解析
304 0
|
存储 SQL Oracle
PostgreSQL扫描方法综述
PostgreSQL扫描方法综述
247 0
|
存储 关系型数据库 数据库
如何在PostgreSQL中更新大表
如何在PostgreSQL中更新大表
421 0

相关产品

  • 云原生数据库 PolarDB
  • 推荐镜像

    更多