PostgreSQL vs Oracle checksum 配置与性能

本文涉及的产品
云数据库 RDS MySQL,集群系列 2核4GB
推荐场景:
搭建个人博客
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
RDS PostgreSQL Serverless,0.5-4RCU 50GB 3个月
推荐场景:
对影评进行热评分析
简介:

标签

PostgreSQL , Oracle , checksum , IO错误


背景

开启数据库block checksum可以发现磁盘、存储、IO系统的问题引入一些的物理错误。

PostgreSQL checksum

在PostgreSQL中,默认强制对XLOG(WAL)开启了checksum,因此可以保证从redo buffer写入redo file里面的数据是一致的,读取的时候也会根据每个REDO PAGE的checksum检查REDO PAGE的内容是否正确。(换言之没有参数来关闭WAL的checksum)

而对于DATAFILE,是否开启CHECKSUM则取决于initdb时配置的checksum参数。

initdb  
  
  -k, --data-checksums      use data page checksums  

通过pg_controldata检查当前数据库集群是否开启了checksum。

pg_controldata  
  
Data page checksum version:           0  

PostgreSQL 11 允许用户动态的修改checksum的开关,而不是初始化实例时固定:

《PostgreSQL 11 preview - Allow on-line enabling and disabling of data checksums (含pg_verify_checksums工具,离线检查数据文件有误块错误)》

Oracle checksum

Checksums allow Oracle to detect corruption caused by underlying disks, storage systems, or I/O systems.

If this parameter is set to OFF, DBWn calculates checksums only for the SYSTEM tablespace, but not for user tablespaces. In addition, no log checksum is performed when this parameter is set to OFF.

https://docs.oracle.com/cd/B28359_01/server.111/b28320/initparams046.htm#REFRN10030

DB_BLOCK_CHECKSUM determines whether DBWn and the direct loader will calculate a checksum (a number calculated from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk. Checksums are verified when a block is read - only if this parameter is TYPICAL or FULL and the last write of the block stored a checksum. In FULL mode, Oracle also verifies the checksum before a change application from update/delete statements and recomputes it after the change is applied. In addition, Oracle gives every log block a checksum before writing it to the current log.

Starting with Oracle Database 11g, most of the log block checksum is done by the generating foreground processes, while the LGWR performs the rest of the work, for better CPU and cache efficiency. Prior to Oracle Database 11g, the LGWR solely performed the log block checksum. When this parameter is set to FULL, the LGWR verifies the checksum of each log block generated by the foreground processes before writing it to disk.

If this parameter is set to OFF, DBWn calculates checksums only for the SYSTEM tablespace, but not for user tablespaces. In addition, no log checksum is performed when this parameter is set to OFF.

Checksums allow Oracle to detect corruption caused by underlying disks, storage systems, or I/O systems. If set to FULL, DB_BLOCK_CHECKSUM also catches in-memory corruptions and stops them from making it to the disk. Turning on this feature in TYPICAL mode causes only an additional 1% to 2% overhead. In the FULL mode it causes 4% to 5% overhead. Oracle recommends that you set DB_BLOCK_CHECKSUM to TYPICAL.

For backward compatibility the use of TRUE (implying TYPICAL) and FALSE (implying OFF) values is preserved.

PostgreSQL checksum 行为

1、开启checksum后,PostgreSQL 从shared buffer把数据write出去,需要计算checksum。

2、开启checksum后,从shared buffer外面(disk, os page cache)读取BLOCK到shared buffer里面,需要计算block的checksum,对比存储在page head里头的checksum是否一致。

3、已经在shared buffer里面的block,变更、读取时并不需要计算checksum。

checksum计算逻辑

1、pg_checksum_page

src/include/storage/checksum_impl.h

/*  
 * Compute the checksum for a Postgres page.  The page must be aligned on a  
 * 4-byte boundary.  
 *  
 * The checksum includes the block number (to detect the case where a page is  
 * somehow moved to a different location), the page header (excluding the  
 * checksum itself), and the page data.  
 */  
uint16  
pg_checksum_page(char *page, BlockNumber blkno)  
{  
        PageHeader      phdr = (PageHeader) page;  
        uint16          save_checksum;  
        uint32          checksum;  
  
        /* We only calculate the checksum for properly-initialized pages */  
        Assert(!PageIsNew(page));  
  
        /*  
         * Save pd_checksum and temporarily set it to zero, so that the checksum  
         * calculation isn't affected by the old checksum stored on the page.  
         * Restore it after, because actually updating the checksum is NOT part of  
         * the API of this function.  
         */  
        save_checksum = phdr->pd_checksum;  
        phdr->pd_checksum = 0;  
        checksum = pg_checksum_block(page, BLCKSZ);  
        phdr->pd_checksum = save_checksum;  
  
        /* Mix in the block number to detect transposed pages */  
        checksum ^= blkno;  
  
        /*  
         * Reduce to a uint16 (to fit in the pd_checksum field) with an offset of  
         * one. That avoids checksums of zero, which seems like a good idea.  
         */  
        return (checksum % 65535) + 1;  
}  

2、异步写(bg writer, backend process evict dirty page)时,计算checksum

src/backend/storage/buffer/bufmgr.c

/*  
 * FlushBuffer  
 *              Physically write out a shared buffer.  
 *  
 * NOTE: this actually just passes the buffer contents to the kernel; the  
 * real write to disk won't happen until the kernel feels like it.  This  
 * is okay from our point of view since we can redo the changes from WAL.  
 * However, we will need to force the changes to disk via fsync before  
 * we can checkpoint WAL.  
 *  
 * The caller must hold a pin on the buffer and have share-locked the  
 * buffer contents.  (Note: a share-lock does not prevent updates of  
 * hint bits in the buffer, so the page could change while the write  
 * is in progress, but we assume that that will not invalidate the data  
 * written.)  
 *  
 * If the caller has an smgr reference for the buffer's relation, pass it  
 * as the second parameter.  If not, pass NULL.  
 */  
static void  
FlushBuffer(BufferDesc *buf, SMgrRelation reln)  
  
  
        /*  
         * Update page checksum if desired.  Since we have only shared lock on the  
         * buffer, other processes might be updating hint bits in it, so we must  
         * copy the page to private storage if we do checksumming.  
         */  
        bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);  

3、将数据从shared buffer外围读入shared buffer时,校验checksum

src/backend/storage/page/bufpage.c

/*  
 * PageIsVerified  
 *              Check that the page header and checksum (if any) appear valid.  
 *  
 * This is called when a page has just been read in from disk.  The idea is  
 * to cheaply detect trashed pages before we go nuts following bogus item  
 * pointers, testing invalid transaction identifiers, etc.  
 *  
 * It turns out to be necessary to allow zeroed pages here too.  Even though  
 * this routine is *not* called when deliberately adding a page to a relation,  
 * there are scenarios in which a zeroed page might be found in a table.  
 * (Example: a backend extends a relation, then crashes before it can write  
 * any WAL entry about the new page.  The kernel will already have the  
 * zeroed page in the file, and it will stay that way after restart.)  So we  
 * allow zeroed pages here, and are careful that the page access macros  
 * treat such a page as empty and without free space.  Eventually, VACUUM  
 * will clean up such a page and make it usable.  
 */  
bool  
PageIsVerified(Page page, BlockNumber blkno)  
{  
        PageHeader      p = (PageHeader) page;  
        size_t     *pagebytes;  
        int                     i;  
        bool            checksum_failure = false;  
        bool            header_sane = false;  
        bool            all_zeroes = false;  
        uint16          checksum = 0;  
  
        /*  
         * Don't verify page data unless the page passes basic non-zero test  
         */  
        if (!PageIsNew(page))  
        {  
                if (DataChecksumsEnabled())  
                {  
                        checksum = pg_checksum_page((char *) page, blkno);  
  
                        if (checksum != p->pd_checksum)  
                                checksum_failure = true;  
                }  
  
...................  
  
  
/*  
 * Set checksum for a page in shared buffers.  
 *  
 * If checksums are disabled, or if the page is not initialized, just return  
 * the input.  Otherwise, we must make a copy of the page before calculating  
 * the checksum, to prevent concurrent modifications (e.g. setting hint bits)  
 * from making the final checksum invalid.  It doesn't matter if we include or  
 * exclude hints during the copy, as long as we write a valid page and  
 * associated checksum.  
 *  
 * Returns a pointer to the block-sized data that needs to be written. Uses  
 * statically-allocated memory, so the caller must immediately write the  
 * returned page and not refer to it again.  
 */  
char *  
PageSetChecksumCopy(Page page, BlockNumber blkno)  
{  
        static char *pageCopy = NULL;  
  
        /* If we don't need a checksum, just return the passed-in data */  
        if (PageIsNew(page) || !DataChecksumsEnabled())  
                return (char *) page;  
  
        /*  
         * We allocate the copy space once and use it over on each subsequent  
         * call.  The point of palloc'ing here, rather than having a static char  
         * array, is first to ensure adequate alignment for the checksumming code  
         * and second to avoid wasting space in processes that never call this.  
         */  
        if (pageCopy == NULL)  
                pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);  
  
        memcpy(pageCopy, (char *) page, BLCKSZ);  
        ((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);  
        return pageCopy;  
}  
  
/*  
 * Set checksum for a page in private memory.  
 *  
 * This must only be used when we know that no other process can be modifying  
 * the page buffer.  
 */  
void  
PageSetChecksumInplace(Page page, BlockNumber blkno)  
{  
        /* If we don't need a checksum, just return */  
        if (PageIsNew(page) || !DataChecksumsEnabled())  
                return;  
  
        ((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);  
}  

性能测试

对比开启checksum, 关闭checksum的性能。

从代码我们已经了解什么时候会需要计算checksum,所以设计一个这个的CASE,数据大量写出,使得bgwrite writeout dirty page频繁。

用到如下CASE

《HTAP数据库 PostgreSQL 场景与性能测试之 42 - (OLTP+OLAP) unlogged table 不含索引多表批量写入》

测试结果

checksum=0

transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 56  
number of threads: 56  
duration: 120 s  
number of transactions actually processed: 885490  
latency average = 7.588 ms  
latency stddev = 10.896 ms  
tps = 7376.390493 (including connections establishing)  
tps = 7377.158206 (excluding connections establishing)  
script statistics:  
 - statement latencies in milliseconds:  
         0.002  \set sid random(1,1024)        
         7.586  select ins_sensor(:sid, 1000);  

checksum=1

transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 56  
number of threads: 56  
duration: 120 s  
number of transactions actually processed: 867269  
latency average = 7.748 ms  
latency stddev = 20.287 ms  
tps = 7225.742548 (including connections establishing)  
tps = 7226.431737 (excluding connections establishing)  
script statistics:  
 - statement latencies in milliseconds:  
         0.002  \set sid random(1,1024)        
         7.746  select ins_sensor(:sid, 1000);  

开启checksum时,观察perf,可以看到pg_checksum_page占用了少量overhead,体现在性能差异上也非常的小。

   0.64%  postgres               [.] pg_checksum_page    

小结

在bgwriter, backend process write dirty page时,需要计算checksum。(耗费CPU)

在从shared buffer外面读入page时,需要校验checksum。(耗费CPU)

当shared buffer较少,同时产生脏页较快,bg writer 或者backend process 刷脏页较频繁时,可能会使得计算checksum引入一定的CPU消耗。实测极端写出的情况下,pg_checksum_page引入了0.64%左右的开销。

因此开启checksum,实际引入的开销并不大。

参考

1、https://www.postgresql.org/docs/11/static/runtime-config-preset.html

data_checksums (boolean)  

Reports whether data checksums are enabled for this cluster. See data checksums for more information.

2、https://www.postgresql.org/docs/11/static/runtime-config-developer.html

ignore_checksum_failure (boolean)  

Only has effect if data checksums are enabled.

Detection of a checksum failure during a read normally causes PostgreSQL to report an error, aborting the current transaction. Setting ignore_checksum_failure to on causes the system to ignore the failure (but still report a warning), and continue processing. This behavior may cause crashes, propagate or hide corruption, or other serious problems. However, it may allow you to get past the error and retrieve undamaged tuples that might still be present in the table if the block header is still sane. If the header is corrupt an error will be reported even if this option is enabled. The default setting is off, and it can only be changed by a superuser.

3、https://www.postgresql.org/docs/11/static/pageinspect.html#id-1.11.7.31.4

page_header(page bytea) returns record  

page_header shows fields that are common to all PostgreSQL heap and index pages.

A page image obtained with get_raw_page should be passed as argument. For example:

test=# SELECT * FROM page_header(get_raw_page('pg_class', 0));  
    lsn    | checksum | flags  | lower | upper | special | pagesize | version | prune_xid  
-----------+----------+--------+-------+-------+---------+----------+---------+-----------  
 0/24A1B50 |        0 |      1 |   232 |   368 |    8192 |     8192 |       4 |         0  

The returned columns correspond to the fields in the PageHeaderData struct. See src/include/storage/bufpage.h for details.

The checksum field is the checksum stored in the page, which might be incorrect if the page is somehow corrupted. If data checksums are not enabled for this instance, then the value stored is meaningless.

page_checksum(page bytea, blkno int4) returns smallint  

page_checksum computes the checksum for the page, as if it was located at the given block.

A page image obtained with get_raw_page should be passed as argument. For example:

test=# SELECT page_checksum(get_raw_page('pg_class', 0), 0);  
 page_checksum  
---------------  
         13443  

Note that the checksum depends on the block number, so matching block numbers should be passed (except when doing esoteric debugging).

The checksum computed with this function can be compared with the checksum result field of the function page_header. If data checksums are enabled for this instance, then the two values should be equal.

4、https://www.postgresql.org/docs/11/static/pgverifychecksums.html

pg_verify_checksums  
  
pg_verify_checksums — verify data checksums in an offline PostgreSQL database cluster  
  
Synopsis  
pg_verify_checksums [option] [[-D] datadir]  
  
Description  
pg_verify_checksums verifies data checksums in a PostgreSQL cluster.  

5、https://docs.oracle.com/cd/B28359_01/server.111/b28320/initparams046.htm#REFRN10030

DB_BLOCK_CHECKSUM determines whether DBWn and the direct loader will calculate a checksum (a number calculated from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk. Checksums are verified when a block is read - only if this parameter is TYPICAL or FULL and the last write of the block stored a checksum. In FULL mode, Oracle also verifies the checksum before a change application from update/delete statements and recomputes it after the change is applied. In addition, Oracle gives every log block a checksum before writing it to the current log.

Starting with Oracle Database 11g, most of the log block checksum is done by the generating foreground processes, while the LGWR performs the rest of the work, for better CPU and cache efficiency. Prior to Oracle Database 11g, the LGWR solely performed the log block checksum. When this parameter is set to FULL, the LGWR verifies the checksum of each log block generated by the foreground processes before writing it to disk.

If this parameter is set to OFF, DBWn calculates checksums only for the SYSTEM tablespace, but not for user tablespaces. In addition, no log checksum is performed when this parameter is set to OFF.

Checksums allow Oracle to detect corruption caused by underlying disks, storage systems, or I/O systems. If set to FULL, DB_BLOCK_CHECKSUM also catches in-memory corruptions and stops them from making it to the disk. Turning on this feature in TYPICAL mode causes only an additional 1% to 2% overhead. In the FULL mode it causes 4% to 5% overhead. Oracle recommends that you set DB_BLOCK_CHECKSUM to TYPICAL.

For backward compatibility the use of TRUE (implying TYPICAL) and FALSE (implying OFF) values is preserved.

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
7天前
|
存储 关系型数据库 MySQL
MySQL vs. PostgreSQL:选择适合你的开源数据库
在众多开源数据库中,MySQL和PostgreSQL无疑是最受欢迎的两个。它们都有着强大的功能、广泛的社区支持和丰富的生态系统。然而,它们在设计理念、性能特点、功能特性等方面存在着显著的差异。本文将从这三个方面对MySQL和PostgreSQL进行比较,以帮助您选择更适合您需求的开源数据库。
32 4
|
2月前
|
Oracle NoSQL 关系型数据库
主流数据库对比:MySQL、PostgreSQL、Oracle和Redis的优缺点分析
主流数据库对比:MySQL、PostgreSQL、Oracle和Redis的优缺点分析
234 2
|
2月前
|
Ubuntu Oracle 关系型数据库
Oracle VM VirtualBox之Ubuntu 22.04LTS双网卡网络模式配置
这篇文章是关于如何在Oracle VM VirtualBox中配置Ubuntu 22.04LTS虚拟机双网卡网络模式的详细指南,包括VirtualBox网络概述、双网卡网络模式的配置步骤以及Ubuntu系统网络配置。
168 3
|
3月前
|
关系型数据库 MySQL Linux
在Linux中,如何配置数据库服务器(如MySQL或PostgreSQL)?
在Linux中,如何配置数据库服务器(如MySQL或PostgreSQL)?
|
3月前
|
Oracle 关系型数据库 BI
ORACLE Apex: EBS多组织结构 理解与配置
【8月更文挑战第11天】在Oracle Apex中理解和配置与EBS多组织结构相关内容需掌握:1) EBS多组织结构概念及组成部分,如法律实体、业务单位与库存组织;2) Oracle Apex与EBS集成的目的与方式,包括提供友好界面及自定义业务流程;3) 在Apex中配置多组织结构应用,涉及数据访问控制、页面报表设计及业务流程集成。整体而言,需精通EBS架构与Apex开发技术,以实现高效灵活的企业解决方案。
|
3月前
|
关系型数据库 MySQL 数据库
探究数据库开源协议:PostgreSQL vs MySQL
探究数据库开源协议:PostgreSQL vs MySQL
|
4月前
|
SQL Oracle 关系型数据库
关系型数据库Oracle性能问题
【7月更文挑战第15天】
41 4
|
4月前
|
SQL 缓存 Oracle
关系型数据库Oracle性能问题
【7月更文挑战第16天】
54 2
|
4月前
|
安全 关系型数据库 Linux
|
5月前
|
缓存 关系型数据库 数据库
postgresql.conf配置详解
postgresql.conf配置详解

相关产品

  • 云原生数据库 PolarDB
  • 云数据库 RDS PostgreSQL 版
  • 推荐镜像

    更多