hint bits 源码讲解, 为什么PostgreSQL查询语句也可能产生 xlog, 并且可能对buffer有write操作 ?

本文涉及的产品
云数据库 RDS SQL Server,基础系列 2核4GB
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
RDS SQL Server Serverless,2-4RCU 50GB 3个月
推荐场景:
简介: 本文还可以回答另一个问题,为什么数据库开启wal_hint_log后或者使用initdb -k参数开启checksum后,产生的XLOG变多了。 PostgreSQL 查询是可能产生XLOG的,这需要从PostgreSQL tuple上的hint bits说起。什么是hint bits?你可以

本文还可以回答另外2个问题,
.1. 为什么数据库开启wal_hint_log后或者使用initdb -k参数开启checksum后,产生的XLOG变多了。
.2. 为什么使用pg_rewind修复时间线分歧,需要开启wal_hint_log或者checksum?

pg_rewind.c

static void
sanityChecks(void)
{
...
        /*
         * Target cluster need to use checksums or hint bit wal-logging, this to
         * prevent from data corruption that could occur because of hint bits.
         */
        if (ControlFile_target.data_checksum_version != PG_DATA_CHECKSUM_VERSION &&
                !ControlFile_target.wal_log_hints)
        {
                pg_fatal("target server needs to use either data checksums or \"wal_log_hints = on\"\n");
        }
...

PostgreSQL 查询是可能产生XLOG的,这需要从PostgreSQL tuple上的hint bits说起。
什么是hint bits?你可以参考这个页面:
https://wiki.postgresql.org/wiki/Hint_Bits
hint bits是tuple头部的infomask里的2个BIT。用来表示该tuple的事务状态。

src/include/access/htup_details.h  
#define HEAP_XMIN_COMMITTED             0x0100  /* t_xmin committed 256 */  
#define HEAP_XMIN_INVALID               0x0200  /* t_xmin invalid/aborted 512  */  
#define HEAP_XMAX_COMMITTED             0x0400  /* t_xmax committed 1024  */  
#define HEAP_XMAX_INVALID               0x0800  /* t_xmax invalid/aborted 2048 */  

hint bits含义:
如果XMIN对应的BIT都没有设置,有两种可能,事务未结束,或者事务已结束,但是BIT还未被第一个检查该TUPLE HINT BIT的会话设置。
If neither of the XMIN bits is set, then either:
.1. The creating transaction is still in progress, which you can check by examining the list of running transactions in shared memory;
.2. You are the first one to check since it ended, in which case you need to consult pg_clog to know the transaction's status, and you can update the hint bits if you find out its final state.
XMAX表示被删除的记录,原理一样。
If the tuple has been marked deleted, then similar remarks apply to the XMAX bits.
假设没有hint bits,我们只能从pg_clog中或者PGXACT内存结构中(未结束的或未清除的事务信息内存)得知该tuple对应的事务提交状态,显然如果每条tuple都要查询pg_clog的话,性能一定会很差。
所以为了提升性能,PostgreSQL在tuple的头部t_infomask中通过4个比特位来存储事务的提交状态。从而我们不需要查询pg_clog来获得事务信息。
但是请注意,并不是在事务结束时设置t_infomask的hint bits。而是在后面的DML或者DQL,VACUUM等SQL扫描到对应的TUPLE时,触发SET BITS的操作。
设置hint bits的代码如下:

SetHintBits@src/backend/utils/time/tqual.c  
src/backend/utils/time/tqual.c  
 * tqual.c  
 *        POSTGRES "time qualification" code, ie, tuple visibility rules.  
 *  
 * NOTE: all the HeapTupleSatisfies routines will update the tuple's  
 * "hint" status bits if we see that the inserting or deleting transaction  
 * has now committed or aborted (and it is safe to set the hint bits).  
 * If the hint bits are changed, MarkBufferDirtyHint is called on  
 * the passed-in buffer.  The caller must hold not only a pin, but at least  
 * shared buffer content lock on the buffer containing the tuple.  
 *  
 * NOTE: must check TransactionIdIsInProgress (which looks in PGXACT array)  
 * before TransactionIdDidCommit/TransactionIdDidAbort (which look in  
 * pg_clog).  Otherwise we have a race condition: we might decide that a  
 * just-committed transaction crashed, because none of the tests succeed.  
 * xact.c is careful to record commit/abort in pg_clog before it unsets  
 * MyPgXact->xid in PGXACT array.  That fixes that problem, but it also  
 * means there is a window where TransactionIdIsInProgress and  
 * TransactionIdDidCommit will both return true.  If we check only  
 * TransactionIdDidCommit, we could consider a tuple committed when a  
 * later GetSnapshotData call will still think the originating transaction  
 * is in progress, which leads to application-level inconsistency.  The  
 * upshot is that we gotta check TransactionIdIsInProgress first in all  
 * code paths, except for a few cases where we are looking at  
 * subtransactions of our own main transaction and so there can't be any  
 * race condition.  
 *  
 * Summary of visibility functions:  
 *  
 *       HeapTupleSatisfiesMVCC()  
 *                visible to supplied snapshot, excludes current command  
 *       HeapTupleSatisfiesUpdate()  
 *                visible to instant snapshot, with user-supplied command  
 *                counter and more complex result  
 *       HeapTupleSatisfiesSelf()  
 *                visible to instant snapshot and current command  
 *       HeapTupleSatisfiesDirty()  
 *                like HeapTupleSatisfiesSelf(), but includes open transactions  
 *       HeapTupleSatisfiesVacuum()  
 *                visible to any running transaction, used by VACUUM  
 *       HeapTupleSatisfiesToast()  
 *                visible unless part of interrupted vacuum, used for TOAST  
 *       HeapTupleSatisfiesAny()  
 *                all tuples are visible  
 *  
 * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group  
 * Portions Copyright (c) 1994, Regents of the University of California  
 *  
 * IDENTIFICATION  
 *        src/backend/utils/time/tqual.c  
......  
/*  
 * SetHintBits()  
 *  
 * Set commit/abort hint bits on a tuple, if appropriate at this time.  
 *  
 * It is only safe to set a transaction-committed hint bit if we know the  
 * transaction's commit record has been flushed to disk, or if the table is  
 * temporary or unlogged and will be obliterated by a crash anyway.  We  
 * cannot change the LSN of the page here because we may hold only a share  
 * lock on the buffer, so we can't use the LSN to interlock this; we have to  
 * just refrain from setting the hint bit until some future re-examination  
 * of the tuple.  
 *  
 * We can always set hint bits when marking a transaction aborted.  (Some  
 * code in heapam.c relies on that!)  
 *  
 * Also, if we are cleaning up HEAP_MOVED_IN or HEAP_MOVED_OFF entries, then  
 * we can always set the hint bits, since pre-9.0 VACUUM FULL always used  
 * synchronous commits and didn't move tuples that weren't previously  
 * hinted.  (This is not known by this subroutine, but is applied by its  
 * callers.)  Note: old-style VACUUM FULL is gone, but we have to keep this  
 * module's support for MOVED_OFF/MOVED_IN flag bits for as long as we  
 * support in-place update from pre-9.0 databases.  
 *  
 * Normal commits may be asynchronous, so for those we need to get the LSN  
 * of the transaction and then check whether this is flushed.  
 *  
 * The caller should pass xid as the XID of the transaction to check, or  
 * InvalidTransactionId if no check is needed.  
 */  
static inline void  
SetHintBits(HeapTupleHeader tuple, Buffer buffer,  
                        uint16 infomask, TransactionId xid)  
{  
        if (TransactionIdIsValid(xid))  
        {  
                /* NB: xid must be known committed here! */  
                XLogRecPtr      commitLSN = TransactionIdGetCommitLSN(xid);  // 获取事务对应的commitLSN  
  
                if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))  // 在设置hint bits前,必须确保事务对应的xlog 已经flush到磁盘,否则可能出现不一致的情况。例如数据恢复时xlog没有,但是CLOG显示已提交。  
                        return;                         /* not flushed yet, so don't set hint */  
        }  
  
        tuple->t_infomask |= infomask;  // 设置hint bits  
        MarkBufferDirtyHint(buffer, true);  // 将buffer标记为dirty,当initdb 打开了checksum或者使用了wal_log_hints时,如果它刚好是checkpoint后的第一个脏页,则写full page到WAL。  
}  

我们可以来验证一下:

> truncate t;  
postgres=# select pg_backend_pid();  
 pg_backend_pid   
----------------  
           5497  
(1 row)  

验证用的stap跟踪脚本 :

[root@digoal ~]# cat trc.stp   
global f_start[999999]  
  
probe process("/opt/pgsql/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c").call {   
   f_start[execname(), pid(), tid(), cpu()] = gettimeofday_ms()  
   printf("%s <- time:%d, pp:%s, par:%s\n", thread_indent(-1), gettimeofday_ms(), pp(), $$parms$$)  
   # printf("%s -> time:%d, pp:%s\n", thread_indent(1), f_start[execname(), pid(), tid(), cpu()], pp() )  
}  
  
probe process("/opt/pgsql/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c").return {   
  t=gettimeofday_ms()  
  a=execname()  
  b=cpu()  
  c=pid()  
  d=pp()  
  e=tid()  
  if (f_start[a,c,e,b]) {  
  #  printf("%s <- time:%d, pp:%s, par:%s\n", thread_indent(-1), t - f_start[a,c,e,b], d, $$params$$)  
  printf("%s <- time:%d, pp:%s\n", thread_indent(-1), t - f_start[a,c,e,b], d)  
  }  
}  
  
[root@digoal ~]# stap -vp 5 -DMAXSKIPPED=9999999 -DSTP_NO_OVERLOAD -DMAXTRYLOCK=100 ./trc.stp -x 5497  
  
postgres=# insert into t values (1);  
INSERT 0 1  

insert事务结束后,没有跟踪到SetHintBits,因为事务结束时不设置hint bits。

postgres=# select * from t;  
 id   
----  
  1  
(1 row)  

执行select时跟踪到了sethintbits,参数infomask=256,即HEAP_XMIN_COMMITTED 。

71259406 postgres(5497): <- time:1441448520839, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c:110").call, par:tuple={.t_choice={.t_heap={.t_xmin=390734170, .t_xmax=0, .t_field3={.t_cid=0, .t_xvac=0}}, .t_datum={.datum_len_=390734170, .datum_typmod=0, .datum_typeid=0}}, .t_ctid={.ip_blkid={.bi_hi=0, .bi_lo=0}, .ip_posid=1}, .t_infomask2=1, .t_infomask=2048, .t_hoff='\030', .t_bits=""} buffer=3657 infomask=256 xid=390734170  
71259458 postgres(5497): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c:110").return  

再次插入一条记录

postgres=# insert into t values (2);  
INSERT 0 1  

无SetHintBits
更新2条记录,你会发现有一次SetHintBits

postgres=# update t set id=3;  
UPDATE 2  

这次SetHintBits实际上对应的是values(2)的那条记录,而values(1)那条已经在前面的select *被调用了SetHintBits。

5356459357 postgres(5497): <- time:1441453806039, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c:110").call, par:tuple={.t_choice={.t_heap={.t_xmin=390734178, .t_xmax=0, .t_field3={.t_cid=0, .t_xvac=0}}, .t_datum={.datum_len_=390734178, .datum_typmod=0, .datum_typeid=0}}, .t_ctid={.ip_blkid={.bi_hi=0, .bi_lo=0}, .ip_posid=2}, .t_infomask2=1, .t_infomask=2048, .t_hoff='\030', .t_bits=""} buffer=3657 infomask=256 xid=390734178  
5356459410 postgres(5497): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c:110").return  

那么现在实际上有4条记录,2条是老的版本,2条是新的版本。
再次执行select,全表扫描,会扫到所有的tuple,而且这4条TUPLE都是在最后一次事务后没有被设置SetHintBits的。所以会产生4次SetHintBits

postgres=# select * from t;  
 id   
----  
  3  
  3  
(2 rows)  
5464475078 postgres(5497): <- time:1441453914055, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c:110").call, par:tuple={.t_choice={.t_heap={.t_xmin=390734177, .t_xmax=390734179, .t_field3={.t_cid=0, .t_xvac=0}}, .t_datum={.datum_len_=390734177, .datum_typmod=390734179, .datum_typeid=0}}, .t_ctid={.ip_blkid={.bi_hi=0, .bi_lo=0}, .ip_posid=3}, .t_infomask2=16385, .t_infomask=256, .t_hoff='\030', .t_bits=""} buffer=3657 infomask=1024 xid=390734179  
5464475132 postgres(5497): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c:110").return  
5464475156 postgres(5497): <- time:1441453914055, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c:110").call, par:tuple={.t_choice={.t_heap={.t_xmin=390734178, .t_xmax=390734179, .t_field3={.t_cid=0, .t_xvac=0}}, .t_datum={.datum_len_=390734178, .datum_typmod=390734179, .datum_typeid=0}}, .t_ctid={.ip_blkid={.bi_hi=0, .bi_lo=0}, .ip_posid=4}, .t_infomask2=16385, .t_infomask=256, .t_hoff='\030', .t_bits=""} buffer=3657 infomask=1024 xid=390734179  
5464475190 postgres(5497): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c:110").return  

前2条SetHintBits的infomask参数=1024对应HEAP_XMAX_COMMITTED,对应的是老的版本。

5464475210 postgres(5497): <- time:1441453914055, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c:110").call, par:tuple={.t_choice={.t_heap={.t_xmin=390734179, .t_xmax=0, .t_field3={.t_cid=0, .t_xvac=0}}, .t_datum={.datum_len_=390734179, .datum_typmod=0, .datum_typeid=0}}, .t_ctid={.ip_blkid={.bi_hi=0, .bi_lo=0}, .ip_posid=3}, .t_infomask2=32769, .t_infomask=10240, .t_hoff='\030', .t_bits=""} buffer=3657 infomask=256 xid=390734179  
5464475243 postgres(5497): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c:110").return  
5464475263 postgres(5497): <- time:1441453914055, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c:110").call, par:tuple={.t_choice={.t_heap={.t_xmin=390734179, .t_xmax=0, .t_field3={.t_cid=0, .t_xvac=0}}, .t_datum={.datum_len_=390734179, .datum_typmod=0, .datum_typeid=0}}, .t_ctid={.ip_blkid={.bi_hi=0, .bi_lo=0}, .ip_posid=4}, .t_infomask2=32769, .t_infomask=10240, .t_hoff='\030', .t_bits=""} buffer=3657 infomask=256 xid=390734179  
5464475294 postgres(5497): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("SetHintBits@/opt/soft_bak/postgresql-9.4.4/src/backend/utils/time/tqual.c:110").return  

后2条SetHintBits的infomask参数=256对应HEAP_XMIN_COMMITTED,对应的是新的版本。

接下来我们看看SetHintBits是不是产生了xlog,需要关注SetHintBits里的这个调用:

MarkBufferDirtyHint(buffer, true)@src/backend/storage/buffer/bufmgr.c  
/*  
 * MarkBufferDirtyHint  
 *  
 *    Mark a buffer dirty for non-critical changes.  
 *  
 * This is essentially the same as MarkBufferDirty, except:  
 *  
 * 1. The caller does not write WAL; so if checksums are enabled, we may need  
 *      to write an XLOG_HINT WAL record to protect against torn pages.  
 * 2. The caller might have only share-lock instead of exclusive-lock on the  
 *      buffer's content lock.  
 * 3. This function does not guarantee that the buffer is always marked dirty  
 *      (due to a race condition), so it cannot be used for important changes.  
 */  
void  
MarkBufferDirtyHint(Buffer buffer, bool buffer_std)  
......  

仅仅当设置了wal_log_hints或初始化数据库initdb -k打开了checksum时,
它会调用XLogSaveBufferForHint来写WAL,当然不是所有的set hint都会写WAL,只有当checkpoint后,如果这个数据块是第一次被修改才需要写。

        if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))  
        {  
...  
            MyPgXact->delayChkpt = delayChkpt = true;  
            lsn = XLogSaveBufferForHint(buffer, buffer_std);  
......  

什么情况下的set hint bits操作需要写WAL?

src/include/access/xlog.h  
#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)  
src/backend/access/transam/xlog.c  
/*  
 * Are checksums enabled for data pages?  
 */  
bool  
DataChecksumsEnabled(void)  
{  
    Assert(ControlFile != NULL);  
    return (ControlFile->data_checksum_version > 0);  
}  
......  
XLogSaveBufferForHint(Buffer buffer, bool buffer_std)@src/backend/access/transam/xlog.c  
/*  
 * Write a backup block if needed when we are setting a hint. Note that  
 * this may be called for a variety of page types, not just heaps.  
 *  
 * Callable while holding just share lock on the buffer content.  
 *  
 * We can't use the plain backup block mechanism since that relies on the  
 * Buffer being exclusively locked. Since some modifications (setting LSN, hint  
 * bits) are allowed in a sharelocked buffer that can lead to wal checksum  
 * failures. So instead we copy the page and insert the copied data as normal  
 * record data.  
 *  
 * We only need to do something if page has not yet been full page written in  
 * this checkpoint round. The LSN of the inserted wal record is returned if we  
 * had to write, InvalidXLogRecPtr otherwise.  
 *  
 * It is possible that multiple concurrent backends could attempt to write WAL  
 * records. In that case, multiple copies of the same block would be recorded  
 * in separate WAL records by different backends, though that is still OK from  
 * a correctness perspective.  // 可能写多次哦  
 */  
XLogRecPtr  
XLogSaveBufferForHint(Buffer buffer, bool buffer_std)  
{  

通过XLogCheckBuffer来判断是否需要写XLOG。

/*  
 * Determine whether the buffer referenced by an XLogRecData item has to  
 * be backed up, and if so fill a BkpBlock struct for it.  In any case  
 * save the buffer's LSN at *lsn.  
 */  
static bool  
XLogCheckBuffer(XLogRecData *rdata, bool holdsExclusiveLock,  
                                XLogRecPtr *lsn, BkpBlock *bkpb)  
{  

验证:

[root@digoal ~]# cat trc.stp   
global f_start[999999]  
  
probe process("/opt/pgsql/bin/postgres").function("XLogCheckBuffer@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/xlog.c").call {   
   f_start[execname(), pid(), tid(), cpu()] = gettimeofday_ms()  
   # printf("%s <- time:%d, pp:%s, par:%s\n", thread_indent(-1), gettimeofday_ms(), pp(), $$parms$$)  
   printf("%s -> time:%d, pp:%s\n", thread_indent(1), f_start[execname(), pid(), tid(), cpu()], pp() )  
}  
  
  
  
probe process("/opt/pgsql/bin/postgres").function("XLogCheckBuffer@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/xlog.c").return {   
  t=gettimeofday_ms()  
  a=execname()  
  b=cpu()  
  c=pid()  
  d=pp()  
  e=tid()  
  if (f_start[a,c,e,b]) {  
  printf("%s <- time:%d, pp:%s, par:%s\n", thread_indent(-1), t - f_start[a,c,e,b], d, $return$$)  
  # printf("%s <- time:%d, pp:%s\n", thread_indent(-1), t - f_start[a,c,e,b], d)  
  }  
}  

观察检查点后的没有发生full page write的数据块上的sethintbits操作:

postgres=# update t set id=4;  
UPDATE 2  
postgres=# checkpoint;  
CHECKPOINT  
postgres=# select * from t;  
 id   
----  
  4  
  4  
(2 rows)  

跟踪到XLogCheckBuffer的返回结果为true,因为这个块是在checkpoint后第一次被修改,所以还没有FPW,因此这次SETHINTBITs需要写XLOG。

     0 postgres(5497): -> time:1441457600685, pp:process("/opt/pgsql9.4.4/bin/postgres").function("XLogCheckBuffer@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/xlog.c:2031").call  
    30 postgres(5497): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("XLogCheckBuffer@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/xlog.c:2031").return, par:'\001'  

再次更新

postgres=# update t set id=5;  
UPDATE 2  

跟踪到XLogCheckBuffer返回FALSE,因为这个数据块在CHECKPOINT后,已经调用了FPW,所以这次SETHINTBIT不需要写XLOG。

     0 postgres(5497): -> time:1441457627431, pp:process("/opt/pgsql9.4.4/bin/postgres").function("XLogCheckBuffer@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/xlog.c:2031").call  
    27 postgres(5497): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("XLogCheckBuffer@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/xlog.c:2031").return, par:'\000'  
     0 postgres(5497): -> time:1441457627431, pp:process("/opt/pgsql9.4.4/bin/postgres").function("XLogCheckBuffer@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/xlog.c:2031").call  
    20 postgres(5497): <- time:0, pp:process("/opt/pgsql9.4.4/bin/postgres").function("XLogCheckBuffer@/opt/soft_bak/postgresql-9.4.4/src/backend/access/transam/xlog.c:2031").return, par:'\000'  

因此我们可以认为,当发生SetHintBits时,并不是每次都需要写XLOG,仅仅当开启了wal_log_hints或者initdb时开启了checksum,并且这个被sethintbits的数据块是在checkpoint后第一次被修改(改 bit也算一种修改。),这种情况下才需要写full page到WAL。其他情况不会写WAL。
同时,还有一种可能性,因为SetHintBits是针对单条TUPLE的,所以当有并行的会话在对一个PAGE的多个TUPLE进行SetHintBits操作时,可能导致这个PAGE在多次checkpoint时被写多次到WAL。或者在2个checkpoint之间,多次被bgwriter刷到OS dirty page,可能造成多次OS IO。

用gdb可以跟踪到XLogSaveBufferForHint 调用recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI, rdata)来写XLOG。

其他例子,当发生行级别锁冲突时,等待锁的会话在获得锁后,也会对修改的TUPLE设置HINT BITS。
例如两个update语句更新同一条记录时,后获得锁的事务会在先获得锁的事务提交后,对它所修改的记录设置hint bits。

/*  
 * UpdateXmaxHintBits - update tuple hint bits after xmax transaction ends  
 *  
 * This is called after we have waited for the XMAX transaction to terminate.  
 * If the transaction aborted, we guarantee the XMAX_INVALID hint bit will  
 * be set on exit.  If the transaction committed, we set the XMAX_COMMITTED  
 * hint bit if possible --- but beware that that may not yet be possible,  
 * if the transaction committed asynchronously.  
 *  
 * Note that if the transaction was a locker only, we set HEAP_XMAX_INVALID  
 * even if it commits.  
 *  
 * Hence callers should look only at XMAX_INVALID.  
 *  
 * Note this is not allowed for tuples whose xmax is a multixact.  
 */  
static void  
UpdateXmaxHintBits(HeapTupleHeader tuple, Buffer buffer, TransactionId xid)  
{  
        Assert(TransactionIdEquals(HeapTupleHeaderGetRawXmax(tuple), xid));  
        Assert(!(tuple->t_infomask & HEAP_XMAX_IS_MULTI));  
  
        if (!(tuple->t_infomask & (HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID)))  
        {  
                if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&  
                        TransactionIdDidCommit(xid))  
                        HeapTupleSetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,  
                                                                 xid);  
                else  
                        HeapTupleSetHintBits(tuple, buffer, HEAP_XMAX_INVALID,  
                                                                 InvalidTransactionId);  
        }  
}  

[参考]
https://wiki.postgresql.org/wiki/Hint_Bits
src/include/access/htup_details.h
src/backend/utils/time/tqual.c
src/backend/storage/buffer/bufmgr.c
src/include/access/xlog.h
src/backend/access/transam/xlog.c

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
1月前
|
SQL 关系型数据库 数据库
PostgreSQL性能飙升的秘密:这几个调优技巧让你的数据库查询速度翻倍!
【10月更文挑战第25天】本文介绍了几种有效提升 PostgreSQL 数据库查询效率的方法,包括索引优化、查询优化、配置优化和硬件优化。通过合理设计索引、编写高效 SQL 查询、调整配置参数和选择合适硬件,可以显著提高数据库性能。
360 1
|
7月前
|
SQL 关系型数据库 数据库
实时计算 Flink版操作报错之使用SQL 将 PostgreSQL 的 date 类型字段转换为 TIMESTAMP 类型时遇到报错,该如何处理
在使用实时计算Flink版过程中,可能会遇到各种错误,了解这些错误的原因及解决方法对于高效排错至关重要。针对具体问题,查看Flink的日志是关键,它们通常会提供更详细的错误信息和堆栈跟踪,有助于定位问题。此外,Flink社区文档和官方论坛也是寻求帮助的好去处。以下是一些常见的操作报错及其可能的原因与解决策略。
|
6月前
|
消息中间件 Java 关系型数据库
实时计算 Flink版操作报错合集之从 PostgreSQL 读取数据并写入 Kafka 时,遇到 "initial slot snapshot too large" 的错误,该怎么办
在使用实时计算Flink版过程中,可能会遇到各种错误,了解这些错误的原因及解决方法对于高效排错至关重要。针对具体问题,查看Flink的日志是关键,它们通常会提供更详细的错误信息和堆栈跟踪,有助于定位问题。此外,Flink社区文档和官方论坛也是寻求帮助的好去处。以下是一些常见的操作报错及其可能的原因与解决策略。
1022 0
|
4月前
|
缓存 关系型数据库 数据库
PostgreSQL 查询性能
【8月更文挑战第5天】PostgreSQL 查询性能
88 8
|
5月前
|
分布式计算 DataWorks 关系型数据库
DataWorks操作报错合集之使用连接串模式新增PostgreSQL数据源时遇到了报错"not support data sync channel, error code: 0001",该怎么办
DataWorks是阿里云提供的一站式大数据开发与治理平台,支持数据集成、数据开发、数据服务、数据质量管理、数据安全管理等全流程数据处理。在使用DataWorks过程中,可能会遇到各种操作报错。以下是一些常见的报错情况及其可能的原因和解决方法。
|
5月前
|
SQL 监控 关系型数据库
实时计算 Flink版操作报错合集之在设置监控PostgreSQL数据库时,将wal_level设置为logical,出现一些表更新和删除操作报错,怎么办
在使用实时计算Flink版过程中,可能会遇到各种错误,了解这些错误的原因及解决方法对于高效排错至关重要。针对具体问题,查看Flink的日志是关键,它们通常会提供更详细的错误信息和堆栈跟踪,有助于定位问题。此外,Flink社区文档和官方论坛也是寻求帮助的好去处。以下是一些常见的操作报错及其可能的原因与解决策略。
|
5月前
|
关系型数据库 MySQL 数据库
实时计算 Flink版操作报错合集之在处理PostgreSQL数据库遇到报错。该如何解决
在使用实时计算Flink版过程中,可能会遇到各种错误,了解这些错误的原因及解决方法对于高效排错至关重要。针对具体问题,查看Flink的日志是关键,它们通常会提供更详细的错误信息和堆栈跟踪,有助于定位问题。此外,Flink社区文档和官方论坛也是寻求帮助的好去处。以下是一些常见的操作报错及其可能的原因与解决策略。
|
6月前
|
SQL 关系型数据库 数据库
Python执行PostgreSQL数据库查询语句,并打印查询结果
本文介绍了如何使用Python连接和查询PostgreSQL数据库。首先,确保安装了`psycopg2`库,然后创建数据库连接函数。接着,展示如何编写SQL查询并执行,例如从`employees`表中选取所有记录。此外,还讨论了处理查询结果、格式化输出和异常处理的方法。最后,提到了参数化查询和事务处理以增强安全性及确保数据一致性。
Python执行PostgreSQL数据库查询语句,并打印查询结果
|
5月前
|
Java 关系型数据库 API
使用Spring Boot和PostgreSQL构建高级查询
使用Spring Boot和PostgreSQL构建高级查询
|
6月前
|
SQL 关系型数据库 数据库
Python查询PostgreSQL数据库
木头左教你如何用Python连接PostgreSQL数据库:安装`psycopg2`库,建立连接,执行SQL脚本如创建表、插入数据,同时掌握错误处理和事务管理。别忘了性能优化,利用索引、批量操作提升效率。下期更精彩!💡 csvfile
Python查询PostgreSQL数据库

相关产品

  • 云原生数据库 PolarDB
  • 云数据库 RDS PostgreSQL 版