PostgreSQL point-in-time recovery (PITR): the order in which WAL records are looked up - loop(pg_wal, restore_command, stream)


Tags

PostgreSQL, physical recovery, startup, wal, restore_command, recovery.conf, stream replication


Background

During recovery, how does PostgreSQL obtain the WAL records it needs?

Flow

During recovery, PostgreSQL can obtain WAL records from three places:

1. The pg_wal directory

2. The restore_command configured in recovery.conf

3. Stream replication configured in recovery.conf (primary_conninfo)

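Recovery polls these sources in a loop: when reading from the current source fails, it moves on to the next one. As the code comments below note, during archive recovery a segment restored through restore_command is preferred over an existing file in pg_wal. If neither has the segment, the function returns false when not in standby mode; in standby mode it goes on to try streaming, and after that loops back to the start.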
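
The transitions between sources can be modeled as a very small state machine. The following C sketch is a toy model under stated assumptions, not PostgreSQL code: `ToySource` and `toy_advance_on_failure` are hypothetical names, and the model leaves out trigger-file checks, timeline rescans and the retry sleep that the real function performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real WAL source states. */
typedef enum
{
    TOY_FROM_ARCHIVE,   /* restore_command first, then pg_wal */
    TOY_FROM_PG_WAL,    /* pg_wal only */
    TOY_FROM_STREAM     /* walreceiver */
} ToySource;

/*
 * Advance the toy state machine after a failed read from *src.
 * Returns false when the caller should give up (not in standby mode
 * and both archive and pg_wal have been tried); otherwise updates
 * *src to the next source to try and returns true.
 */
static bool
toy_advance_on_failure(ToySource *src, bool standby_mode)
{
    assert(src != NULL);

    switch (*src)
    {
        case TOY_FROM_ARCHIVE:
        case TOY_FROM_PG_WAL:
            if (!standby_mode)
                return false;           /* nothing more to try */
            *src = TOY_FROM_STREAM;     /* next: streaming */
            return true;

        case TOY_FROM_STREAM:
            *src = TOY_FROM_ARCHIVE;    /* loop back to the start */
            return true;
    }
    return false;                       /* not reached */
}
```

Calling `toy_advance_on_failure` repeatedly with `standby_mode` set to true alternates between the archive/pg_wal state and the stream state, which is the loop the title refers to; with `standby_mode` set to false the first failure ends the search.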

The code is as follows.

typedef enum  
{  
        XLOG_FROM_ANY = 0,                      /* request to read WAL from any source */  
        XLOG_FROM_ARCHIVE,                      /* restored using restore_command */  
        XLOG_FROM_PG_WAL,                       /* existing file in pg_wal */  
        XLOG_FROM_STREAM                        /* streamed from master */  
} XLogSource;  
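
The function shown further down logs source switches by indexing an array called xlogSourceNames with these enum values, so the order of the names has to match the order of the enum members. A minimal sketch of that pattern follows; the table `source_names`, the helper `source_name` and the exact strings are assumptions made for illustration, not a copy of the PostgreSQL definition.

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
    XLOG_FROM_ANY = 0,      /* request to read WAL from any source */
    XLOG_FROM_ARCHIVE,      /* restored using restore_command */
    XLOG_FROM_PG_WAL,       /* existing file in pg_wal */
    XLOG_FROM_STREAM        /* streamed from master */
} XLogSource;

/* Illustrative name table; entries must stay in enum order. */
static const char *const source_names[] = {
    "any", "archive", "pg_wal", "stream"
};

/* Map a source to a printable name, guarding against out-of-range values. */
static const char *
source_name(XLogSource src)
{
    assert((size_t) src < sizeof(source_names) / sizeof(source_names[0]));
    return source_names[src];
}
```

Because XLOG_FROM_ANY is explicitly 0 and the remaining members follow in sequence, the enum value can be used directly as the array index.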

src/backend/access/transam/xlog.c

/*  
 * Open the WAL segment containing WAL location 'RecPtr'.  
 *  
 * The segment can be fetched via restore_command, or via walreceiver having  
 * streamed the record, or it can already be present in pg_wal. Checking  
 * pg_wal is mainly for crash recovery, but it will be polled in standby mode  
 * too, in case someone copies a new segment directly to pg_wal. That is not  
 * documented or recommended, though.  
 *  
 * If 'fetching_ckpt' is true, we're fetching a checkpoint record, and should  
 * prepare to read WAL starting from RedoStartLSN after this.  
 *  
 * 'RecPtr' might not point to the beginning of the record we're interested  
 * in, it might also point to the page or segment header. In that case,  
 * 'tliRecPtr' is the position of the WAL record we're interested in. It is  
 * used to decide which timeline to stream the requested WAL from.  
 *  
 * If the record is not immediately available, the function returns false  
 * if we're not in standby mode. In standby mode, waits for it to become  
 * available.  
 *  
 * When the requested record becomes available, the function opens the file  
 * containing it (if not open already), and returns true. When end of standby  
 * mode is triggered by the user, and there is no more WAL available, returns  
 * false.  
 */  
static bool  
WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,  
                                                        bool fetching_ckpt, XLogRecPtr tliRecPtr)  
{  
        static TimestampTz last_fail_time = 0;  
        TimestampTz now;  
        bool            streaming_reply_sent = false;  
  
        /*-------  
         * Standby mode is implemented by a state machine:  
         *  
         * 1. Read from either archive or pg_wal (XLOG_FROM_ARCHIVE), or just  
         *        pg_wal (XLOG_FROM_PG_WAL)  
         * 2. Check trigger file  
         * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)  
         * 4. Rescan timelines  
         * 5. Sleep wal_retrieve_retry_interval milliseconds, and loop back to 1.  
         *  
         * Failure to read from the current source advances the state machine to  
         * the next state.  
         *  
         * 'currentSource' indicates the current state. There are no currentSource  
         * values for "check trigger", "rescan timelines", and "sleep" states,  
         * those actions are taken when reading from the previous source fails, as  
         * part of advancing to the next state.  
         *-------  
         */  
...........  
  
  
        for (;;)  
        {  
                int                     oldSource = currentSource;  
  
                /*  
                 * First check if we failed to read from the current source, and  
                 * advance the state machine if so. The failure to read might've  
                 * happened outside this function, e.g when a CRC check fails on a  
                 * record, or within this loop.  
                 */  
                if (lastSourceFailed)  
                {  
                        switch (currentSource)  
                        {  
                                case XLOG_FROM_ARCHIVE:  
                                case XLOG_FROM_PG_WAL:  
  
                                        /*  
                                         * Check to see if the trigger file exists. Note that we  
                                         * do this only after failure, so when you create the  
                                         * trigger file, we still finish replaying as much as we  
                                         * can from archive and pg_wal before failover.  
                                         */  
                                        if (StandbyMode && CheckForStandbyTrigger())  
                                        {  
                                                ShutdownWalRcv();  
                                                return false;  
                                        }  
  
                                        /*  
                                         * Not in standby mode, and we've now tried the archive  
                                         * and pg_wal.  
                                         */  
                                        if (!StandbyMode)  
                                                return false;  
  
                                        /*  
                                         * If primary_conninfo is set, launch walreceiver to try  
                                         * to stream the missing WAL.  
                                         *  
                                         * If fetching_ckpt is true, RecPtr points to the initial  
                                         * checkpoint location. In that case, we use RedoStartLSN  
                                         * as the streaming start position instead of RecPtr, so  
                                         * that when we later jump backwards to start redo at  
                                         * RedoStartLSN, we will have the logs streamed already.  
                                         */  
                                        if (PrimaryConnInfo)  
                                        {  
                                                XLogRecPtr      ptr;  
                                                TimeLineID      tli;  
  
                                                if (fetching_ckpt)  
                                                {  
                                                        ptr = RedoStartLSN;  
                                                        tli = ControlFile->checkPointCopy.ThisTimeLineID;  
                                                }  
                                                else  
                                                {  
                                                        ptr = tliRecPtr;  
                                                        tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);  
  
                                                        if (curFileTLI > 0 && tli < curFileTLI)  
                                                                elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",  
                                                                         (uint32) (ptr >> 32), (uint32) ptr,  
                                                                         tli, curFileTLI);  
                                                }  
                                                curFileTLI = tli;  
                                                RequestXLogStreaming(tli, ptr, PrimaryConnInfo,  
                                                                                         PrimarySlotName);  
                                                receivedUpto = 0;  
                                        }  
  
                                        /*  
                                         * Move to XLOG_FROM_STREAM state in either case. We'll  
                                         * get immediate failure if we didn't launch walreceiver,  
                                         * and move on to the next state.  
                                         */  
                                        currentSource = XLOG_FROM_STREAM;  
                                        break;  
  
                                case XLOG_FROM_STREAM:  
  
                                        /*  
                                         * Failure while streaming. Most likely, we got here  
                                         * because streaming replication was terminated, or  
                                         * promotion was triggered. But we also get here if we  
                                         * find an invalid record in the WAL streamed from master,  
                                         * in which case something is seriously wrong. There's  
                                         * little chance that the problem will just go away, but  
                                         * PANIC is not good for availability either, especially  
                                         * in hot standby mode. So, we treat that the same as  
                                         * disconnection, and retry from archive/pg_wal again. The  
                                         * WAL in the archive should be identical to what was  
                                         * streamed, so it's unlikely that it helps, but one can  
                                         * hope...  
                                         */  
  
                                        /*  
                                         * Before we leave XLOG_FROM_STREAM state, make sure that  
                                         * walreceiver is not active, so that it won't overwrite  
                                         * WAL that we restore from archive.  
                                         */  
                                        if (WalRcvStreaming())  
                                                ShutdownWalRcv();  
  
                                        /*  
                                         * Before we sleep, re-scan for possible new timelines if  
                                         * we were requested to recover to the latest timeline.  
                                         */  
                                        if (recoveryTargetIsLatest)  
  
                                        {  
                                                if (rescanLatestTimeLine())  
                                                {  
                                                        currentSource = XLOG_FROM_ARCHIVE;  
                                                        break;  
                                                }  
                                        }  
  
                                        /*  
                                         * XLOG_FROM_STREAM is the last state in our state  
                                         * machine, so we've exhausted all the options for  
                                         * obtaining the requested WAL. We're going to loop back  
                                         * and retry from the archive, but if it hasn't been long  
                                         * since last attempt, sleep wal_retrieve_retry_interval  
                                         * milliseconds to avoid busy-waiting.  
                                         */  
                                        now = GetCurrentTimestamp();  
                                        if (!TimestampDifferenceExceeds(last_fail_time, now,  
                                                                                                        wal_retrieve_retry_interval))  
                                        {  
                                                long            secs,  
                                                                        wait_time;  
                                                int                     usecs;  
  
                                                TimestampDifference(last_fail_time, now, &secs, &usecs);  
                                                wait_time = wal_retrieve_retry_interval -  
                                                        (secs * 1000 + usecs / 1000);  
  
                                                WaitLatch(&XLogCtl->recoveryWakeupLatch,  
                                                                  WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,  
                                                                  wait_time, WAIT_EVENT_RECOVERY_WAL_STREAM);  
                                                ResetLatch(&XLogCtl->recoveryWakeupLatch);  
                                                now = GetCurrentTimestamp();  
                                        }  
                                        last_fail_time = now;  
                                        currentSource = XLOG_FROM_ARCHIVE;  
                                        break;  
  
                                default:  
                                        elog(ERROR, "unexpected WAL source %d", currentSource);  
                        }  
                }  
                else if (currentSource == XLOG_FROM_PG_WAL)  
                {  
                        /*  
                         * We just successfully read a file in pg_wal. We prefer files in  
                         * the archive over ones in pg_wal, so try the next file again  
                         * from the archive first.  
                         */  
                        if (InArchiveRecovery)  
                                currentSource = XLOG_FROM_ARCHIVE;  
                }  
  
                if (currentSource != oldSource)  
                        elog(DEBUG2, "switched WAL source from %s to %s after %s",  
                                 xlogSourceNames[oldSource], xlogSourceNames[currentSource],  
                                 lastSourceFailed ? "failure" : "success");  
  
                /*  
                 * We've now handled possible failure. Try to read from the chosen  
                 * source.  
                 */  
                lastSourceFailed = false;  
  
                switch (currentSource)  
                {  
                        case XLOG_FROM_ARCHIVE:  
                        case XLOG_FROM_PG_WAL:  
                                /* Close any old file we might have open. */  
                                if (readFile >= 0)  
                                {  
                                        close(readFile);  
                                        readFile = -1;  
                                }  
                                /* Reset curFileTLI if random fetch. */  
                                if (randAccess)  
                                        curFileTLI = 0;  
  
                                /*  
                                 * Try to restore the file from archive, or read an existing  
                                 * file from pg_wal.  
                                 */  
                                readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,  
                                                                                          currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :  
                                                                                          currentSource);  
                                if (readFile >= 0)  
                                        return true;    /* success! */  
  
                                /*  
                                 * Nope, not found in archive or pg_wal.  
                                 */  
                                lastSourceFailed = true;  
                                break;  
  
                        case XLOG_FROM_STREAM:  
                                {  
                                        bool            havedata;  
  
                                        /*  
                                         * Check if WAL receiver is still active.  
                                         */  
                                        if (!WalRcvStreaming())  
                                        {  
                                                lastSourceFailed = true;  
                                                break;  
                                        }  
  
                                        /*  
                                         * Walreceiver is active, so see if new data has arrived.  
                                         *  
                                         * We only advance XLogReceiptTime when we obtain fresh  
                                         * WAL from walreceiver and observe that we had already  
                                         * processed everything before the most recent "chunk"  
                                         * that it flushed to disk.  In steady state where we are  
                                         * keeping up with the incoming data, XLogReceiptTime will  
                                         * be updated on each cycle. When we are behind,  
                                         * XLogReceiptTime will not advance, so the grace time  
                                         * allotted to conflicting queries will decrease.  
                                         */  
                                        if (RecPtr < receivedUpto)  
                                                havedata = true;  
                                        else  
                                        {  
                                                XLogRecPtr      latestChunkStart;  
  
                                                receivedUpto = GetWalRcvWriteRecPtr(&latestChunkStart, &receiveTLI);  
                                                if (RecPtr < receivedUpto && receiveTLI == curFileTLI)  
                                                {  
                                                        havedata = true;  
                                                        if (latestChunkStart <= RecPtr)  
                                                        {  
                                                                XLogReceiptTime = GetCurrentTimestamp();  
                                                                SetCurrentChunkStartTime(XLogReceiptTime);  
                                                        }  
                                                }  
                                                else  
                                                        havedata = false;  
                                        }  
                                        if (havedata)  
                                        {  
                                                /*  
                                                 * Great, streamed far enough.  Open the file if it's  
                                                 * not open already.  Also read the timeline history  
                                                 * file if we haven't initialized timeline history  
                                                 * yet; it should be streamed over and present in  
                                                 * pg_wal by now.  Use XLOG_FROM_STREAM so that source  
                                                 * info is set correctly and XLogReceiptTime isn't  
                                                 * changed.  
                                                 */  
                                                if (readFile < 0)  
                                                {  
                                                        if (!expectedTLEs)  
                                                                expectedTLEs = readTimeLineHistory(receiveTLI);  
                                                        readFile = XLogFileRead(readSegNo, PANIC,  
                                                                                                        receiveTLI,  
                                                                                                        XLOG_FROM_STREAM, false);  
                                                        Assert(readFile >= 0);  
                                                }  
                                                else  
                                                {  
                                                        /* just make sure source info is correct... */  
                                                        readSource = XLOG_FROM_STREAM;  
                                                        XLogReceiptSource = XLOG_FROM_STREAM;  
                                                        return true;  
                                                }  
                                                break;  
                                        }  
  
                                        /*  
                                         * Data not here yet. Check for trigger, then wait for  
                                         * walreceiver to wake us up when new WAL arrives.  
                                         */  
                                        if (CheckForStandbyTrigger())  
                                        {  
                                                /*  
                                                 * Note that we don't "return false" immediately here.  
                                                 * After being triggered, we still want to replay all  
                                                 * the WAL that was already streamed. It's in pg_wal  
                                                 * now, so we just treat this as a failure, and the  
                                                 * state machine will move on to replay the streamed  
                                                 * WAL from pg_wal, and then recheck the trigger and  
                                                 * exit replay.  
                                                 */  
                                                lastSourceFailed = true;  
                                                break;  
                                        }  
  
                                        /*  
                                         * Since we have replayed everything we have received so  
                                         * far and are about to start waiting for more WAL, let's  
                                         * tell the upstream server our replay location now so  
                                         * that pg_stat_replication doesn't show stale  
                                         * information.  
                                         */  
                                        if (!streaming_reply_sent)  
                                        {  
                                                WalRcvForceReply();  
                                                streaming_reply_sent = true;  
                                        }  
  
                                        /*  
                                         * Wait for more WAL to arrive. Time out after 5 seconds  
                                         * to react to a trigger file promptly.  
                                         */  
                                        WaitLatch(&XLogCtl->recoveryWakeupLatch,  
                                                          WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,  
                                                          5000L, WAIT_EVENT_RECOVERY_WAL_ALL);  
                                        ResetLatch(&XLogCtl->recoveryWakeupLatch);  
                                        break;  
                                }  
  
                        default:  
                                elog(ERROR, "unexpected WAL source %d", currentSource);  
                }  
  
                /*  
                 * This possibly-long loop needs to handle interrupts of startup  
                 * process.  
                 */  
                HandleStartupProcInterrupts();  
        }  
  
        return false;                           /* not reached */  
}  

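This function decides where WAL is fetched from. It polls three sources (the pg_wal directory, restore_command, and stream), and when reading from one source fails it moves on to the next source to obtain the WAL.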
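
One detail of the loop is the back-off before it retries from the archive after streaming has failed: if less than wal_retrieve_retry_interval has passed since the last failure, the function sleeps only for the remainder. The arithmetic can be sketched as below. `remaining_wait_ms` is a hypothetical helper that folds the elapsed-time check and the subtraction from the code above into one function; it is not part of PostgreSQL.

```c
#include <assert.h>

/*
 * Remaining sleep, in milliseconds, before the next retry.
 * Mirrors the expression in the code above:
 *   wait_time = wal_retrieve_retry_interval - (secs * 1000 + usecs / 1000)
 * where secs/usecs is the time elapsed since the last failure.
 * Returns 0 when the interval has already elapsed (no sleep needed).
 */
static long
remaining_wait_ms(long retry_interval_ms, long secs, int usecs)
{
    long elapsed_ms;

    assert(retry_interval_ms >= 0 && secs >= 0 && usecs >= 0);

    elapsed_ms = secs * 1000 + usecs / 1000;
    if (elapsed_ms >= retry_interval_ms)
        return 0;
    return retry_interval_ms - elapsed_ms;
}
```

For example, with an interval of 5000 ms and 1.5 seconds elapsed since the last failure (secs = 1, usecs = 500000), the remaining wait works out to 3500 ms.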

References

src/backend/access/transam/xlog.c

recovery.conf

#restore_command = ''           # e.g. 'cp /mnt/server/archivedir/%f %p'  
  
#primary_conninfo = ''          # e.g. 'host=localhost port=5432'  