PostgreSQL PITR 任意时间点恢复过程中如何手工得到recovery需要的下一个WAL文件名 - 默认情况下restore_command自动获取

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
简介: 标签 PostgreSQL , recovery , recovery.conf , restore_command , timeline , 时间线 , next wal , PITR , 时间点恢复 背景 PostgreSQL数据库支持PITR时间点恢复。默认情况下,只需要配置目标是时间点,resotre_command即可,PG会自动调用resotre_command去找需要的WA

标签

PostgreSQL , recovery , recovery.conf , restore_command , timeline , 时间线 , next wal , PITR , 时间点恢复


背景

PostgreSQL数据库支持PITR时间点恢复。默认情况下,只需要配置目标是时间点,resotre_command即可,PG会自动调用resotre_command去找需要的WAL文件。

一个典型的recovery.conf配置如下:

#---------------------------------------------------------------------------  
# ARCHIVE RECOVERY PARAMETERS  
#---------------------------------------------------------------------------  
#  
# restore_command  
#  
# specifies the shell command that is executed to copy log files  
# back from archival storage.  The command string may contain %f,  
# which is replaced by the name of the desired log file, and %p,  
# which is replaced by the absolute path to copy the log file to.  
#  
# This parameter is *required* for an archive recovery, but optional  
# for streaming replication.  
#  
# It is important that the command return nonzero exit status on failure.  
# The command *will* be asked for log files that are not present in the  
# archive; it must return nonzero when so asked.  
#  
# NOTE that the basename of %p will be different from %f; do not  
# expect them to be interchangeable.  
#  
restore_command = 'cp /data01/digoal/wal/%f %p'  
  
#---------------------------------------------------------------------------  
# RECOVERY TARGET PARAMETERS  
#---------------------------------------------------------------------------  
#  
# By default, recovery will rollforward to the end of the WAL log.  
# If you want to stop rollforward at a specific point, you  
# must set a recovery target.  
#  
# You may set a recovery target either by transactionId, by name, by  
# timestamp or by WAL location (LSN). Recovery may either include or  
# exclude the transaction(s) with the recovery target value (i.e.,  
# stop either just after or just before the given target,  
# respectively).  
#  
#  
#recovery_target_name = ''      # e.g. 'daily backup 2011-01-26'  
#  
recovery_target_time = '2019-03-05 20:52:16.294366+08'      # e.g. '2004-07-14 22:39:00 EST'  
#  
#recovery_target_xid = ''  
#  
#recovery_target_lsn = ''       # e.g. '0/70006B8'  
#  
#recovery_target_inclusive = true  
  
recovery_target_timeline = 'latest'  
  
# If recovery_target_action = 'pause', recovery will pause when the  
# recovery target is reached. The pause state will continue until  
# pg_wal_replay_resume() is called. This setting has no effect if  
# no recovery target is set. If hot_standby is not enabled then the  
# server will shutdown instead, though you may request this in  
# any case by specifying 'shutdown'.  
#  
#recovery_target_action = 'pause'  
  
#---------------------------------------------------------------------------  
# STANDBY SERVER PARAMETERS  
#---------------------------------------------------------------------------  
#  
# standby_mode  
#  
# When standby_mode is enabled, the PostgreSQL server will work as a  
# standby. It will continuously wait for the additional XLOG records, using  
# restore_command and/or primary_conninfo.  
#  
standby_mode = on  

恢复目标支持:

1、时间

2、自定义还原点名字

3、事务ID

4、WAL LSN

《PostgreSQL PITR THREE recovery target MODE: name,xid,time USE CASE - 2》

《PostgreSQL PITR THREE recovery target MODE: name,xid,time USE CASE - 1》

《PostgreSQL recovery target introduce》

接下来的问题,如果无法直接通过restore_command获取文件,又当如何呢?

restore_command = 'cp /data01/digoal/wal/%f %p'  

方法1,通过restore_command吐出需要的文件名

recovery.conf  
  
restore_command = 'cp /data01/digoal/wal/%f %p || echo "`date +%F%T` %f" >> /tmp/needwalfile;'  

当找不到WAL文件时,就会吐到/tmp/needwalfile

cat /tmp/needwalfile  
  
2019-03-0522:11:28 000000010000005D000000B2  
2019-03-0522:11:28 00000002.history  
2019-03-0522:11:33 000000010000005D000000B2  
2019-03-0522:11:33 00000002.history  
2019-03-0522:11:38 000000010000005D000000B2  
2019-03-0522:11:38 00000002.history  

将文件拷贝到restore_command配置的/data01/digoal/wal目录,restore_command命令将继续。

优先拷贝history文件(走到新的时间线),原理参考末尾引用文档。

通过pg_is_wal_replay_paused()函数得到当前实例是否已经到达目标还原点。如果返回T,表示已到达,则不再需要给PG新的文件。

postgres=# select pg_is_wal_replay_paused();  
 pg_is_wal_replay_paused   
-------------------------  
 f  
(1 row)  

方法2,通过log文件得到需要的WAL文件名

配置PG的LOG文件,一样能得到上面的内容。

postgresql.conf  
  
# - Where to Log -  
  
log_destination = 'csvlog'              # Valid values are combinations of  
                                        # stderr, csvlog, syslog, and eventlog,  
                                        # depending on platform.  csvlog  
                                        # requires logging_collector to be on.  
  
# This is used when logging to stderr:  
logging_collector = on          # Enable capturing of stderr and csvlog  
                                        # into log files. Required to be on for  
                                        # csvlogs.  
                                        # (change requires restart)  
  
# These are only used if logging_collector is on:  
log_directory = 'log'                   # directory where log files are written,  
                                        # can be absolute or relative to PGDATA  
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'        # log file name pattern,  
log_filename='pg.log'  
                                        # can include strftime() escapes  
#log_file_mode = 0600                   # creation mode for log files,  
                                        # begin with 0 to use octal notation  
log_truncate_on_rotation = on           # If on, an existing log file with the  
                                        # same name as the new log file will be  
                                        # truncated rather than appended to.  
                                        # But such truncation only occurs on  
                                        # time-driven rotation, not on restarts  
                                        # or size-driven rotation.  Default is  
                                        # off, meaning append to existing files  
                                        # in all cases.  
log_rotation_age = 1d                   # Automatic rotation of logfiles will  
                                        # happen after that time.  0 disables.  
##log_rotation_size = 10MB              # Automatic rotation of logfiles will  
                                        # happen after that much log output.  
                                        # 0 disables.  

当找不到WAL文件时,就会吐到$PGDATA/log/pg.log

digoal@pg11-test-> cat pg.log  
  
2019-03-05 22:14:00.167 CST [38155] LOG:  00000: ending log output to stderr  
2019-03-05 22:14:00.167 CST [38155] HINT:  Future log output will go to log destination "csvlog".  
2019-03-05 22:14:00.167 CST [38155] LOCATION:  PostmasterMain, postmaster.c:1298  
cp: cannot stat ‘/data01/digoal/wal/00000002.history’: No such file or directory  
cp: cannot stat ‘/data01/digoal/wal/000000010000005D000000B2’: No such file or directory  
cp: cannot stat ‘/data01/digoal/wal/00000002.history’: No such file or directory  
cp: cannot stat ‘/data01/digoal/wal/000000010000005D000000B2’: No such file or directory  
cp: cannot stat ‘/data01/digoal/wal/00000002.history’: No such file or directory  
cp: cannot stat ‘/data01/digoal/wal/000000010000005D000000B2’: No such file or directory  
cp: cannot stat ‘/data01/digoal/wal/00000002.history’: No such file or directory  

将文件拷贝到restore_command配置的/data01/digoal/wal目录,restore_command命令将继续。

优先拷贝history文件(走到新的时间线),原理参考末尾引用文档。

通过pg_is_wal_replay_paused()函数得到当前实例是否已经到达目标还原点。如果返回T,表示已到达,则不再需要给PG新的文件。

postgres=# select pg_is_wal_replay_paused();  
 pg_is_wal_replay_paused   
-------------------------  
 f  
(1 row)  

方法3,修改内核,通过UDF支持

例如,直接从UDF中获取当前startup进程需要的WAL文件名和TL history文件名。

src/backend/access/transam/xlog.c

UDF支持的弊端:当数据库还没有进入一致状态时,并不能连接到数据库执行查询,另外如果没有开启hot_standby模式,也不能连到恢复中的从库进行查询。使用场景受限。

开启数据库的hot_standby模式,确保可以在恢复过程中,连接到数据库进行UDF查询。

# These settings are ignored on a master server.  
  
hot_standby = on                        # "off" disallows queries during recovery  
                                        # (change requires restart)  
#max_standby_archive_delay = 30s        # max delay before canceling queries  
                                        # when reading WAL from archive;  
                                        # -1 allows indefinite delay  
#max_standby_streaming_delay = 30s      # max delay before canceling queries  
                                        # when reading streaming WAL;  
                                        # -1 allows indefinite delay  
wal_receiver_status_interval = 1s       # send replies at least this often  
                                        # 0 disables  
#hot_standby_feedback = off             # send info from standby to prevent  
                                        # query conflicts  
#wal_receiver_timeout = 60s             # time that receiver waits for  
                                        # communication from master  
                                        # in milliseconds; 0 disables  
#wal_retrieve_retry_interval = 5s       # time to wait before retrying to  
                                        # retrieve WAL after a failed attempt  

其他知识点

时间点恢复,如何取下一个文件。

将数据库恢复配置为hot_standby模式,允许在数据库恢复过程中,连接到数据库。获取需要的信息。

1、当返回database is in startup mode,表示无法连接数据库时,说明还需要日志文件,数据库才能到一致性点允许连接,此时,除了从前面说的LOG文件中获得需要的文件,实际上进程也会突出对应的内容。

例如

digoal   25596 25594  0 19:54 ?        00:00:00 postgres: startup   recovering 000000010000005D000000A7  
  
digoal    20   0 16.684g   1860   1276 S   0.0  0.0   0:02.59 postgres: startup   waiting for 000000010000005D000000B2    

日志的内容如下

cp: cannot stat ‘/data01/digoal/waltest/000000010000005D000000A7’: No such file or directory  
cp: cannot stat ‘/data01/digoal/waltest/00000002.history’: No such file or directory  

当我们将需要的归档拷贝到对应目录后,

digoal@pg11-test-> cp wal/000000010000005D000000A7 waltest/  

当我们将需要的归档拷贝到对应目录后,需要的WAL文件向前推移,日志的内容如下

cp: cannot stat ‘/data01/digoal/waltest/000000010000005D000000A8’: No such file or directory  
cp: cannot stat ‘/data01/digoal/waltest/00000002.history’: No such file or directory  

2、当可以连接恢复中的数据库后,可以通过一些系统函数,查看到数据库的一些信息

2.1、查看当前数据库正在replay 的wal LSN

postgres=# select pg_last_wal_replay_lsn();  
 pg_last_wal_replay_lsn   
------------------------  
 5D/A7FFFFE0  
(1 row)    

2.2、查看当前数据库的恢复是否pause,(如果是自动pause的,说明已经到达设置的还原点)

postgres=# select pg_is_wal_replay_paused();  
 pg_is_wal_replay_paused   
-------------------------  
 f  
(1 row)  

2.3、查看lsn对应的wal文件,不允许在standby实例中执行,如果能执行的话,可以直接从当前数据库正在replay 的wal LSN得到WAL文件名。

postgres=# select * from pg_walfile_name('5D/A7FFFFE0');  
ERROR:  55000: recovery is in progress  
HINT:  pg_walfile_name() cannot be executed during recovery.  
LOCATION:  pg_walfile_name, xlogfuncs.c:521  

2.4、当前数据库的时间线 (history)

postgres=# select * from pg_control_checkpoint();  
-[ RECORD 1 ]--------+-------------------------  
checkpoint_lsn       | 5D/A7000028  
redo_lsn             | 5D/A7000028  
redo_wal_file        | 000000010000005D000000A7  
timeline_id          | 1  
prev_timeline_id     | 1  
full_page_writes     | t  
next_xid             | 0:1286297007  
next_oid             | 1912406  
next_multixact_id    | 1  
next_multi_offset    | 0  
oldest_xid           | 101420357  
oldest_xid_dbid      | 13285  
oldest_active_xid    | 0  
oldest_multi_xid     | 1  
oldest_multi_dbid    | 1910618  
oldest_commit_ts_xid | 0  
newest_commit_ts_xid | 0  
checkpoint_time      | 2019-03-05 19:44:51+08  

2.5、控制文件内容

postgres=# select * from pg_control_system();  
-[ RECORD 1 ]------------+-----------------------  
pg_control_version       | 1100  
catalog_version_no       | 201809051  
system_identifier        | 6636510237226062864  
pg_control_last_modified | 2019-03-05 19:54:56+08  

2.6、当前实例如果重启,需要的最早的REDO。

postgres=# select * from pg_control_recovery();
-[ RECORD 1 ]-----------------+------------
min_recovery_end_lsn | 5D/A7FFFFE0
min_recovery_end_timeline | 1
backup_start_lsn | 0/0
backup_end_lsn | 0/0
end_of_backup_record_required | f

2.7、从当前wal目录中,获取到最大的WAL文件名,通常会是当前需要的WAL或者上一个已经REPLAY万的WAL文件。

postgres=# select * from pg_ls_waldir() order by 1 desc limit 1;  
           name           |   size   |      modification        
--------------------------+----------+------------------------  
 000000010000005D000000A7 | 16777216 | 2019-03-05 19:48:53+08  
(1 row)  

时间点恢复,手工拷贝wal文件的流程 - 通常不需要手工拷贝,只要指定restore_command让数据库自己来即可

配置recovery.conf

1、配置恢复目标

2、配置restore_command命令,打印下一个需要的WAL文件以及HISTORY文件,输出到某个文件中。参考方法1。

3、配置pause

4、配置打开hot_standby

5、从restore_command命令输出到某个文件中得到。下一个需要的WAL文件以及HISTORY文件。

优先拷贝history文件,防止走错时间线。

如果history文件确实存在并拷贝成功,下一个拷贝的文件是.partial文件,千万不要搞错。

6、通过pg_is_wal_replay_paused判断是否停止

postgres=# select pg_is_wal_replay_paused();  
 pg_is_wal_replay_paused   
-------------------------  
 f  
(1 row)  

如果返回T,表示已经到达还原点,不需要在拷贝文件。

参考

《PostgreSQL 时间点恢复(PITR)在异步流复制主从模式下,如何避免主备切换后PITR恢复(备库、容灾节点、只读节点)走错时间线(timeline , history , partial , restore_command , recovery.conf)》
《PostgreSQL PITR THREE recovery target MODE: name,xid,time USE CASE - 2》

《PostgreSQL PITR THREE recovery target MODE: name,xid,time USE CASE - 1》

《PostgreSQL recovery target introduce》

 

免费领取阿里云RDS PostgreSQL实例、ECS虚拟机

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
4月前
|
存储 Oracle 关系型数据库
postgresql数据库|wal日志的开启以及如何管理
postgresql数据库|wal日志的开启以及如何管理
273 0
|
3月前
|
存储 关系型数据库 MySQL
MySQL中的WAL技术
MySQL中的WAL技术
|
10月前
|
Oracle 安全 关系型数据库
如何在openGauss/PostgreSQL手动清理XLOG/WAL 文件?
openGauss/PostgreSQL中的预写式日志WAL(Write Ahead Log),又名Xlog或redo log,相当于oracle的online redo log, 不同的是oracle online redo log是提前创建几组滚动使用,但在opengauss中只需要本配置参数控制WAL日志的周期,数据库会一直的创建并自动清理,但存在一些情况WAL日志未清理导致目录空间耗尽,或目录空间紧张时手动删除wal日志时,比如如何确认在非归档模式下哪些WAL日志文件可以安全删除?
566 0
|
7月前
|
关系型数据库 分布式数据库 数据库
polardb里面的wal流水线技术的优势是什么
polardb里面的wal流水线技术的优势是什么
54 1
|
9月前
|
存储 算法 安全
[翻译]PostgreSQL中的WAL压缩以及版本15中的改进
[翻译]PostgreSQL中的WAL压缩以及版本15中的改进
124 0
|
9月前
|
关系型数据库 PostgreSQL
PostgreSQL崩溃恢复读取WAL
PostgreSQL崩溃恢复读取WAL
165 0
|
9月前
|
存储 固态存储 Ubuntu
postgresql email列表对NVM WAL BUFFER的讨论
postgresql email列表对NVM WAL BUFFER的讨论
44 0
|
9月前
|
存储 关系型数据库 PostgreSQL
PostgreSQL WAL解析:构建WAL记录准备
PostgreSQL WAL解析:构建WAL记录准备
95 0
|
9月前
|
SQL Cloud Native 关系型数据库
ADBPG(AnalyticDB for PostgreSQL)是阿里云提供的一种云原生的大数据分析型数据库
ADBPG(AnalyticDB for PostgreSQL)是阿里云提供的一种云原生的大数据分析型数据库
739 1
|
9月前
|
数据可视化 关系型数据库 MySQL
将 PostgreSQL 迁移到 MySQL 数据库
将 PostgreSQL 迁移到 MySQL 数据库
1064 2