Tags
PostgreSQL , checkpoint , scheduling , lazy , immediate , pg_start_backup , pg_basebackup
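Background
PostgreSQL supports online full backups and incremental archive backups. An online full backup is essentially a file copy. Incremental backups come in two forms: block-level incremental backup, which detects changed blocks by their LSN, and continuous archiving of WAL files.
A full backup is usually taken with the pg_basebackup client, or by calling the SQL function pg_start_backup() and then copying the files or taking a snapshot.
Before a full backup starts, the database performs a checkpoint and forces full page writes on, so that partially written blocks in the copy can later be repaired from WAL. When the backup ends, pg_stop_backup signals completion and the forced full page writes are turned off (if the full_page_writes parameter is already on, nothing changes).
You may sometimes find that pg_basebackup or pg_start_backup appears to hang without starting to copy any files. It is actually waiting for a checkpoint. But why is this checkpoint slow, when running the CHECKPOINT command directly in SQL finishes quickly?
The reason is that a checkpoint can run in either scheduled (spread) mode or unscheduled (immediate) mode.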
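The duration of a scheduled checkpoint depends on checkpoint_completion_target and on the window defined by max_wal_size (pacing is also measured against checkpoint_timeout). The larger checkpoint_completion_target and max_wal_size are, the larger the window over which the checkpoint is spread, so the total elapsed time can be very long. The benefit is that the dirty-page writes and fsync calls caused by the checkpoint are smoothed out, which reduces performance jitter.
The downside is that the checkpoint takes a long time.
An unscheduled checkpoint finishes as quickly as possible: it flushes dirty pages at full speed with no throttling. The benefit is speed; the downside is that when there are many dirty pages it can generate heavy I/O that hurts the performance of other sessions. The SQL CHECKPOINT command always runs in this mode, which is why it returns quickly.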
postgres=# show max_wal_size ;
max_wal_size
--------------
128GB
(1 row)
postgres=# show min_wal_size;
min_wal_size
--------------
32GB
(1 row)
postgres=# show checkpoint_completion_target ;
checkpoint_completion_target
------------------------------
0.1
(1 row)
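The pacing described above can be modeled roughly as follows. This is a simplified Python sketch loosely based on the checkpointer's on-schedule check, not PostgreSQL's actual code: the function names are hypothetical, and passing the elapsed fractions in as plain arguments is an assumption made for illustration. It shows why a scheduled checkpoint keeps pausing while it is ahead of schedule, and why an immediate checkpoint never pauses.

```python
def is_on_schedule(progress, completion_target, elapsed_time_frac, elapsed_wal_frac):
    """Return True if the checkpoint is at or ahead of its schedule.

    progress:          fraction of the checkpoint's writes already done (0.0 to 1.0)
    completion_target: value of checkpoint_completion_target
    elapsed_time_frac: time since checkpoint start, as a fraction of checkpoint_timeout
    elapsed_wal_frac:  WAL written since checkpoint start, as a fraction of the
                       WAL budget derived from max_wal_size
    """
    # Scale progress so that "all writes done" lines up with the target fraction.
    scaled = progress * completion_target
    # Behind on either the WAL axis or the time axis means "not on schedule".
    return scaled >= elapsed_wal_frac and scaled >= elapsed_time_frac


def should_throttle(progress, completion_target, elapsed_time_frac,
                    elapsed_wal_frac, immediate):
    """Decide whether the writer may pause before writing the next buffer."""
    if immediate:
        # Immediate (unscheduled) mode: never pause, write at full speed.
        return False
    return is_on_schedule(progress, completion_target,
                          elapsed_time_frac, elapsed_wal_frac)


if __name__ == "__main__":
    # Half the writes are done, but only 2% of the timeout has elapsed.
    print(should_throttle(0.5, 0.1, 0.02, 0.01, immediate=False))
    print(should_throttle(0.5, 0.1, 0.02, 0.01, immediate=True))
```

With checkpoint_completion_target = 0.1 as shown above, a scheduled checkpoint that has done half of its writes pauses as long as less than 5% of the time and WAL budget has been used; the same state in immediate mode never pauses.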
The source code shows that a checkpoint's behavior is controlled by the following flags.
/*
* RequestCheckpoint
* Called in backend processes to request a checkpoint
*
* flags is a bitwise OR of the following:
* CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
* CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
* CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
* ignoring checkpoint_completion_target parameter.
* CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred
* since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
* CHECKPOINT_END_OF_RECOVERY).
* CHECKPOINT_WAIT: wait for completion before returning (otherwise,
* just signal checkpointer to do it, and return).
* CHECKPOINT_CAUSE_XLOG: checkpoint is requested due to xlog filling.
* (This affects logging, and in particular enables CheckPointWarning.)
*/
void
RequestCheckpoint(int flags)
How does starting a backup choose between a fast (unscheduled) checkpoint and a scheduled one?
XLogRecPtr
do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
StringInfo labelfile, DIR *tblspcdir, List **tablespaces,
StringInfo tblspcmapfile, bool infotbssize,
bool needtblspcmapfile)
{
/*
* Since the fact that we are executing do_pg_start_backup()
* during recovery means that checkpointer is running, we can use
* RequestCheckpoint() to establish a restartpoint.
*
* We use CHECKPOINT_IMMEDIATE only if requested by user (via
* passing fast = true). Otherwise this can take awhile.
*/
RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
(fast ? CHECKPOINT_IMMEDIATE : 0));
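The flag selection in the excerpt above can be restated as a small Python sketch. The `CheckpointFlag` class and `backup_checkpoint_flags` function are hypothetical names introduced here, and the bit values are illustrative only; the real constants are defined in the PostgreSQL source and may differ between versions.

```python
from enum import IntFlag


class CheckpointFlag(IntFlag):
    # Illustrative bit values, NOT the real PostgreSQL constants.
    IS_SHUTDOWN = 0x01
    END_OF_RECOVERY = 0x02
    IMMEDIATE = 0x04
    FORCE = 0x08
    WAIT = 0x10
    CAUSE_XLOG = 0x20


def backup_checkpoint_flags(fast: bool) -> CheckpointFlag:
    """Mirror the flag choice made when a backup starts.

    FORCE and WAIT are always set, so the caller blocks until the
    checkpoint completes. IMMEDIATE is added only when fast is True.
    """
    flags = CheckpointFlag.FORCE | CheckpointFlag.WAIT
    if fast:
        flags |= CheckpointFlag.IMMEDIATE
    return flags


if __name__ == "__main__":
    for fast in (False, True):
        flags = backup_checkpoint_flags(fast)
        print(fast, CheckpointFlag.IMMEDIATE in flags)
```

Because CHECKPOINT_WAIT is always set, the backup call blocks for the full duration of the checkpoint; without CHECKPOINT_IMMEDIATE that duration is governed by the scheduling parameters, which is what makes the start of the backup look like a hang.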
1. The pg_basebackup client selects the mode with the -c option (fast means an unscheduled checkpoint)
-c, --checkpoint=fast|spread
set fast or spread checkpointing
2. The pg_start_backup SQL function selects the mode with its fast parameter
postgres=# \df pg_start_backup
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+-----------------+------------------+------------------------------------------------------------------------+------
pg_catalog | pg_start_backup | pg_lsn | label text, fast boolean DEFAULT false, exclusive boolean DEFAULT true | func
(1 row)
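As a hedged illustration of the two entry points above, the following Python sketch only builds the command line and the SQL call; it does not run them. The helper names, the backup directory, and the label are hypothetical. The `%s` placeholders assume a driver that uses that parameter style, and the query assumes a PostgreSQL version in which pg_start_backup has the signature shown above.

```python
from typing import List, Tuple


def basebackup_argv(target_dir: str, fast: bool) -> List[str]:
    """Build a pg_basebackup argument list with the chosen checkpoint mode."""
    mode = "fast" if fast else "spread"
    return ["pg_basebackup", "-D", target_dir, "--checkpoint=" + mode]


def start_backup_query(label: str, fast: bool) -> Tuple[str, Tuple[str, bool]]:
    """Build a parameterized call to pg_start_backup(label, fast)."""
    return "SELECT pg_start_backup(%s, %s)", (label, fast)


if __name__ == "__main__":
    # Hypothetical directory and label, for illustration only.
    print(" ".join(basebackup_argv("/backup/base", fast=True)))
    print(start_backup_query("nightly", fast=True))
```

Passing fast=True in either helper requests the immediate checkpoint; leaving it False keeps the default spread behavior.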
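Summary
If you need a backup to start quickly, use the fast option (unscheduled checkpoint mode), keeping in mind that it can cause a burst of I/O when there are many dirty pages.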