PostgreSQL cluster role switchover between primary and standby

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
简介:
虽然PostgreSQL没有明确指出 primary 和 standby 节点的角色可以相互切换。(即standby promote成primary ,primary demote成standby .不需要重建集群)
但是经过实际测试,是可以实现的。
以下测试数据库版本9.0.2 . 低版本可能有一定差异。(比如有一个版本改进了standby shutdown后自动获取checkpoint的功能,最初的9.0是没有的,没有这个功能重启后需要用到一些已经APPLY的WAL,带来比较麻烦的问题。有兴趣的朋友可以参考我以前写的BLOG)
首先来看几个和XLOG相关的东西,要确保安全的SWITCHOVER,必须确保一下提及的数据在primary和standby切换前是一致的。
1. primary 和 standby 关闭后使用pg_controldata输出的checkpoint location (-m immediate关闭的话这个值不可信,因此必须确保都是-m fast 或者 smart模式关闭的,fast和smart关闭都是干净的关闭模式,起来时不需要recovery)

具体的操作步骤:
1. on primary  
pg_ctl stop -m fast $PGDATA  
确认返回成功后
查看pg_controldata输出  
Latest checkpoint location:           1B/6C000020
Latest checkpoint's REDO location:    1B/6C000020
2. on standby
pg_ctl stop -m fast $PGDATA
确认返回成功后
查看pg_controldata输出  
Latest checkpoint location:           1B/6C000020
Latest checkpoint's REDO location:    1B/6C000020
3. 确保standby和primary 的pg_controldata    checkpoint的输出结果一致。就可以切换standby 和 primary了。
4. 启动standby,并且touch一个在recovery.conf中的trigger文件,使standby promote.为primary库
5. 修改原primary库的配置文件 并新建recovery.conf,需要注意的是新增一条
recovery_target_timeline = 'latest' ,否则会报类似如下错误timeline 6 of the primary does not match recovery target timeline 5。(其他配置略,清参考我以前写的BLOG。)
6. 启动原primary , 此时切换完成,primary 变standby ,standby 变primary ,而且可以再次切换。

切换完成后查看pg_controldata的输出,TimeLineID 将变大。
TimeLineID递增后,在新的PRIMARY 库会出现类似如下进程
postgres: archiver process   last was 00000005.history
而TimeLineID可能就等于5.
与此同时,新的standby TimeLineID还是老的,因为还没有更新过来。
要更新过来的话,要么等CHECKPOINT,要么关闭主节点,然后再关闭副节点。起来之后就同步了。

下面来分析一下,一个archive和stream都打开的主副数据库集群.
1. 主节点在干净的关闭数据库时干了些啥?
对于一个STREAM复制的环境,主节点上面有一个WAL_SENDER进程,干净关闭数据库时,postmaster对这个进程的处理方法和普通的SERVER PROCESS不一样,数据库关闭时首先发信号给server process关闭,然后写shutdown checkpoint ,(有兴趣的朋友可以看一看CHECKPOINT的源码,CHECKPOINT分为好几种),然后postmaster告知wal_sender shutdown_checkpoint,并且把截止这个CHECKPOINT的XLOG发送给standby上面的wal_receiver进程。
从而主节点干净的关闭数据库时,standby节点可以接收到shutdown_checkpoint,达到一致的目的。
原文引用:
Therefore postmaster treats walsenders like the pgarch process,
and instructs them to terminate at PM_SHUTDOWN_2 phase, after all regular
backends have died and bgwriter has written the shutdown checkpoint.

2. 副节点在干净的关闭数据库时干了些啥?
standby节点干净的关闭数据库,直接关闭,目前的版本不尝试且不等待正在从archive ,pg_xlog,或wal_sender恢复完成。但是已经接收到的CHECKPOINT将被写入持久化存储,因此关闭后pg_controldata与主节点一致.


干净的关闭后的primary 和standby 库的pg_controldata输入示例 :  
primary :
postgres@digoal-172-16-3-39-> pg_controldata 
pg_control version number:            903
Catalog version number:               201008051
Database system identifier:           5556352067747738614
Database cluster state:               shut down
pg_control last modified:             Wed 11 May 2011 08:52:38 PM CST
Latest checkpoint location:           1B/70000020
Prior checkpoint location:            1B/6C000078
Latest checkpoint's REDO location:    1B/70000020
Latest checkpoint's TimeLineID:       5
Latest checkpoint's NextXID:          0/4369
Latest checkpoint's NextOID:          2074455
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        655
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Wed 11 May 2011 08:52:38 PM CST
Minimum recovery ending location:     0/0
Backup start location:                0/0
Current wal_level setting:            hot_standby
Current max_connections setting:      2000
Current max_prepared_xacts setting:   50
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 1048576
WAL block size:                       8192
Bytes per WAL segment:                67108864
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value


standby : 
postgres@digoal-172-16-3-33-> pg_controldata 
pg_control version number:            903
Catalog version number:               201008051
Database system identifier:           5556352067747738614
Database cluster state:               shut down in recovery
pg_control last modified:             Wed 11 May 2011 08:52:52 PM CST
Latest checkpoint location:           1B/70000020
Prior checkpoint location:            1B/6C000020
Latest checkpoint's REDO location:    1B/70000020
Latest checkpoint's TimeLineID:       5
Latest checkpoint's NextXID:          0/4369
Latest checkpoint's NextOID:          2074455
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        655
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Wed 11 May 2011 08:52:38 PM CST
Minimum recovery ending location:     1B/6C000020
Backup start location:                0/0
Current wal_level setting:            hot_standby
Current max_connections setting:      2000
Current max_prepared_xacts setting:   50
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 1048576
WAL block size:                       8192
Bytes per WAL segment:                67108864
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value


对于一个因异常而FAILOVER的库,因为failover时不能确保primary 和 standby 的checkpoint一致,所以,在primary数据库服务器修复后,不能直接转换为standby,而需要一个checkpoint小于standby切换为primary时的standby库的checkpoint的基础备份,并且需要足够的archive log来做apply,
因此failover后主备集群的重新建立的速度或简易程度,大部分决定于平时对standby和archive log的备份。


# 补充
# 2011-06-09 今天对一个生产系统做了切换,版本是9.0.3
# 切换后,起主节点的时候一直处于等待状态
postgres: startup process   waiting for 00000001000001200000001C
# 日志里面报错如下:
cp: cannot stat `/opt/pg_arch/00000002.history': No such file or directory
# 00000002.history这个文件来自于新的primary节点pg_xlog目录,拷贝到/opt/pg_arch/00000002.history,另外
拷贝00000001000001200000001C到/opt/pg_arch/.
# 拷贝完后重启老的primary节点.
2011-06-09 14:31:57.095 CST,,,1357,,4df068dd.54d,1,,2011-06-09 14:31:57 CST,,0,LOG,00000,"database system was shut down in recovery at 2011-06-09 14:31:53 CST",,,,,,,,,""
2011-06-09 14:31:57.099 CST,,,1357,,4df068dd.54d,2,,2011-06-09 14:31:57 CST,,0,LOG,00000,"restored log file ""00000002.history"" from archive",,,,,,,,,""
2011-06-09 14:31:57.105 CST,,,1357,,4df068dd.54d,3,,2011-06-09 14:31:57 CST,,0,LOG,00000,"restored log file ""00000002.history"" from archive",,,,,,,,,""
2011-06-09 14:31:57.105 CST,,,1357,,4df068dd.54d,4,,2011-06-09 14:31:57 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""
2011-06-09 14:31:57.275 CST,,,1357,,4df068dd.54d,5,,2011-06-09 14:31:57 CST,,0,LOG,00000,"restored log file ""00000001000001200000001C"" from archive",,,,,,,,,""
2011-06-09 14:31:58.120 CST,,,1357,,4df068dd.54d,6,,2011-06-09 14:31:57 CST,1/0,0,LOG,00000,"consistent recovery state reached at 120/70000078",,,,,,,,,""
2011-06-09 14:31:58.120 CST,,,1357,,4df068dd.54d,7,,2011-06-09 14:31:57 CST,1/0,0,LOG,00000,"invalid record length at 120/70000078",,,,,,,,,""
2011-06-09 14:31:58.121 CST,,,1355,,4df068dc.54b,1,,2011-06-09 14:31:56 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""
2011-06-09 14:31:58.126 CST,,,1365,,4df068de.555,1,,2011-06-09 14:31:58 CST,,0,LOG,00000,"streaming replication successfully connected to primary",,,,,,,,,""
2011-06-09 14:31:59.320 CST,,,1357,,4df068dd.54d,8,,2011-06-09 14:31:57 CST,1/0,0,LOG,00000,"redo starts at 120/70000078",,,,,,,,,""

成功的切换为standby节点.
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
存储 关系型数据库 PostgreSQL
PostgreSQL cluster table using index
PostgreSQL CLUSTER意在将表按照索引的顺序排布.  可以通过ctid来观察这个排布, 或者通过pg_stats.
1142 0
|
关系型数据库 流计算 PostgreSQL
|
缓存 关系型数据库 PostgreSQL
PostgreSQL cluster大幅减少nestloop离散IO的优化方法
背景 对于较大数据量的表,如果在索引字段上面有小结果集JOIN,用nestloop JOIN是比较好的方法。 但是nestloop带来的一个问题就是离散IO,这个是无法回避的问题,特别是硬件IO能力不行的情况下,性能会比较糟糕。 有什么优化方法呢? PostgreSQL提供了一个
3539 0
|
存储 关系型数据库 索引
postgresql cluster和correlation
--今天查看pg_statsinfo报告时发现有如下警告,说明字段物理行序和逻辑行序相关不大 correlation of the clustered table fell below threshold in snapshot '2015-11-30 08:30:00' --- 't.
1050 0
|
28天前
|
关系型数据库 分布式数据库 数据库
成都晨云信息技术完成阿里云PolarDB数据库产品生态集成认证
近日,成都晨云信息技术有限责任公司(以下简称晨云信息)与阿里云PolarDB PostgreSQL版数据库产品展开产品集成认证。测试结果表明,晨云信息旗下晨云-站群管理系统(V1.0)与阿里云以下产品:开源云原生数据库PolarDB PostgreSQL版(V11),完全满足产品兼容认证要求,兼容性良好,系统运行稳定。