前段时间在使用9.3 beta2的流复制时发现无法完成主备数据库的角色互相切换. 原因是备节点的receiver进程在知晓主节点的sender进程关闭后也会自动关闭,可能导致无法接收primary节点关闭时完全的xlog信息. 因此角色无法互相切换.
9.3beta2发布的第二天这个补丁就提交了, 如下 :
author | Fujii Masao <fujii@postgresql.org> | |
Tue, 25 Jun 2013 17:14:37 +0000 (02:14 +0900) | ||
committer | Fujii Masao <fujii@postgresql.org> | |
Tue, 25 Jun 2013 17:14:37 +0000 (02:14 +0900) | ||
commit | 985bd7d49726c9f178558491d31a570d47340459 | |
tree | 765e3c0e22ee4991c5fb0b00dd8888e8161f4422 | tree | snapshot |
parent | 4f14c86d7434376b95477aeeb07fcc7272f4c47d | commit | diff |
Support clean switchover.
In replication, when we shutdown the master, walsender tries to send
all the outstanding WAL records to the standby, and then to exit. This
basically means that all the WAL records are fully synced between
two servers after the clean shutdown of the master. So, after
promoting the standby to new master, we can restart the stopped
master as new standby without the need for a fresh backup from
new master.
But there was one problem so far: though walsender tries to send all
the outstanding WAL records, it doesn't wait for them to be replicated
to the standby. Then, before receiving all the WAL records,
walreceiver can detect the closure of connection and exit. We cannot
guarantee that there is no missing WAL in the standby after clean
shutdown of the master. In this case, backup from new master is
required when restarting the stopped master as new standby.
This patch fixes this problem. It just changes walsender so that it
waits for all the outstanding WAL records to be replicated to the
standby before closing the replication connection.
Per discussion, this is a fix that needs to get backpatched rather than
new feature. So, back-patch to 9.1 where enough infrastructure for
this exists.
Patch by me, reviewed by Andres Freund.
In replication, when we shutdown the master, walsender tries to send
all the outstanding WAL records to the standby, and then to exit. This
basically means that all the WAL records are fully synced between
two servers after the clean shutdown of the master. So, after
promoting the standby to new master, we can restart the stopped
master as new standby without the need for a fresh backup from
new master.
But there was one problem so far: though walsender tries to send all
the outstanding WAL records, it doesn't wait for them to be replicated
to the standby. Then, before receiving all the WAL records,
walreceiver can detect the closure of connection and exit. We cannot
guarantee that there is no missing WAL in the standby after clean
shutdown of the master. In this case, backup from new master is
required when restarting the stopped master as new standby.
This patch fixes this problem. It just changes walsender so that it
waits for all the outstanding WAL records to be replicated to the
standby before closing the replication connection.
Per discussion, this is a fix that needs to get backpatched rather than
new feature. So, back-patch to 9.1 where enough infrastructure for
this exists.
Patch by me, reviewed by Andres Freund.
src/backend/replication/walsender.c | diff | blob | blame | history |
这个补丁支持9.1,9.2,9.3三个版本.
9.0因为流复制架构问题不支持这个补丁.
In 9.0, the standby doesn't send back any message to the master and
there is no way to know whether replication has been done up to
the specified location, so I don't think that we can backpatch.
下面来测试一下. 打补丁.
将新版本的walsender.c下载到源码对应目录下替换原有文件.
例如我这里的话替换 :
替换后重新编译安装.
重启数据库 :
[参考]
pg93@db-172-16-3-39-> ll /opt/soft_bak/postgresql-9.3beta2/src/backend/replication/walsender.c
-rw-r--r-- 1 root root 58K Jul 19 07:50 /opt/soft_bak/postgresql-9.3beta2/src/backend/replication/walsender.c
替换后重新编译安装.
pg93@db-172-16-3-39-> cd /opt/soft_bak/postgresql-9.3beta2
pg93@db-172-16-3-39-> gmake world
pg93@db-172-16-3-39-> gmake install-world
重启数据库 :
pg_ctl restart -m fast
重新做主备角色切换测试正常.
感谢Fujii Masao和Andres Freund.
最后建议大家将9.1, 9.2, 9.3 的版本都打上这个补丁.