[补充说明]
本文测试fail-back未成功的bug是-m immediate的一个bug, 已经修复 :
[正文]
PostgreSQL的流复制应用非常普遍, 同时也可以作为HA的关键技术, 当主节点出现故障时, 可以通过激活备节点的方法达到高可用到目的.
当备份节点激活后, 原来的主节点可以作为备节点来使用.
但是, 如果主节点在发生故障前的WAL没有全部同步给备节点的话, 备节点激活后, 实际上与主节点是存在一定差异的. 这种情况下主节点不能直接作为备节点参与利用流复制做的增量恢复. 遇到这样的情况
怎么样把主节点变成备节点呢?
1. 使用pg_basebackup, 或者COPY 的方法重新做一个standby, 这样的话相当于要把整个数据库的数据文件全部传一遍. 数据量大或者网络交差的话是比较痛苦的事情.
2. 使用rsync, 只传输增量数据, 但是需要注意的是, 如果数据库很大的话, rsync的数据比对过程非常漫长, 并且要消耗大量的io资源.
除了以上两种方法, 还有其他的工具可以利用, 例如
在此前写过一篇关于fail-back的工具pg_rewind的使用文章.
pg_rewind利用新的primary的history 文件, 取出切换时间线是的xlog位置信息, 用这个位置信息来匹配旧的primary节点的xlog文件中的信息, 如果旧的primary在这个点之后还产生了xlog信息, 那么说明standby在promote后有未同步至standby节点的xlog数据, 所以需要对这部分xlog做差异处理. 把这部分xlog对应的变更的块以及控制文件,clog等从新的primary拷贝到老的primary节点.
具体可参考 :
除了pg_rewind, PostgreSQL可能在内核中加入类似pg_rewind的功能, 当然原理完全不一样, 加入内核的功能原理如下 :
到第四版的补丁, 新增的参数名更改为 synchronous_transfer, 可配置3个值 :
synchronous_transfer参数的含义 :
The proposal is based on the concept of master should not do any file system level change until corresponding WAL record is replicated to the standby. There are many places in the code which need to be handled to support the proposed solution. Following cases explains the need of fresh backup at the time of failover, and how can we avoid this need by our approach. 1. We must not write any heap pages to the disk before the WAL records corresponding to those changes are received by the standby. Otherwise if standby failed to receive WAL corresponding to those heap pages there will be inconsistency. 2. When CHECKPOINT happens on the master, control file of master gets updated and last checkpoint record is written to it. Suppose failover happens and standby fails to receive the WAL record corresponding to CHECKPOINT, then master and standby has inconsistent copies of control file that leads to the mismatch in redo record and recovery will not start normally. To avoid this situation we must not update the control file of master before the corresponding checkpoint WAL record is received by the standby 3. Also when we truncate any of the physical files on the master and suppose the standby failed to receive corresponding WAL, then that physical file is truncated on master but still available on standby causing inconsistency. To avoid this we must not truncate physical files on the master before the WAL record corresponding to that operation is received by the standby. 4. Same case applies to CLOG pages. If CLOG page is written to the disk and corresponding WAL record is not replicated to the standby, leads to the inconsistency. So we must not write the CLOG pages (and may be other SLRU pages too) to the disk before the corresponding WAL records are received by standby. 5. The same problem applies for the commit hint bits. But it is more complicated than the other problems, because no WAL records are generated for that, hence we cannot apply the same above method, that is wait for corresponding WAL record to be replicated on standby. So we delay the processes of updating the commit hint bits, similar to what is done by asynchronous commits. In other words we need to check if the WAL corresponding to the transaction commit is received by the failback safe standby and then only allow hint bit updates. *Patch explanation:* The initial work on this patch is done by Pavan Deolasee. I tested it and will make further enhancements based on the community feedback. This patch is not complete yet, but I plan to do so with the help of this community. At this point, the primary purpose is to understand the complexities and get some initial performance numbers to alleviate some of the concerns raised by the community. There are two GUC parameters which supports this failsafe standby 1. failback_safe_standby_name [ name of the failsafe standby ] It is the name of failsafe standby. Master will not do any file system level change before corresponding WAL is replicated on the this failsafe standby 2. failback_safe_standby_mode [ off/remote_write/remote_flush] This parameter specifies the behavior of master i.e. whether it should wait for WAL to be written on standby or WAL to be flushed on standby. We should turn it off when we do not want the failsafe standby. This failsafe mode can be combined with synchronous as well as asynchronous streaming replication. Most of the changes are done in the syncrep.c. This is a slight misnomer because that file deals with synchronous standby and a failback standby could and most like be a async standby. But keeping the changes this way has ensured that the patch is easy to read. Once we have acceptance on the approach, the patch can be modified to reorganize the code in a more logical way. The patch adds a new state SYNC_REP_WAITING_FOR_FAILBACK_SAFETY to the sync standby states. A backend which is waiting for a failback safe standby to receive WAL records, will wait in this state. Failback safe mechanism can work in two different modes, that is wait for WAL to be written or flushed on failsafe standby. That is represented by two new modes SYNC_REP_WAIT_FAILBACK_SAFE_WRITE and SYNC_REP_WAIT_FAILBACK_SAFE_FLUSH respectively. Also the SyncRepWaitForLSN() is changed for conditional wait. So that we can delay hint bit updates on master instead of blocking the wait for the failback safe standby to receiver WAL's.
AI 代码解读
到第四版的补丁, 新增的参数名更改为 synchronous_transfer, 可配置3个值 :
synchronous_transfer = commit # data page synchronization level # commit, data_flush or all
AI 代码解读
synchronous_transfer参数的含义 :
-'commit' 'commit' means that master waits for corresponding WAL to flushed to disk of standby server on commits. but master doesn't waits for replicated data pages. -'data_flush' 'data_flush' means that master waits for replicated data page (e.g, CLOG, pg_control) before flush to disk of master server. but if user set to 'data_flush' to this parameter, 'synchronous_commit' values is ignored even if user set 'synchronous_commit'. -'all' 'all' means that master waits for replicated WAL and data page.
AI 代码解读
commit类似同步流复制,记录master节点的commit状态前,等待该xlog已经在standby flushed to disk.
data_flush则表示master节点在flush data page前, 必须确保这些变更对应的xlog信息已经在standby 节点flush to disk了. (这个配置将忽略synchronous_commit的配置)
同步流复制增加1种等待模式(
SYNC_REP_WAIT_DATA_FLUSH
) :
+++ b/src/include/replication/syncrep.h
src/backend/replication/syncrep.c
@@ -19,23 +19,42 @@ #define SyncRepRequested() \ (max_wal_senders > 0 && synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH) +#define SyncTransRequested() \ + (max_wal_senders > 0 && synchronous_transfer > SYNCHRONOUS_TRANSFER_COMMIT) + +#define IsSyncRepSkipped() \ + (max_wal_senders > 0 && synchronous_transfer == SYNCHRONOUS_TRANSFER_DATA_FLUSH) + /* SyncRepWaitMode */ -#define SYNC_REP_NO_WAIT -1 -#define SYNC_REP_WAIT_WRITE 0 -#define SYNC_REP_WAIT_FLUSH 1 +#define SYNC_REP_NO_WAIT -1 +#define SYNC_REP_WAIT_WRITE 0 +#define SYNC_REP_WAIT_FLUSH 1 +#define SYNC_REP_WAIT_DATA_FLUSH 2 -#define NUM_SYNC_REP_WAIT_MODE 2 +#define NUM_SYNC_REP_WAIT_MODE 3 /* syncRepState */ -#define SYNC_REP_NOT_WAITING 0 -#define SYNC_REP_WAITING 1 -#define SYNC_REP_WAIT_COMPLETE 2 +#define SYNC_REP_NOT_WAITING 0 +#define SYNC_REP_WAITING 1 +#define SYNC_REP_WAIT_COMPLETE 2 + +typedef enum +{ + SYNCHRONOUS_TRANSFER_COMMIT, /* no wait for flush data page */ + SYNCHRONOUS_TRANSFER_DATA_FLUSH, /* wait for data page flush only + * no wait for WAL */ + SYNCHRONOUS_TRANSFER_ALL /* wait for data page flush */ +} SynchronousTransferLevel;
AI 代码解读
src/backend/replication/syncrep.c
/* * =========================================================== * Synchronous Replication functions for normal user backends * =========================================================== */ /* * Wait for synchronous replication, if requested by user. * * Initially backends start in state SYNC_REP_NOT_WAITING and then * change that state to SYNC_REP_WAITING/SYNC_REP_WAITING_FOR_DATA_FLUSH * before adding ourselves to the wait queue. During SyncRepWakeQueue() a * WALSender changes the state to SYNC_REP_WAIT_COMPLETE once replication is * confirmed. This backend then resets its state to SYNC_REP_NOT_WAITING. * * ForDataFlush - if TRUE, we wait for the flushing data page. * Otherwise wait for the sync standby * * Wait - if FALSE, we don't actually wait, but tell the caller whether or not * the standby has already made progressed upto the given XactCommitLSN * * Return TRUE if either the sync standby is not * configured/turned off OR the standby has made enough progress */ bool SyncRepWaitForLSN(XLogRecPtr XactCommitLSN, bool ForDataFlush, bool Wait)
AI 代码解读
[测试]
下载PostgreSQL src :
http://git.postgresql.org/gitweb/?p=postgresql.git;a=snapshot;h=e5592c61adb0766eaee53ec07d2f05783d1c6548;sf=tgz
补丁 :
安装PostgreSQL
初始化数据库
配置master
创建standby
tar -zxvf postgresql-e5592c6.tar.gz cd postgresql-e5592c6
AI 代码解读
补丁 :
[root@db-172-16-3-39 postgresql-e5592c6]# wget http://www.postgresql.org/message-id/attachment/29640/failback_safe_standby_v4.patch [root@db-172-16-3-39 postgresql-e5592c6]# patch -p1 < ./failback_safe_standby_v4.patch patching file src/backend/access/transam/clog.c patching file src/backend/access/transam/slru.c patching file src/backend/access/transam/twophase.c patching file src/backend/access/transam/xact.c patching file src/backend/access/transam/xlog.c patching file src/backend/catalog/storage.c patching file src/backend/replication/syncrep.c patching file src/backend/storage/buffer/bufmgr.c patching file src/backend/utils/cache/relmapper.c patching file src/backend/utils/misc/guc.c patching file src/backend/utils/misc/postgresql.conf.sample patching file src/backend/utils/time/tqual.c patching file src/include/access/xlog.h patching file src/include/replication/syncrep.h
AI 代码解读
安装PostgreSQL
[root@db-172-16-3-39 postgresql-e5592c6]# ./configure --prefix=/home/pg94/pgsql9.4devel --with-pgport=2999 --with-perl --with-tcl --with-python --with-openssl --with-pam --without-ldap --with-libxml --with-libxslt --enable-thread-safety --with-wal-blocksize=16 && gmake && gmake install cd contrib gmake gmake install
AI 代码解读
初始化数据库
initdb -D $PGDATA -E UTF8 --locale=C -W -U postgres
AI 代码解读
配置master
cd $PGDATA vi pg_hba.conf host replication postgres 172.16.3.0/24 md5 host all all 0.0.0.0/0 md5 vi postgresql.conf listen_addresses = '0.0.0.0' # what IP address(es) to listen on; port = 2999 # (change requires restart) max_connections = 100 # (change requires restart) superuser_reserved_connections = 13 # (change requires restart) unix_socket_directories = '.' # comma-separated list of directories unix_socket_permissions = 0700 # begin with 0 to use octal notation tcp_keepalives_idle = 60 # TCP_KEEPIDLE, in seconds; tcp_keepalives_interval = 10 # TCP_KEEPINTVL, in seconds; tcp_keepalives_count = 10 # TCP_KEEPCNT; shared_buffers = 1024MB # min 128kB maintenance_work_mem = 512MB # min 1MB shared_preload_libraries = 'pg_stat_statements' # (change requires restart) vacuum_cost_delay = 10 # 0-100 milliseconds vacuum_cost_limit = 10000 # 1-10000 credits bgwriter_delay = 10ms # 10-10000ms between rounds wal_level = hot_standby # minimal, archive, or hot_standby synchronous_commit = remote_write # synchronization level; wal_sync_method = fdatasync # the default is the first option full_page_writes = on # recover from partial page writes wal_buffers = 16384kB # min 32kB, -1 sets based on shared_buffers checkpoint_segments = 32 # in logfile segments, min 1, 16MB each archive_mode = on # allows archiving to be done archive_command = '/bin/date' # command to use to archive a logfile segment max_wal_senders = 32 # max number of walsender processes wal_keep_segments = 128 # in logfile segments, 16MB each; 0 disables synchronous_transfer = data_flush # data page synchronization level hot_standby = on # "on" allows queries during recovery max_standby_archive_delay = 300s # max delay before canceling queries max_standby_streaming_delay = 300s # max delay before canceling queries wal_receiver_status_interval = 1s # send replies at least this often hot_standby_feedback = on # send info from standby to prevent random_page_cost = 1.5 # same scale as above effective_cache_size = 128000MB log_destination = 'csvlog' # Valid values are combinations of logging_collector = on # Enable capturing of stderr and csvlog log_directory = 'pg_log' # directory where log files are written, log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern, log_file_mode = 0600 # creation mode for log files, log_truncate_on_rotation = on # If on, an existing log file with the log_rotation_age = 1d # Automatic rotation of logfiles will log_rotation_size = 10MB # Automatic rotation of logfiles will log_checkpoints = on log_connections = on log_disconnections = on log_error_verbosity = verbose # terse, default, or verbose messages log_timezone = 'PRC' autovacuum = on # Enable autovacuum subprocess? 'on' log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and datestyle = 'iso, mdy' timezone = 'PRC' lc_messages = 'C' # locale for system error message lc_monetary = 'C' # locale for monetary formatting lc_numeric = 'C' # locale for number formatting lc_time = 'C' # locale for time formatting default_text_search_config = 'pg_catalog.english' pg_stat_statements.max = 1000 pg_stat_statements.track = all vi ~/.pgpass 172.16.3.33:2999:replication:postgres:postgres 172.16.3.39:2999:replication:postgres:postgres chmod 400 ~/.pgpass vi $PGDATA/recovery.done recovery_target_timeline = 'latest' standby_mode = on primary_conninfo = 'host=172.16.3.33 port=2999 user=postgres keepalives_idle=60' # e.g. 'host=localhost port=5432'
AI 代码解读
创建standby
安装过程略
复制
配置
启动standby
主节点, 查看复制状态 :
创建测试表, 测试函数
压力测试 :
测试failback 失败 :
以上测试过程中将standby数据库关闭
测试完后, 在主节点执行checkpoint; 因为配置了synchronous_transfer = data_flush. 会出现等待状态!
此时停主库也是无法停止的, 也有flush的动作, 所以会等待.
使用-m immediate关闭后, 重启主库.
开启standby, 数据一致
测试fail-over.
pg94@db-172-16-3-33-> pg_basebackup -D $PGDATA -F p -P -v -h 172.16.3.39 -p 2999 -U postgres WARNING: skipping special file "./.s.PGSQL.2999" WARNING: skipping special file "./.s.PGSQL.2999"94/pg_root/pg_clog/0000) 26943/26943 kB (100%), 1/1 tablespace NOTICE: pg_stop_backup complete, all required WAL segments have been archived pg_basebackup: base backup completed
AI 代码解读
配置
cd $PGDATA mv recovery.done recovery.conf vi recovery.conf recovery_target_timeline = 'latest' standby_mode = on primary_conninfo = 'host=172.16.3.39 port=2999 user=postgres keepalives_idle=60' # e.g. 'host=localhost port=5432'
AI 代码解读
启动standby
pg_ctl start
AI 代码解读
主节点, 查看复制状态 :
pg94@db-172-16-3-39-> psql psql (9.4devel) Type "help" for help. digoal=# select * from pg_stat_replication ; pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | st ate | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state -------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+----- ------+---------------+----------------+----------------+-----------------+---------------+------------ 12221 | 10 | postgres | walreceiver | 172.16.3.33 | | 20441 | 2013-08-06 09:15:35.617838+08 | stre aming | 0/3000090 | 0/3000090 | 0/3000090 | 0/3000090 | 0 | async (1 row)
AI 代码解读
创建测试表, 测试函数
digoal=# create table test (id int primary key, info text, crt_time timestamp); CREATE TABLE digoal=# create or replace function func() returns void as $$ digoal$# declare digoal$# v_id int; digoal$# begin digoal$# v_id := round(5000000*random()); digoal$# update test set info=md5(random()::text),crt_time=clock_timestamp() where id=v_id; digoal$# if found then digoal$# return; digoal$# else digoal$# insert into test values (v_id,md5(random()::text),clock_timestamp()); digoal$# end if; digoal$# return; digoal$# end; digoal$# $$ language plpgsql strict; CREATE FUNCTION
AI 代码解读
测试
digoal=# select func(); func ------ (1 row) digoal=# select * from test; id | info | crt_time ---------+----------------------------------+---------------------------- 1894289 | e759b24e9cb227c300f85666c4ac2f81 | 2013-08-06 09:21:30.910392 (1 row)
AI 代码解读
压力测试 :
pg94@db-172-16-3-39-> vi test.sql select func(); pg94@db-172-16-3-39-> pgbench -M prepared -f ./test.sql -r -n -h $PGDATA -p 2999 -U postgres -c 16 -j 4 -T 10 digoal transaction type: Custom query scaling factor: 1 query mode: prepared number of clients: 16 number of threads: 4 duration: 10 s number of transactions actually processed: 211697 tps = 21156.265771 (including connections establishing) tps = 21187.462509 (excluding connections establishing) statement latencies in milliseconds: 0.752017 select func();
AI 代码解读
测试failback 失败 :
pg94@db-172-16-3-39-> pgbench -M prepared -f ./test.sql -r -n -h $PGDATA -p 2999 -U postgres -c 16 -j 4 -T 100 digoal Client 2 aborted in state 0: ERROR: duplicate key value violates unique constraint "test_pkey" DETAIL: Key (id)=(521247) already exists. CONTEXT: SQL statement "insert into test values (v_id,md5(random()::text),clock_timestamp())" PL/pgSQL function func() line 10 at SQL statement Client 14 aborted in state 0: ERROR: duplicate key value violates unique constraint "test_pkey" DETAIL: Key (id)=(4957439) already exists. CONTEXT: SQL statement "insert into test values (v_id,md5(random()::text),clock_timestamp())" PL/pgSQL function func() line 10 at SQL statement Client 15 aborted in state 0: ERROR: duplicate key value violates unique constraint "test_pkey" DETAIL: Key (id)=(3050904) already exists. CONTEXT: SQL statement "insert into test values (v_id,md5(random()::text),clock_timestamp())" PL/pgSQL function func() line 10 at SQL statement transaction type: Custom query scaling factor: 1 query mode: prepared number of clients: 16 number of threads: 4 duration: 100 s number of transactions actually processed: 2063626 tps = 20633.969217 (including connections establishing) tps = 20637.002726 (excluding connections establishing) statement latencies in milliseconds: 0.712672 select func();
AI 代码解读
以上测试过程中将standby数据库关闭
pg94@db-172-16-3-33-> pg_ctl stop -m fast waiting for server to shut down.... done server stopped
AI 代码解读
测试完后, 在主节点执行checkpoint; 因为配置了synchronous_transfer = data_flush. 会出现等待状态!
postgres: checkpointer process waiting for 0/2D0C99D8
AI 代码解读
此时停主库也是无法停止的, 也有flush的动作, 所以会等待.
12198 pg94 15 0 1181m 47m 46m S 0.0 0.6 0:00.04 /home/pg94/pgsql9.4devel/bin/postgres 12199 pg94 15 0 108m 1032 504 S 0.0 0.0 0:00.00 postgres: logger process 12201 pg94 15 0 1182m 198m 197m S 0.0 2.5 0:00.34 postgres: checkpointer process waiting for 0/2D0C99D8 12205 pg94 15 0 110m 1024 460 S 0.0 0.0 0:00.01 postgres: archiver process last was 00000001000000000000002C 12206 pg94 15 0 110m 1228 576 S 0.0 0.0 0:00.04 postgres: stats collector process 12687 pg94 15 0 66068 1552 1208 S 0.0 0.0 0:00.01 -bash 12786 pg94 15 0 109m 1748 1408 S 0.0 0.0 0:00.00 pg_ctl stop -m fast
AI 代码解读
使用-m immediate关闭后, 重启主库.
pg94@db-172-16-3-39-> psql psql (9.4devel) Type "help" for help. digoal=# select count(*),sum(hashtext(test.*::text)) from test; count | sum ---------+-------------- 1828052 | 747103747132 (1 row)
AI 代码解读
开启standby, 数据一致
pg94@db-172-16-3-33-> psql psql (9.4devel) Type "help" for help. digoal=# select count(*),sum(hashtext(test.*::text)) from test; count | sum ---------+-------------- 1828052 | 747103747132 (1 row)
AI 代码解读
测试fail-over.
主库测试过程中, promote备库.
promote备库.
注意由于备库也配置了synchronous_transfer = data_flush, 所有的data flush操作都会等待, 所以要尽快把主库转成备库.
主库 fail-back失败 :
将备库的synchronous_transfer改为commit. fail-back失败.
[其他]
[参考]
pg94@db-172-16-3-39-> pgbench -M prepared -f ./test.sql -r -n -h $PGDATA -p 2999 -U postgres -c 16 -j 4 -T 100 digoal ./test.sql: No such file or directory pg94@db-172-16-3-39-> cd pg94@db-172-16-3-39-> pgbench -M prepared -f ./test.sql -r -n -h $PGDATA -p 2999 -U postgres -c 16 -j 4 -T 100 digoal Client 4 aborted in state 0: ERROR: duplicate key value violates unique constraint "test_pkey" DETAIL: Key (id)=(2592452) already exists. CONTEXT: SQL statement "insert into test values (v_id,md5(random()::text),clock_timestamp())" PL/pgSQL function func() line 10 at SQL statement Client 7 aborted in state 0: ERROR: duplicate key value violates unique constraint "test_pkey" DETAIL: Key (id)=(3677118) already exists. CONTEXT: SQL statement "insert into test values (v_id,md5(random()::text),clock_timestamp())" PL/pgSQL function func() line 10 at SQL statement
AI 代码解读
promote备库.
pg94@db-172-16-3-33-> pg_ctl promote server promoting 2013-08-06 09:34:46.772 CST,,,4728,,52005220.1278,6,,2013-08-06 09:32:16 CST,1/0,0,LOG,00000,"received promote request",,,,,,,,"CheckForStandbyTrigger, xlog.c:11127","" 2013-08-06 09:34:46.772 CST,,,4745,,5200523e.1289,2,,2013-08-06 09:32:46 CST,,0,FATAL,57P01,"terminating walreceiver process due to administrator command",,,,,,,,"ProcessWalRcvInterrupts, walreceiver.c:167","" 2013-08-06 09:34:46.871 CST,,,4728,,52005220.1278,7,,2013-08-06 09:32:16 CST,1/0,0,LOG,00000,"record with zero length at 0/45AD5F80",,,,,,,,"ReadRecord, xlog.c:4420","" 2013-08-06 09:34:46.871 CST,,,4728,,52005220.1278,8,,2013-08-06 09:32:16 CST,1/0,0,LOG,00000,"redo done at 0/45AD5F50",,,,,,,,"StartupXLOG, xlog.c:6878","" 2013-08-06 09:34:46.871 CST,,,4728,,52005220.1278,9,,2013-08-06 09:32:16 CST,1/0,0,LOG,00000,"last completed transaction was at log time 2013-08-06 09:34:46.753686+08",,,,,,,,"StartupXLOG, xlog.c:6883","" 2013-08-06 09:34:46.871 CST,,,4728,,52005220.1278,10,,2013-08-06 09:32:16 CST,1/0,0,LOG,00000,"selected new timeline ID: 2",,,,,,,,"StartupXLOG, xlog.c:6989","" 2013-08-06 09:34:47.080 CST,,,4728,,52005220.1278,11,,2013-08-06 09:32:16 CST,1/0,0,LOG,00000,"archive recovery complete",,,,,,,,"exitArchiveRecovery, xlog.c:5639","" 2013-08-06 09:35:21.939 CST,,,4729,,52005220.1279,2,,2013-08-06 09:32:16 CST,,0,LOG,00000,"restartpoint complete: wrote 24656 buffers (18.8%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=149.076 s, sync=0.868 s, total=150.366 s; sync files=41, longest=0.527 s, average=0.021 s",,,,,,,,"LogCheckpointEnd, xlog.c:7898","" 2013-08-06 09:35:21.939 CST,,,4729,,52005220.1279,3,,2013-08-06 09:32:16 CST,,0,LOG,00000,"recovery restart point at 0/2D0C99D8","last completed transaction was at log time 2013-08-06 09:34:46.753686+08",,,,,,,"CreateRestartPoint, xlog.c:8715","" 2013-08-06 09:36:03.135 CST,,,4840,"",52005303.12e8,1,"",2013-08-06 09:36:03 CST,,0,LOG,00000,"connection received: host=[local]",,,,,,,,"BackendInitialize, postmaster.c:3923","" 2013-08-06 09:36:03.136 CST,"postgres","digoal",4840,"[local]",52005303.12e8,2,"authentication",2013-08-06 09:36:03 CST,2/3,0,LOG,00000,"connection authorized: user=postgres database=digoal",,,,,,,,"PerformAuthentication, postinit.c:239","" 2013-08-06 09:36:16.409 CST,"postgres","digoal",4840,"[local]",52005303.12e8,3,"idle",2013-08-06 09:36:03 CST,,0,LOG,00000,"disconnection: session time: 0:00:13.274 user=postgres database=digoal host=[local]",,,,,,,,"log_disconnections, postgres.c:4427","psql"
AI 代码解读
注意由于备库也配置了synchronous_transfer = data_flush, 所有的data flush操作都会等待, 所以要尽快把主库转成备库.
pg94@db-172-16-3-39-> cd $PGDATA pg94@db-172-16-3-39-> mv recovery.done recovery.conf pg94@db-172-16-3-39-> pg_ctl restart -m immediate waiting for server to shut down.... done server stopped server starting pg94@db-172-16-3-39-> LOG: 00000: loaded library "pg_stat_statements" LOCATION: load_libraries, miscinit.c:1296
AI 代码解读
主库 fail-back失败 :
2013-08-06 09:39:38.208 CST,,,13320,,520053da.3408,1,,2013-08-06 09:39:38 CST,,0,LOG,00000,"database system was interrupted; last known up at 2013-08-06 09:32:33 CST",,,,,,,,"StartupXLOG, xlog.c:6080","" 2013-08-06 09:39:38.208 CST,,,13320,,520053da.3408,2,,2013-08-06 09:39:38 CST,,0,LOG,00000,"entering standby mode",,,,,,,,"StartupXLOG, xlog.c:6133","" 2013-08-06 09:39:38.208 CST,,,13320,,520053da.3408,3,,2013-08-06 09:39:38 CST,,0,LOG,00000,"database system was not properly shut down; automatic recovery in progress",,,,,,,,"StartupXLOG, xlog.c:6443","" 2013-08-06 09:39:38.210 CST,,,13320,,520053da.3408,4,,2013-08-06 09:39:38 CST,1/0,0,LOG,00000,"redo starts at 0/36000090",,,,,,,,"StartupXLOG, xlog.c:6687","" 2013-08-06 09:39:53.644 CST,,,13320,,520053da.3408,5,,2013-08-06 09:39:38 CST,1/0,0,LOG,00000,"record with zero length at 0/52438570",,,,,,,,"ReadRecord, xlog.c:4420","" 2013-08-06 09:39:53.644 CST,,,13320,,520053da.3408,6,,2013-08-06 09:39:38 CST,1/0,0,LOG,00000,"consistent recovery state reached at 0/52438570",,,,,,,,"CheckRecoveryConsistency, xlog.c:7348","" 2013-08-06 09:39:53.645 CST,,,13318,,520053da.3406,1,,2013-08-06 09:39:38 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,"sigusr1_handler, postmaster.c:4737","" 2013-08-06 09:39:53.649 CST,,,13339,,520053e9.341b,1,,2013-08-06 09:39:53 CST,,0,LOG,00000,"started streaming WAL from primary at 0/52000000 on timeline 1",,,,,,,,"WalReceiverMain, walreceiver.c:366","" 2013-08-06 09:39:53.650 CST,,,13339,,520053e9.341b,2,,2013-08-06 09:39:53 CST,,0,FATAL,XX000,"could not receive data from WAL stream: ERROR: requested starting point 0/52000000 is ahead of the WAL flush position of this server 0/45AD6088 ",,,,,,,,"libpqrcv_receive, libpqwalreceiver.c:513","" 2013-08-06 09:40:08.653 CST,,,13346,,520053f8.3422,1,,2013-08-06 09:40:08 CST,,0,LOG,00000,"started streaming WAL from primary at 0/52000000 on timeline 1",,,,,,,,"WalReceiverMain, walreceiver.c:366","" 2013-08-06 09:40:08.654 CST,,,13346,,520053f8.3422,2,,2013-08-06 09:40:08 CST,,0,FATAL,XX000,"could not receive data from WAL stream: ERROR: requested starting point 0/52000000 is ahead of the WAL flush position of this server 0/45AD6088 ",,,,,,,,"libpqrcv_receive, libpqwalreceiver.c:513",""
AI 代码解读
将备库的synchronous_transfer改为commit. fail-back失败.
pg94@db-172-16-3-39-> pgbench -M prepared -f ./test.sql -r -n -h $PGDATA -p 2999 -U postgres -c 16 -j 4 -T 100 digoal Client 0 aborted in state 0: ERROR: duplicate key value violates unique constraint "test_pkey" DETAIL: Key (id)=(2242608) already exists. CONTEXT: SQL statement "insert into test values (v_id,md5(random()::text),clock_timestamp())" PL/pgSQL function func() line 10 at SQL statement Client 7 aborted in state 0: ERROR: duplicate key value violates unique constraint "test_pkey" DETAIL: Key (id)=(2615244) already exists. CONTEXT: SQL statement "insert into test values (v_id,md5(random()::text),clock_timestamp())" PL/pgSQL function func() line 10 at SQL statement pg94@db-172-16-3-39-> psql psql (9.4devel) Type "help" for help. digoal=# checkpoint; CHECKPOINT digoal=# checkpoint; CHECKPOINT pg94@db-172-16-3-33-> pg_ctl promote server promoting pg94@db-172-16-3-33-> top -c -u pg94 top - 09:52:29 up 74 days, 16:10, 1 user, load average: 0.29, 0.13, 0.07 Tasks: 209 total, 1 running, 208 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 14369440k total, 9195600k used, 5173840k free, 404756k buffers Swap: 8385920k total, 184k used, 8385736k free, 8036496k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 5409 pg94 15 0 12760 1204 844 R 0.7 0.0 0:00.12 top -c -u pg94 4262 pg94 15 0 67272 2900 1328 S 0.0 0.0 0:00.26 -bash 5382 pg94 15 0 1181m 47m 46m S 0.0 0.3 0:00.04 /home/pg94/pgsql9.4devel/bin/postgres 5383 pg94 16 0 108m 1048 516 S 0.0 0.0 0:00.00 postgres: logger process 5386 pg94 15 0 1182m 296m 295m S 0.0 2.1 0:00.55 postgres: checkpointer process 5387 pg94 15 0 1182m 6744 6172 S 0.0 0.0 0:00.00 postgres: writer process 5388 pg94 15 0 110m 1100 524 S 0.0 0.0 0:00.00 postgres: stats collector process 5404 pg94 15 0 1182m 1192 620 S 0.0 0.0 0:00.00 postgres: wal writer process 5405 pg94 17 0 1183m 2248 1164 S 0.0 0.0 0:00.00 postgres: autovacuum launcher process 5406 pg94 18 0 110m 1044 472 S 0.0 0.0 0:00.00 postgres: archiver process last was 00000002.history pg94@db-172-16-3-39-> pg_ctl stop -m immediate waiting for server to shut down.... done server stopped pg94@db-172-16-3-39-> cd $PGDATA pg94@db-172-16-3-39-> mv recovery.done recovery.conf pg94@db-172-16-3-39-> pg_ctl start failed-back 失败 : pg94@db-172-16-3-39-> cat postgresql-2013-08-06_095340.csv 2013-08-06 09:53:40.585 CST,,,13969,,52005724.3691,1,,2013-08-06 09:53:40 CST,,0,LOG,00000,"database system was interrupted; last known up at 2013-08-06 09:51:32 CST",,,,,,,,"StartupXLOG, xlog.c:6080","" 2013-08-06 09:53:40.585 CST,,,13969,,52005724.3691,2,,2013-08-06 09:53:40 CST,,0,LOG,00000,"entering standby mode",,,,,,,,"StartupXLOG, xlog.c:6133","" 2013-08-06 09:53:40.585 CST,,,13969,,52005724.3691,3,,2013-08-06 09:53:40 CST,,0,LOG,00000,"database system was not properly shut down; automatic recovery in progress",,,,,,,,"StartupXLOG, xlog.c:6443","" 2013-08-06 09:53:40.587 CST,,,13969,,52005724.3691,4,,2013-08-06 09:53:40 CST,1/0,0,LOG,00000,"redo starts at 0/74BAF018",,,,,,,,"StartupXLOG, xlog.c:6687","" 2013-08-06 09:53:45.958 CST,,,13969,,52005724.3691,5,,2013-08-06 09:53:40 CST,1/0,0,LOG,00000,"record with zero length at 0/89217048",,,,,,,,"ReadRecord, xlog.c:4420","" 2013-08-06 09:53:45.959 CST,,,13969,,52005724.3691,6,,2013-08-06 09:53:40 CST,1/0,0,LOG,00000,"consistent recovery state reached at 0/89217048",,,,,,,,"CheckRecoveryConsistency, xlog.c:7348","" 2013-08-06 09:53:45.960 CST,,,13967,,52005724.368f,1,,2013-08-06 09:53:40 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,"sigusr1_handler, postmaster.c:4737","" 2013-08-06 09:53:45.965 CST,,,13976,,52005729.3698,1,,2013-08-06 09:53:45 CST,,0,LOG,00000,"fetching timeline history file for timeline 2 from primary server",,,,,,,,"WalRcvFetchTimeLineHistoryFiles, walreceiver.c:666","" 2013-08-06 09:53:45.966 CST,,,13976,,52005729.3698,2,,2013-08-06 09:53:45 CST,,0,FATAL,XX000,"could not start WAL streaming: ERROR: requested starting point 0/89000000 on timeline 1 is not in this server's history DETAIL: This server's history forked from timeline 1 at 0/82603A30 ",,,,,,,,"libpqrcv_startstreaming, libpqwalreceiver.c:195",""
AI 代码解读
[其他]
1. 加入data_flush后 , 需要特别注意等待事件, 如果standby正常的话, 影响不大, 如果standby 性能不行或者standby 挂了, 就如同sync replication一样会带来堵塞.
2. 如果配置了synchronous_transfer = data_flush|all, 在没有已经连上来的其他standby节点的情况下, 使用pg_basebackup创建standby会处于等待状态, 因为pg_basebackup需要在master节点执行checkpoint.
3. 使用异步流复制的情况下, 测试
synchronous_transfer
= data_flush, all都不能完成主节点的failback.
4. 使用同步流复制的情况下, 测试
synchronous_transfer
= data_flush, all可以完成正常的主节点的failback.
7. 第4版补丁synchronous_transfer参数介绍
we support following 2 cases. 1. SYNC server and also make same failback safe standby server 2. ASYNC server and also make same failback safe standby server 1. changed name of parameter give up 'failback_safe_standby_names' parameter from the first patch. and changed name of parameter from 'failback_safe_mode ' to 'synchronous_transfer'. this parameter accepts 'all', 'data_flush' and 'commit'. -'commit' 'commit' means that master waits for corresponding WAL to flushed to disk of standby server on commits. but master doesn't waits for replicated data pages. -'data_flush' 'data_flush' means that master waits for replicated data page (e.g, CLOG, pg_control) before flush to disk of master server. but if user set to 'data_flush' to this parameter, 'synchronous_commit' values is ignored even if user set 'synchronous_commit'. -'all' 'all' means that master waits for replicated WAL and data page. 2. put SyncRepWaitForLSN() function into XLogFlush() function we have put SyncRepWaitForLSN() function into XLogFlush() function, and change argument of XLogFlush(). they are setup case and need to set parameters. - SYNC server and also make same failback safe standgy server (case 1) synchronous_transfer = all synchronous_commit = remote_write/on synchronous_standby_names = <ServerName> - ASYNC server and also make same failback safe standgy server (case 2) synchronous_transfer = data_flush (synchronous_commit values is ignored) - default SYNC replication synchronous_transfer = commit synchronous_commit = on synchronous_standby_names = <ServerName> - default ASYNC replication synchronous_transfer = commit ToDo 1. currently this patch supports synchronous transfer. so we can't set different synchronous transfer mode to each server. we need to improve the patch for support following cases. - SYNC standby and make separate ASYNC failback safe standby - ASYNC standby and make separate ASYNC failback safe standby 2. we have not measure performance yet. we need to measure perfomance. please give me your feedback. Regards, ------- Sawada Masahiko
AI 代码解读