背景
穷鬼玩PolarDB RAC一写多读集群系列已经写了几篇:
本篇文章介绍一下如何进行实时归档? 实验环境依赖 《在Docker容器中用loop设备模拟共享存储》 , 如果没有环境, 请自行参考以上文章搭建环境.
还需要参考如下文档:
在线归档需要等wal文件切换时才会进行copy, PolarDB默认的wal文件大小是1GB, 这样可能会有一些弊端:
- 在主节点进行归档时, copy可能会带来较大的突发IO.
- 如果存储故障, 可能丢失未归档的wal日志, 最多1GB.
所以这篇文档想介绍一下实时归档. 注意: 如果你已经建立了PolarDB容灾(standby)节点, 可以忽略这篇文档, 因为容灾节点本身就有实时接收WAL, 在容灾节点归档即可, 没有必要再接收一份. 当然了, 如果你就是喜欢多弄几份WAL归档, 请继续.
DEMO
1、新建docker容器 pb4, 作为实时归档机.
cd ~/data_volumn PWD=`pwd` docker run -d -it -v $PWD:/data -P --shm-size=1g --cap-add=SYS_PTRACE --cap-add SYS_ADMIN --privileged=true --name pb4 registry.cn-hangzhou.aliyuncs.com/polardb_pg/polardb_pg_devel:ubuntu22.04 bash
在宿主机映射到docker容器的目录内创建实时归档目录.
# 进入容器pb4 docker exec -ti pb4 bash # 创建实时归档目录 mkdir /data/polardb_wal_archive/
将编译好的二进制拷贝到pb4的HOME目录, 便于调用:
$ cp -r /data/polardb/tmp_polardb_pg_15_base ~/ # 或者使用软链也可以, 软链还有个好处, 编译插件时只需要编译一次即可: $ ln -s /data/polardb/tmp_polardb_pg_15_base ~/ $ which psql /home/postgres/tmp_polardb_pg_15_base/bin/psql
2、pb1 primary节点, 配置pg_hba.conf, 允许备份机使用流复制链接
# 进入容器pb1 docker exec -ti pb1 bash # 由于这里pb4网段已经配置了, 所以不需要再配置, 如下: cat ~/primary/pg_hba.conf host replication postgres 172.17.0.0/16 trust # 如果有修改, 需要reload # pg_ctl reload -D ~/primary
3、pb1 primary节点, 创建 replication slot, 防止实时归档机未接收的WAL被主节点删除.
psql -p 5432 -d postgres -c "SELECT pg_create_physical_replication_slot('online_archwal_1');" pg_create_physical_replication_slot ------------------------------------- (online_archwal_1,) (1 row)
4、pb4 实时归档机, 使用pg_receivewal工具开始实时归档
编辑接收wal的脚本, 如果pg_receivewal进程退出了可以自动重启.
vi /home/postgres/wal.sh
脚本如下:
#!/bin/bash while true do # 假设实时归档机上只有一个pg_receivewal命令, 没有其他pg_receivewal命令. if pgrep -x "pg_receivewal" >/dev/null then echo "pg_receivewal 进程正在运行" else # 如果想降低IO频率, 可以去掉 --synchronous nohup pg_receivewal --synchronous -D /data/polardb_wal_archive -S online_archwal_1 -d "host=172.17.0.2 port=5432 user=postgres dbname=replication" >>/data/polardb_wal_archive/arch.log 2>&1 & fi sleep 10 done
修改脚本权限
chmod 555 /home/postgres/wal.sh
开启归档
nohup /home/postgres/wal.sh >/dev/null 2>&1 &
实际的生产环境当中, 可以把wal.sh
配置为服务器启动/容器启动时自动执行.
5、检查归档是否正常
pb4 实时归档机. 接收已经开始了.
$ ll -h /data/polardb_wal_archive total 1.1G drwxr-xr-x 4 postgres postgres 128 Dec 19 11:17 ./ drwxr-xr-x 9 postgres postgres 288 Dec 18 17:53 ../ -rw------- 1 postgres postgres 1.0G Dec 19 11:17 000000010000000200000000.partial -rw-r--r-- 1 postgres postgres 334 Dec 19 11:16 arch.log
pb1, PolarDB primary节点. 查看复制槽, 可以看到连接已建立.
postgres=# select * from pg_stat_replication where application_name='pg_receivewal'; -[ RECORD 1 ]----+------------------------------ pid | 4204 usesysid | 10 usename | postgres application_name | pg_receivewal client_addr | 172.17.0.5 client_hostname | client_port | 52744 backend_start | 2024-12-19 11:17:10.982052+08 backend_xmin | state | streaming sent_lsn | 2/658 write_lsn | 2/658 flush_lsn | 2/658 replay_lsn | write_lag | 00:00:12.95224 flush_lag | 00:01:14.971488 replay_lag | 00:01:14.971488 sync_priority | 0 sync_state | async reply_time | 2024-12-19 11:18:25.957865+08
6、在pb1, PolarDB primary节点写入一些数据, 可以看到pg_stat_replication.sent_lsn
的变化.
postgres=# create table test (id int, info text, ts timestamp); CREATE TABLE postgres=# insert into test select generate_series(1,100),md5(random()::text),clock_timestamp(); INSERT 0 100
接收wal位置已推进
postgres=# select * from pg_stat_replication where application_name='pg_receivewal'; -[ RECORD 1 ]----+------------------------------ pid | 4285 usesysid | 10 usename | postgres application_name | pg_receivewal client_addr | 172.17.0.5 client_hostname | client_port | 43402 backend_start | 2024-12-19 11:32:27.928248+08 backend_xmin | state | streaming sent_lsn | 2/40021960 write_lsn | 2/40021960 flush_lsn | 2/40021960 replay_lsn | write_lag | 00:00:05.118327 flush_lag | 00:00:10.002961 replay_lag | 00:00:10.002961 sync_priority | 0 sync_state | async reply_time | 2024-12-19 11:32:37.937766+08
slot状态正常
postgres=# select * from pg_replication_slots where slot_name='online_archwal_1'; -[ RECORD 1 ]-------+----------------- slot_name | online_archwal_1 plugin | slot_type | physical datoid | database | temporary | f active | t active_pid | 4285 xmin | catalog_xmin | restart_lsn | 2/40021960 confirmed_flush_lsn | wal_status | reserved safe_wal_size | two_phase | f
附, pg_receivewal 命令行帮助:
$ pg_receivewal --help pg_receivewal receives PostgreSQL streaming write-ahead logs. Usage: pg_receivewal [OPTION]... Options: -D, --directory=DIR receive write-ahead log files into this directory -E, --endpos=LSN exit after receiving the specified LSN --if-not-exists do not error if slot already exists when creating a slot -n, --no-loop do not loop on connection lost --no-sync do not wait for changes to be written safely to disk -s, --status-interval=SECS time between status packets sent to server (default: 10) -S, --slot=SLOTNAME replication slot to use --synchronous flush write-ahead log immediately after writing -v, --verbose output verbose messages -V, --version output version information, then exit -Z, --compress=METHOD[:DETAIL] compress as specified -?, --help show this help, then exit Connection options: -d, --dbname=CONNSTR connection string -h, --host=HOSTNAME database server host or socket directory -p, --port=PORT database server port number -U, --username=NAME connect as specified database user -w, --no-password never prompt for password -W, --password force password prompt (should happen automatically) Optional actions: --create-slot create a new replication slot (for the slot's name see --slot) --drop-slot drop the replication slot (for the slot's name see --slot) Report bugs to <pgsql-bugs@lists.postgresql.org>. PostgreSQL home page: <https://www.postgresql.org/>
附, libpq connstr配置.
参考
《穷鬼玩PolarDB RAC一写多读集群系列 | 在Docker容器中用loop设备模拟共享存储》
《穷鬼玩PolarDB RAC一写多读集群系列 | 如何搭建PolarDB容灾(standby)节点》
《穷鬼玩PolarDB RAC一写多读集群系列 | 共享存储在线扩容》
《穷鬼玩PolarDB RAC一写多读集群系列 | 计算节点 Switchover》
《穷鬼玩PolarDB RAC一写多读集群系列 | 在线备份》
《穷鬼玩PolarDB RAC一写多读集群系列 | 在线归档》
https://www.postgresql.org/docs/current/app-pgreceivewal.html