背景
穷鬼玩PolarDB RAC一写多读集群系列已经写了几篇:
本篇文章介绍一下如何进行在线归档? 实验环境依赖 《在Docker容器中用loop设备模拟共享存储》 , 如果没有环境, 请自行参考以上文章搭建环境.
还需要参考如下文档:
- https://www.postgresql.org/docs/current/continuous-archiving.html
- https://www.postgresql.org/docs/17/runtime-config-wal.html
DEMO
1、在macOS宿主机创建一块新的虚拟磁盘(这里10GB容量), pfs将用这个设备来存储归档wal文件.
cd ~/data_volumn dd if=/dev/zero of=./arch.img bs=1m count=10240 oflag=direct
2、在PolarDB集群任一计算节点上挂载loop设备
# 查找可用loopN postgres@dad7520e4620:~/primary$ losetup -f /dev/loop3 # 创建loop sudo losetup --direct-io=on /dev/loop3 /data/arch.img
当前有这些loop设备:
postgres@dad7520e4620:~/primary$ losetup -a /dev/loop1: [0176]:36827 (/VirtualDisk.img) /dev/loop2: [0176]:109941 (/VirtualDisk_standby.img) /dev/loop0: [65025]:3407876 (/mount-services-cache/entries/services.iso/16203eaee29495c5a513341af851be45c30488427f5e3c87764e8b89a4a47d7a) /dev/loop3: [0176]:141665 (/data/arch.img)
并在所有的(这里是p1 primary, p2 replica)计算节点创建软链
# 注意: 容器重启后, 这个软链就不存在了, 需重新创建该软链 postgres@dad7520e4620:~/primary$ sudo ln -s /dev/loop3 /dev/nvme3n1 postgres@60d4e0a501f7:~/replica1$ sudo ln -s /dev/loop3 /dev/nvme3n1
3、在PolarDB集群任意节点使用pfs格式化设备 nvme3n1
# 千万别写错设备名咯 sudo pfs -C disk mkfs -f nvme3n1
格式化成功, 日志如下:
pfs tool cmd record:mkfs -f nvme3n1 [PFS_LOG] Dec 18 17:15:43.497383 INF [998] pfs build version:libpfs_version_("pfsd-build-desc-_-Wed Sep 4 17:22:25 CST 2024") [PFS_LOG] Dec 18 17:15:43.497482 INF [998] pid: 997, caller: sudo pfs -C disk mkfs -f nvme3n1 [PFS_LOG] Dec 18 17:15:43.497519 INF [998] pid: 996, caller: sudo pfs -C disk mkfs -f nvme3n1 [PFS_LOG] Dec 18 17:15:43.497532 INF [998] pid: 12, caller: bash [PFS_LOG] Dec 18 17:15:43.497665 INF [998] open device cluster disk, devname nvme3n1, flags 0x13 [PFS_LOG] Dec 18 17:15:43.497693 INF [998] disk dev path: /dev/nvme3n1 [PFS_LOG] Dec 18 17:15:43.497702 INF [998] open local disk: open(/dev/nvme3n1, 0x10002) [PFS_LOG] Dec 18 17:15:43.497726 INF [998] ioctl status 0 [PFS_LOG] Dec 18 17:15:43.497739 INF [998] pfs_diskdev_info get pi_pbdno 0, pi_rwtype 1, pi_unitsize 4194304, pi_chunksize 10737418240, pi_disksize 10737418240 [PFS_LOG] Dec 18 17:15:43.497745 INF [998] pfs_diskdev_info waste size: 0 [PFS_LOG] Dec 18 17:15:43.497782 INF [998] disk size 0x280000000, chunk size 0x280000000 [PFS_LOG] Dec 18 17:15:43.498998 ERR [998] chunk 0 pfs magic mismatch 0 vs 0x5046534348 [PFS_LOG] Dec 18 17:15:43.499010 INF [998] mkfs PBD nvme3n1 isn't formatted Init chunk 0 metaset 0/1: sectbda 0x1000, npage 80, objsize 128, nobj 2560, oid range [ 0, a00) metaset 0/2: sectbda 0x51000, npage 64, objsize 128, nobj 2048, oid range [ 0, 800) metaset 0/3: sectbda 0x91000, npage 64, objsize 128, nobj 2048, oid range [ 0, 800) Inited filesystem(10737418240 bytes), 1 chunks, 2560 blktags, 2048 direntries, 2048 inodes per chunk making paxos file init paxos lease making journal file pfs mkfs succeeds!
4、在PolarDB集群所有节点, 启动pfsd, 负责这块盘的IO.
pb1
sudo /usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme3n1 -w 1
pb2
sudo /usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme3n1 -w 1
5、在PolarDB集群任意节点, 使用pfs创建目录 /nvme3n1/archive/
用来存放PolarDB wal归档文件
postgres@dad7520e4620:~/primary$ sudo pfs -C disk mkdir /nvme3n1/archive # 创建成功: postgres@dad7520e4620:~/primary$ sudo pfs -C disk ls /nvme3n1/ File 1 4194304 Wed Dec 18 17:15:43 2024 .pfs-paxos File 1 1073741824 Wed Dec 18 17:15:45 2024 .pfs-journal Dir 1 0 Wed Dec 18 17:16:31 2024 archive total 2105344 (unit: 512Bytes)
6、配置archive_mode
和archive_command
注意: 请 不要 使用以下 PolarDB集群配置命令, 除非你永远不会修改这个配置.
# 不推荐的示范 : postgres=# alter system FOR CLUSTER set archive_mode = on; ALTER SYSTEM postgres=# alter system FOR CLUSTER set archive_command = 'pfs -C disk stat /nvme3n1/archive/%f && pfs -C disk cp %p /nvme3n1/archive/%f'; ALTER SYSTEM
为什么不推荐配置集群文件?
因为 PolarDB 可能会自动把集群配置文件polar_settings.conf的内容拷贝到实例postgresql.auto.conf内. 并且postgresql.auto.conf的优先级高于polar_settings.conf, 将来你想修改archive_mode和archive_command还是要去节点的配置文件postgresql.auto.conf里修改, 那我为什么不直接配置postgresql.auto.conf呢?
配置文件优先顺序参考自以下:
src/backend/utils/polar_parameters_manage/polar_parameters_sample/README Config File Priority high to low: 1. postgresql.auto.conf 2. postgresql.conf 3. polar_settings.conf
修改PolarDB集群文件配置参考自以下:
src/backend/utils/misc/guc.c /* * POLAR: ALTER SYSTEM FOR CLUSTER * * polar_settings is a configuration file placed in the shared storage * with the instance level. When starting, we copy a copy and put it * locally.(See in polar_mount_and_extra_load_polar_settings) When * change it, we first change the local file, and then copy the local * file into the shared storage to achieve the consistency of local * file and global file. * * But in RO node, just change local file is enough. */ if (polar_alter_cluster && !polar_is_replica()) { /* * In the shared storage mode, we need to modify the * polar_settings.conf on the shared storage at the same time. * Copy the tmp file here */ if (polar_enable_shared_storage_mode) { /* * If there is a temp file left over due to a previous crash, * it's okay to truncate and reuse it. */ (void) polar_unlink(GlobalAllSettingTmpFileName); polar_copy_file(AutoConfTmpFileName, GlobalAllSettingTmpFileName, false); }
推荐的配置如下:
6.1、修改pb1 primary节点配置 下面archive_command这个pfs cp命令有问题, 后面有解释.
postgres=# alter system set archive_mode = on; ALTER SYSTEM postgres=# alter system set archive_command = 'pfs -C disk stat /nvme3n1/archive/%f || pfs -C disk cp %p /nvme3n1/archive/%f'; ALTER SYSTEM
6.2、修改pb2 replica节点配置 下面archive_command这个pfs cp命令有问题, 后面有解释.
postgres=# alter system set archive_mode = on; ALTER SYSTEM postgres=# alter system set archive_command = 'pfs -C disk stat /nvme3n1/archive/%f || pfs -C disk cp %p /nvme3n1/archive/%f'; ALTER SYSTEM
6.3、重启pb2 replica节点 (修改了archive_mode必须重启, 修改archive_command只需要reload conf.)
pg_ctl restart -m fast -D ~/replica1
检查配置是否生效?
postgres@60d4e0a501f7:~/replica1$ psql psql (PostgreSQL 15.10 (PolarDB 15.10.2.0 build d4f5477d debug) on aarch64-linux-gnu) Type "help" for help. postgres=# show archive_mode; archive_mode -------------- on (1 row) postgres=# show archive_command; archive_command ------------------------------------------------------------------------------- pfs -C disk stat /nvme3n1/archive/%f || pfs -C disk cp %p /nvme3n1/archive/%f (1 row)
6.4、重启pb1 primary节点 (修改了archive_mode必须重启, 修改archive_command只需要reload conf.)
pg_ctl restart -m fast -D ~/primary
检查配置是否生效?
postgres@dad7520e4620:~/primary$ psql psql (PostgreSQL 15.10 (PolarDB 15.10.2.0 build d4f5477d debug) on aarch64-linux-gnu) Type "help" for help. postgres=# show archive_mode; archive_mode -------------- on (1 row) postgres=# show archive_command; archive_command ------------------------------------------------------------------------------- pfs -C disk stat /nvme3n1/archive/%f || pfs -C disk cp %p /nvme3n1/archive/%f (1 row)
7、检查是否会自动归档?
在primary 节点, 切换wal日志文件
postgres=# select pg_switch_wal(); pg_switch_wal --------------- 1/80000160 (1 row)
不出意外的话, 马上就要出意外了, 在nvme3n1内查看不到归档文件.
$ pfs -C disk ls /nvme3n1/archive/ cant open logfile /var/log/pfs-nvme3n1.log [PFSD_SDK INF Dec 18 17:28:29.538610][409]pfs_mount_prepare 103: begin prepare mount cluster(disk), PBD(nvme3n1), hostid(0),flags(0x1011) [PFSD_SDK INF Dec 18 17:28:29.594918][409]chnl_connection_poll_shm 1238: ack data update s_mount_epoch 1 [PFSD_SDK INF Dec 18 17:28:29.594939][409]chnl_connection_poll_shm 1266: connect and got ack data from svr, err = 0, mntid 0 [PFSD_SDK INF Dec 18 17:28:29.595223][409]pfsd_sdk_init 191: pfsd_chnl_connect success pfs tool cmd record:ls /nvme3n1/archive/ [PFS_LOG] Dec 18 17:28:29.595240 INF [409] pfs build version:libpfs_version_("pfsd-build-desc-_-Wed Sep 4 17:22:25 CST 2024") [PFS_LOG] Dec 18 17:28:29.595301 INF [409] pid: 12, caller: bash total 0 (unit: 512Bytes) [PFSD_SDK INF Dec 18 17:28:29.595729][409]pfsd_umount 272: pbdname nvme3n1 [PFSD_SDK INF Dec 18 17:28:29.631142][409]chnl_connection_release_shm 1164: client umount return : deleted /var/run/pfsd//nvme3n1/409.pid [PFSD_SDK INF Dec 18 17:28:29.631160][409]pfsd_umount 286: umount success for nvme3n1
查看PolarDB日志文件, 发现归档命令有报错:
stat failed: No such file or directory pfs tool cmd record:cp /nvme1n1/shared_data//pg_wal/000000010000000000000001 /nvme3n1/archive/000000010000000000000001 [PFS_LOG] Dec 18 17:31:33.695839 INF [1363] pfs build version:libpfs_version_("pfsd-build-desc-_-Wed Sep 4 17:22:25 CST 2024") [PFS_LOG] Dec 18 17:31:33.695924 INF [1363] pid: 1356, caller: sh -c pfs -C disk stat /nvme3n1/archive/000000010000000000000001 || pfs -C disk cp /nvme1n1/shared_data//pg_wal/000000010000000000000001 /n vme3n1/archive/000000010000000000000001 [PFS_LOG] Dec 18 17:31:33.695942 INF [1363] pid: 1355, caller: postgres(5432): archiver archiving 000000010000000000000001 [PFS_LOG] Dec 18 17:31:33.695959 INF [1363] pid: 1109, caller: /home/postgres/tmp_polardb_pg_15_base/bin/postgres -D /home/postgres/primary [PFS_LOG] Dec 18 17:31:33.695976 ERR [1363] cp: different pbdnames are found [PFS_LOG] Dec 18 17:31:33.695982 ERR [1363] cp failed: impl specific cp failed: impl specific 1355 2024-12-18 17:31:33.703 CST FATAL: archive command failed with exit code 255 1355 2024-12-18 17:31:33.703 CST DETAIL: The failed archive command was: pfs -C disk stat /nvme3n1/archive/000000010000000000000001 || pfs -C disk cp /nvme1n1/shared_data//pg_wal /000000010000000000000001 /nvme3n1/archive/000000010000000000000001 1355 2024-12-18 17:31:33.703 CST BACKTRACE: Process Info: pid: 1355 type: archiver sql: <NULL> Backtrace: #0 errfinish+0x220 [0xaaaab5ecea50] #1 shell_archive_file+0x2f0 [0xaaaab5bd43b0] #2 pgarch_archiveXlog+0x84 [0xaaaab5bc91a8] #3 pgarch_ArchiverCopyLoop+0x20c [0xaaaab5bc9020] #4 pgarch_MainLoop+0xa4 [0xaaaab5bc8d44] #5 PgArchiverMain+0x194 [0xaaaab5bc8bd4] #6 AuxiliaryProcessMain+0x288 [0xaaaab5bc2524] #7 StartChildProcess+0x60 [0xaaaab5bd27c0] #8 ServerLoop+0x638 [0xaaaab5bcc258] #9 PostmasterMain+0x15d0 [0xaaaab5bcb628] #10 main+0x380 [0xaaaab5abde60] #11 __libc_init_first+0x7c [0xffff7fb573fc] #12 __libc_start_main+0x98 [0xffff7fb574cc] #13 _start+0x30 [0xaaaab56a1f70]
把这个命令拿出来手工执行, 报错如下. 难道 pfs不支持跨 pbdnames 拷贝? 那么应该如何配置PolarDB归档呢?
postgres@dad7520e4620:~/primary/pg_log$ pfs -C disk cp /nvme1n1/shared_data//pg_wal/000000010000000000000001 /nvme3n1/archive/000000010000000000000001 pfs tool cmd record:cp /nvme1n1/shared_data//pg_wal/000000010000000000000001 /nvme3n1/archive/000000010000000000000001 [PFS_LOG] Dec 18 17:29:56.905781 INF [1322] pfs build version:libpfs_version_("pfsd-build-desc-_-Wed Sep 4 17:22:25 CST 2024") [PFS_LOG] Dec 18 17:29:56.905877 INF [1322] pid: 12, caller: bash [PFS_LOG] Dec 18 17:29:56.905911 ERR [1322] cp: different pbdnames are found [PFS_LOG] Dec 18 17:29:56.905913 ERR [1322] cp failed: impl specific cp failed: impl specific
正解: 找了pfs研发, 确认pfs可以跨pbdnames设备拷贝. 是我的使用姿势错误, 需要指定 源S 和 目标D -S disk -D disk
. (disk代表的是nvme盘这种形态, pfs还支持polarstore, 不指定的话默认就是polarstore形态.)
# 这样就可以了 pfs cp -S disk -D disk /nvme1n1/shared_data//pg_wal/000000010000000000000001 /nvme3n1/archive/000000010000000000000001 # 前面的archive_command也需要调整一下, 我懒得改了: alter system set archive_command = 'pfs -C disk stat /nvme3n1/archive/%f || pfs cp -S disk -D disk %p /nvme3n1/archive/%f';
日志如下:
$ pfs cp -S disk -D disk /nvme1n1/shared_data//pg_wal/000000010000000200000002 /nvme3n1/archive/000000010000000200000002 pfs tool cmd record:cp -S disk -D disk /nvme1n1/shared_data//pg_wal/000000010000000200000002 /nvme3n1/archive/000000010000000200000002 [PFS_LOG] Dec 19 16:12:42.285195 INF [5187] pfs build version:libpfs_version_("pfsd-build-desc-_-Wed Sep 4 17:22:25 CST 2024") [PFS_LOG] Dec 19 16:12:42.285305 INF [5187] pid: 12, caller: bash copy file from /nvme1n1/shared_data//pg_wal/000000010000000200000002 to /nvme3n1/archive/000000010000000200000002 succeeded
附pfs cp
命令帮助:
$ pfs cp -h pfs cp [options] srcpath dstpath -r, --recursive copy directories recursively -f, --force force copy; overwrite existent file -S, --src_cluster source cluster name -D, --dst_cluster destination cluster name -s, --src_hostid source cluster hostid for cp between pbd -d, --dst_hostid destination cluster hostid for cp between pbd
接下来本文解决办法比较暴力: 使用同一块数据共享盘来进行归档, 典型的遇到问题不正面解决.
或许你想把pfs里的wal归档到本地文件系统或OSS对象存储内, 可以有2种方式: 1、看看pfs_fuse: https://github.com/ApsaraDB/PolarDB-FileSystem/blob/master/Readme-FUSE-CN.md 2、试试 pg_receivewal 通过流复制进行实时归档: https://www.postgresql.org/docs/current/app-pgreceivewal.html
在PolarDB集群任意节点, 在数据共享盘中创建归档目录
sudo pfs -C disk mkdir /nvme1n1/archive
修改pb1,pb2 primary和replica节点配置
postgres=# alter system set archive_command = 'pfs -C disk stat /nvme1n1/archive/%f || pfs -C disk cp %p /nvme1n1/archive/%f'; ALTER SYSTEM postgres=# select pg_reload_conf(); pg_reload_conf ---------------- t (1 row)
在PolarDB集群primary节点, 切换日志. 别切太多次了, 容易把磁盘跑满, PolarDB默认每个wal文件有1个GB.
postgres=# select pg_switch_wal(); pg_switch_wal --------------- 1/C0000160 (1 row)
现在归档正常了
postgres@60d4e0a501f7:~/replica1$ pfs -C disk ls /nvme1n1/archive/ File 1 227 Wed Dec 18 16:52:46 2024 polar_non_exclusive_backup_label File 1 1048576000 Wed Dec 18 17:40:37 2024 000000010000000000000001 File 1 1048576000 Wed Dec 18 17:40:40 2024 000000010000000000000002 total 4112384 (unit: 512Bytes)
8、(可选) 前面的方法把wal归档到pfs中. 如果你想把wal归档到本地文件系统中, 也可以使用pfs命令行工具. 方法介绍如下.
pfs cp
拷贝文件用法如下:
$ pfs cp -h pfs cp [options] srcpath dstpath -r, --recursive copy directories recursively -f, --force force copy; overwrite existent file -S, --src_cluster source cluster name -D, --dst_cluster destination cluster name -s, --src_hostid source cluster hostid for cp between pbd -d, --dst_hostid destination cluster hostid for cp between pbd
在pb1 primary节点创建本地目录, 存放归档文件
mkdir /data/arch sudo chown postgres:postgres /data/arch
修改pb1,pb2 primary和replica节点归档命令配置, 发生wal切换时, 拷贝wal到本地目录中:
# 注意pfs cp需使用绝对路径. alter system set archive_command = 'test ! -f /data/arch/%f && pfs cp %p /data/arch/%f'; select pg_reload_conf();
实际使用时, 建议使用pg_receivewal实时归档代替在线归档. 参考: 《穷鬼玩PolarDB RAC一写多读集群系列 | 实时归档》
下次一篇文档我们来试一试时间点恢复PolarDB.
参考
《穷鬼玩PolarDB RAC一写多读集群系列 | 在Docker容器中用loop设备模拟共享存储》
《穷鬼玩PolarDB RAC一写多读集群系列 | 如何搭建PolarDB容灾(standby)节点》
《穷鬼玩PolarDB RAC一写多读集群系列 | 共享存储在线扩容》
《穷鬼玩PolarDB RAC一写多读集群系列 | 计算节点 Switchover》
《穷鬼玩PolarDB RAC一写多读集群系列 | 在线备份》
https://www.postgresql.org/docs/current/continuous-archiving.html
https://www.postgresql.org/docs/17/runtime-config-wal.html
https://github.com/ApsaraDB/PolarDB-FileSystem/blob/master/Readme-FUSE-CN.md
https://www.postgresql.org/docs/current/app-pgreceivewal.html