PolarDB开源数据库进阶课7 实时流式归档-阿里云开发者社区

PolarDB开源数据库进阶课7 实时流式归档

2025-02-21 179

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云原生数据库 PolarDB PostgreSQL 版，标准版 2核4GB 50GB

云原生数据库 PolarDB MySQL 版，通用型 2核8GB 50GB

简介： 本文介绍了如何在PolarDB RAC一写多读集群中实现实时归档，确保WAL日志的及时备份。实验依赖于Docker容器和loop设备模拟的共享存储环境。通过配置主节点的`pg_hba.conf`、创建复制槽以及使用`pg_receivewal`工具，实现实时接收并归档WAL文件。此外，还提供了详细的命令行帮助和相关文档链接，方便读者参考和操作。注意：如果已搭建容灾节点，则无需重复进行实时归档。

背景

穷鬼玩PolarDB RAC一写多读集群系列已经写了几篇:

本篇文章介绍一下如何进行实时归档? 实验环境依赖《在Docker容器中用loop设备模拟共享存储》 , 如果没有环境, 请自行参考以上文章搭建环境.

还需要参考如下文档:

https://www.postgresql.org/docs/current/app-pgreceivewal.html

在线归档需要等wal文件切换时才会进行copy, PolarDB默认的wal文件大小是1GB, 这样可能会有一些弊端:

在主节点进行归档时, copy可能会带来较大的突发IO.
如果存储故障, 可能丢失未归档的wal日志, 最多1GB.

所以这篇文档想介绍一下实时归档. 注意: 如果你已经建立了PolarDB容灾(standby)节点, 可以忽略这篇文档, 因为容灾节点本身就有实时接收WAL, 在容灾节点归档即可, 没有必要再接收一份. 当然了, 如果你就是喜欢多弄几份WAL归档, 请继续.

DEMO

b站视频链接

Youtube视频链接

1、新建docker容器 pb4, 作为实时归档机.

cd ~/data_volumn    
PWD=`pwd`    
    
docker run -d -it -v $PWD:/data -P --shm-size=1g --cap-add=SYS_PTRACE --cap-add SYS_ADMIN --privileged=true --name pb4 registry.cn-hangzhou.aliyuncs.com/polardb_pg/polardb_pg_devel:ubuntu22.04 bash

在宿主机映射到docker容器的目录内创建实时归档目录.

# 进入容器pb4  
docker exec -ti pb4 bash  
  
# 创建实时归档目录  
mkdir /data/polardb_wal_archive/

将编译好的二进制拷贝到pb4的HOME目录, 便于调用:

$ cp -r /data/polardb/tmp_polardb_pg_15_base ~/
# 或者使用软链也可以, 软链还有个好处, 编译插件时只需要编译一次即可:   
$ ln -s /data/polardb/tmp_polardb_pg_15_base ~/   
    
    
$ which psql    
/home/postgres/tmp_polardb_pg_15_base/bin/psql

2、pb1 primary节点, 配置pg_hba.conf, 允许备份机使用流复制链接

# 进入容器pb1  
docker exec -ti pb1 bash  
  
# 由于这里pb4网段已经配置了, 所以不需要再配置, 如下:   
cat ~/primary/pg_hba.conf  
  
host  replication postgres  172.17.0.0/16 trust   
  
# 如果有修改, 需要reload  
# pg_ctl reload -D ~/primary

3、pb1 primary节点, 创建 replication slot, 防止实时归档机未接收的WAL被主节点删除.

psql -p 5432 -d postgres -c "SELECT pg_create_physical_replication_slot('online_archwal_1');"    
    
 pg_create_physical_replication_slot   
-------------------------------------  
 (online_archwal_1,)  
(1 row)

4、pb4 实时归档机, 使用pg_receivewal工具开始实时归档

编辑接收wal的脚本, 如果pg_receivewal进程退出了可以自动重启.

vi /home/postgres/wal.sh

脚本如下:

#!/bin/bash  
  
while true   
do  
  # 假设实时归档机上只有一个pg_receivewal命令, 没有其他pg_receivewal命令.  
  if pgrep -x "pg_receivewal" >/dev/null  
  then  
    echo "pg_receivewal 进程正在运行"   
  else  
    # 如果想降低IO频率, 可以去掉 --synchronous  
    nohup pg_receivewal --synchronous -D /data/polardb_wal_archive -S online_archwal_1 -d "host=172.17.0.2 port=5432 user=postgres dbname=replication" >>/data/polardb_wal_archive/arch.log 2>&1 &     
  fi  
  
  sleep 10  
done

修改脚本权限

chmod 555 /home/postgres/wal.sh

开启归档

nohup /home/postgres/wal.sh >/dev/null 2>&1 &

实际的生产环境当中, 可以把wal.sh配置为服务器启动/容器启动时自动执行.

5、检查归档是否正常

pb4 实时归档机. 接收已经开始了.

$ ll -h /data/polardb_wal_archive  
total 1.1G  
drwxr-xr-x 4 postgres postgres  128 Dec 19 11:17 ./  
drwxr-xr-x 9 postgres postgres  288 Dec 18 17:53 ../  
-rw------- 1 postgres postgres 1.0G Dec 19 11:17 000000010000000200000000.partial  
-rw-r--r-- 1 postgres postgres  334 Dec 19 11:16 arch.log

pb1, PolarDB primary节点. 查看复制槽, 可以看到连接已建立.

postgres=# select * from pg_stat_replication where application_name='pg_receivewal';  
-[ RECORD 1 ]----+------------------------------  
pid              | 4204  
usesysid         | 10  
usename          | postgres  
application_name | pg_receivewal  
client_addr      | 172.17.0.5  
client_hostname  |   
client_port      | 52744  
backend_start    | 2024-12-19 11:17:10.982052+08  
backend_xmin     |   
state            | streaming  
sent_lsn         | 2/658  
write_lsn        | 2/658  
flush_lsn        | 2/658  
replay_lsn       |   
write_lag        | 00:00:12.95224  
flush_lag        | 00:01:14.971488  
replay_lag       | 00:01:14.971488  
sync_priority    | 0  
sync_state       | async  
reply_time       | 2024-12-19 11:18:25.957865+08

6、在pb1, PolarDB primary节点写入一些数据, 可以看到pg_stat_replication.sent_lsn的变化.

postgres=# create table test (id int, info text, ts timestamp);  
CREATE TABLE  
postgres=# insert into test select generate_series(1,100),md5(random()::text),clock_timestamp();  
INSERT 0 100

接收wal位置已推进

postgres=# select * from pg_stat_replication where application_name='pg_receivewal';  
-[ RECORD 1 ]----+------------------------------  
pid              | 4285  
usesysid         | 10  
usename          | postgres  
application_name | pg_receivewal  
client_addr      | 172.17.0.5  
client_hostname  |   
client_port      | 43402  
backend_start    | 2024-12-19 11:32:27.928248+08  
backend_xmin     |   
state            | streaming  
sent_lsn         | 2/40021960  
write_lsn        | 2/40021960  
flush_lsn        | 2/40021960 
replay_lsn       |   
write_lag        | 00:00:05.118327  
flush_lag        | 00:00:10.002961  
replay_lag       | 00:00:10.002961  
sync_priority    | 0  
sync_state       | async  
reply_time       | 2024-12-19 11:32:37.937766+08

slot状态正常

postgres=# select * from pg_replication_slots where slot_name='online_archwal_1';  
-[ RECORD 1 ]-------+-----------------  
slot_name           | online_archwal_1  
plugin              |   
slot_type           | physical  
datoid              |   
database            |   
temporary           | f  
active              | t  
active_pid          | 4285  
xmin                |   
catalog_xmin        |   
restart_lsn         | 2/40021960 
confirmed_flush_lsn |   
wal_status          | reserved  
safe_wal_size       |   
two_phase           | f

附, pg_receivewal 命令行帮助:

$ pg_receivewal --help  
pg_receivewal receives PostgreSQL streaming write-ahead logs.  
  
Usage:  
  pg_receivewal [OPTION]...  
  
Options:  
  -D, --directory=DIR    receive write-ahead log files into this directory  
  -E, --endpos=LSN       exit after receiving the specified LSN  
      --if-not-exists    do not error if slot already exists when creating a slot  
  -n, --no-loop          do not loop on connection lost  
      --no-sync          do not wait for changes to be written safely to disk  
  -s, --status-interval=SECS  
                         time between status packets sent to server (default: 10)  
  -S, --slot=SLOTNAME    replication slot to use  
      --synchronous      flush write-ahead log immediately after writing  
  -v, --verbose          output verbose messages  
  -V, --version          output version information, then exit  
  -Z, --compress=METHOD[:DETAIL]  
                         compress as specified  
  -?, --help             show this help, then exit  
  
Connection options:  
  -d, --dbname=CONNSTR   connection string  
  -h, --host=HOSTNAME    database server host or socket directory  
  -p, --port=PORT        database server port number  
  -U, --username=NAME    connect as specified database user  
  -w, --no-password      never prompt for password  
  -W, --password         force password prompt (should happen automatically)  
  
Optional actions:  
      --create-slot      create a new replication slot (for the slot's name see --slot)  
      --drop-slot        drop the replication slot (for the slot's name see --slot)  
  
Report bugs to <pgsql-bugs@lists.postgresql.org>.  
PostgreSQL home page: <https://www.postgresql.org/>