PolarDB Open-Source Database Advanced Course 4: Compute Node Switchover

Summary: This article describes how to perform a Switchover of compute nodes in a PolarDB RAC one-writer-multi-reader cluster. The experiment environment relies on loop devices inside Docker containers to simulate shared storage. The Primary node must stop writing before a Replica node is promoted to Primary; otherwise multiple nodes would write to the storage at the same time and cause data corruption. The article walks through the concrete Switchover steps, common problems and their solutions, and provides the relevant configuration and command examples. Following these steps, the node role switch can be completed smoothly while preserving the cluster's high availability and data consistency.

Background

Three articles have already been published in the 《穷鬼玩PolarDB RAC一写多读集群》 series (playing with a PolarDB RAC one-writer-multi-reader cluster on a shoestring budget); see the References at the end of this article.

This article explains how to perform a Switchover of PolarDB compute nodes. The experiment environment depends on 《在Docker容器中用loop设备模拟共享存储》 (simulating shared storage with loop devices in a Docker container); if you do not have the environment yet, please follow that article to set it up.

The following documents are also needed for reference:

DEMO

Bilibili video link

YouTube video link

A Replica node may be promoted to Primary only after the Primary node has stopped writing; otherwise multiple nodes in the cluster would write to the shared storage at the same time. When the database detects multi-node writes, it will malfunction.
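
Before starting, it helps to confirm which node currently holds which role. A minimal pre-check sketch, assuming psql is on the PATH of each container's default user (adjust to your setup):

docker exec -ti pb1 psql -tAc "select pg_is_in_recovery();"   # primary: expect f
docker exec -ti pb2 psql -tAc "select pg_is_in_recovery();"   # replica: expect t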

1. Stop the p1 primary

# Enter the pb1 container
docker exec -ti pb1 bash

# Stop the primary
pg_ctl stop -m fast -D ~/primary

The log output:
waiting for server to shut down.... done
server stopped
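
Before promoting anything, it is worth confirming that no server process is left on pb1. A small sanity check using stock pg_ctl behavior:

pg_ctl status -D ~/primary   # expected output: pg_ctl: no server running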

2. Promote the p2 replica. Caution: before promoting, double-check that the RW node has really stopped. The promote command has no safeguard (neither PolarDB nor pfs takes any distributed exclusive lock during activation), so it activates the node unconditionally; multiple RW nodes could result, and the data could be corrupted.

# Enter the pb2 container
docker exec -ti pb2 bash

# Promote the replica
pg_ctl promote -D ~/replica1

The following error appeared:

[PFSD_SDK INF Dec 18 14:33:38.327335][859]pfsd_umount_force 247: pbdname nvme1n1  
[PFSD_SDK INF Dec 18 14:33:38.330154][859]chnl_connection_release_shm 1164: client umount return : deleted /var/run/pfsd//nvme1n1/859.pid  
[PFSD_SDK INF Dec 18 14:33:38.330163][859]pfsd_umount_force 261: umount success for nvme1n1  
waiting for server to promote...[PFSD_SDK INF Dec 18 14:33:38.444794][859]pfsd_umount_force 247: pbdname nvme1n1  
[PFSD_SDK INF Dec 18 14:33:38.446306][859]chnl_connection_release_shm 1164: client umount return : deleted /var/run/pfsd//nvme1n1/859.pid  
[PFSD_SDK INF Dec 18 14:33:38.446314][859]pfsd_umount_force 261: umount success for nvme1n1  
.[PFSD_SDK INF Dec 18 14:33:38.690673][859]pfsd_umount_force 247: pbdname nvme1n1  
[PFSD_SDK INF Dec 18 14:33:38.693022][859]chnl_connection_release_shm 1164: client umount return : deleted /var/run/pfsd//nvme1n1/859.pid  
[PFSD_SDK INF Dec 18 14:33:38.693035][859]pfsd_umount_force 261: umount success for nvme1n1  
[PFSD_SDK INF Dec 18 14:33:38.917831][859]pfsd_umount_force 247: pbdname nvme1n1  
[PFSD_SDK INF Dec 18 14:33:38.919652][859]chnl_connection_release_shm 1164: client umount return : deleted /var/run/pfsd//nvme1n1/859.pid  
[PFSD_SDK INF Dec 18 14:33:38.919664][859]pfsd_umount_force 261: umount success for nvme1n1  
[PFSD_SDK INF Dec 18 14:33:39.364163][859]pfsd_umount_force 247: pbdname nvme1n1  
[PFSD_SDK INF Dec 18 14:33:39.390100][859]chnl_connection_release_shm 1164: client umount return : deleted /var/run/pfsd//nvme1n1/859.pid  
[PFSD_SDK INF Dec 18 14:33:39.390109][859]pfsd_umount_force 261: umount success for nvme1n1  
 stopped waiting  
pg_ctl: server did not promote in time

The database log shows:

$PGDATA/pg_log/postgresql-2024-12-18_142905_error.log  
  
849     2024-12-18 14:33:38.330 CST LOG:  received promote request  
849     2024-12-18 14:33:38.330 CST LOG:  async lock replay: startup@polar_alr_terminate_worker: terminate worker  
849     2024-12-18 14:33:38.333 CST LOG:  redo done at 0/484CAF88 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 263.78 s  
849     2024-12-18 14:33:38.333 CST LOG:  last completed transaction was at log time 2024-12-18 14:29:14.656403+08  
849     2024-12-18 14:33:38.333 CST WARNING:  polar_log_read_record read records from 'currRecPtr: 0/484C5168, read record: (0/484C5168, 0/484C51E0), decoded record: (0/484C5168, 0/484C51E0, 0/0)' to 'currRecPtr: 0/484CAF88, read record: (0/484CAF88, 0/484CB000), decoded record: (0/484CAF88, 0/484CB000, 0/0)', next record is 'currRecPtr: 0/484CAF88, read record: (0/484CAF88, 0/484CB000), decoded record: (0/484CAF88, 0/484CB000, 0/0)'  
849     2024-12-18 14:33:38.380 CST WARNING:  replica can't modify file in shared storage by pfs: /nvme1n1/shared_data//base/5  
849     2024-12-18 14:33:38.424 CST WARNING:  replica can't modify file in shared storage by pfs: /nvme1n1/shared_data//base/1  
849     2024-12-18 14:33:38.465 CST WARNING:  replica can't modify file in shared storage by pfs: /nvme1n1/shared_data//base/4  
849     2024-12-18 14:33:38.509 CST WARNING:  replica can't modify file in shared storage by pfs: /nvme1n1/shared_data//base/13572  
849     2024-12-18 14:33:38.509 CST LOG:  selected new timeline ID: 2  
849     2024-12-18 14:33:38.511 CST LOG:  vfs_unlink /nvme1n1/shared_data//pg_wal/xlogtemp.849  
Got coredump signal(11) in process(849)  
Process Info:  
  pid:  849  
  type: startup  
  sql:  <NULL>  
Backtrace:  
  #0  polar_program_error_handler+0x388 [0xaaaad54f72a4]  
  #1  +0x388 [0xffffabc2d790]  
  #2  strerrordesc_np+0x1e30 [0xffffaac19950]  
  #3  dopr+0x164 [0xaaaad55824b0]  
  #4  pg_vsnprintf+0xa0 [0xaaaad5581c68]  
  #5  pvsnprintf+0x44 [0xaaaad5574e54]  
  #6  appendStringInfoVA+0xb4 [0xaaaad5576cb8]  
  #7  errmsg+0x1c4 [0xaaaad54ef674]  
  #8  polar_vfs_file_handle_node_type+0x158 [0xffffa77de3cc]  
  #9  polar_stat_file_before_hook+0x44 [0xffff933b5ef8]  
  #10 vfs_unlink+0xf0 [0xffffa77daa00]  
  #11 polar_unlink+0x4c [0xaaaad4e017bc]  
  #12 XLogFileCopy+0x130 [0xaaaad4e06f38]  
  #13 XLogInitNewTimeline+0xe4 [0xaaaad4e0acf4]  
  #14 StartupXLOG+0xcd0 [0xaaaad4e0bf68]  
  #15 StartupProcessMain+0xcc [0xaaaad51f49b4]  
  #16 AuxiliaryProcessMain+0x284 [0xaaaad51e2520]  
  #17 StartChildProcess+0x60 [0xaaaad51f27c0]  
  #18 PostmasterMain+0x1564 [0xaaaad51eb5bc]  
  #19 main+0x380 [0xaaaad50dde60]  
  #20 __libc_init_first+0x7c [0xffffaaba73fc]  
  #21 __libc_start_main+0x98 [0xffffaaba74cc]  
  #22 _start+0x30 [0xaaaad4cc1f70]  
851     2024-12-18 14:33:38.628 CST WARNING:  Failed to get the instance memory usage  
836     2024-12-18 14:33:39.049 CST LOG:  startup process (PID 849) was terminated by signal 11: Segmentation fault  
836     2024-12-18 14:33:39.049 CST LOG:  terminating any other active server processes  
836     2024-12-18 14:33:39.051 CST LOG:  shutting down due to startup process failure  
[PFSD_SDK INF Dec 18 14:33:39.051684][836]pfsd_umount_force 247: pbdname nvme1n1  
[PFSD_SDK INF Dec 18 14:33:39.072031][836]chnl_connection_release_shm 1164: client umount return : deleted /var/run/pfsd//nvme1n1/836.pid  
[PFSD_SDK INF Dec 18 14:33:39.072049][836]pfsd_umount_force 261: umount success for nvme1n1  
836     2024-12-18 14:33:39.072 CST LOG:  umount pbd nvme1n1 success  
836     2024-12-18 14:33:39.103 CST LOG:  database system is shut down

The backtrace (bt) from the core file:

$ gdb --exec=/home/postgres/tmp_polardb_pg_15_base/bin/postgres --core=/home/postgres/replica1/core  
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1  
Copyright (C) 2022 Free Software Foundation, Inc.  
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>  
This is free software: you are free to change and redistribute it.  
There is NO WARRANTY, to the extent permitted by law.  
Type "show copying" and "show warranty" for details.  
This GDB was configured as "aarch64-linux-gnu".  
Type "show configuration" for configuration details.  
For bug reporting instructions, please see:  
<https://www.gnu.org/software/gdb/bugs/>.  
Find the GDB manual and other documentation resources online at:  
    <http://www.gnu.org/software/gdb/documentation/>.  
  
For help, type "help".  
Type "apropos word" to search for commands related to "word".  
  
warning: Can't open file /dev/zero (deleted) during file-backed mapping note processing  
  
warning: Can't open file /dev/shm/PostgreSQL.3630676106 during file-backed mapping note processing  
  
warning: Can't open file /SYSV000c289a (deleted) during file-backed mapping note processing  
[New LWP 849]  
[Thread debugging using libthread_db enabled]  
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".  
Core was generated by `postgres(5432): startup                                                      '.  
Program terminated with signal SIGSEGV, Segmentation fault.  
#0  __pthread_kill_implementation (threadid=281473559998496, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44  
44  ./nptl/pthread_kill.c: No such file or directory.  
(gdb) bt  
#0  __pthread_kill_implementation (threadid=281473559998496, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44  
#1  0x0000ffffaabff254 in __pthread_kill_internal (signo=11, threadid=<optimized out>) at ./nptl/pthread_kill.c:78  
#2  0x0000ffffaabba67c in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26  
#3  0x0000aaaad54f74f0 in ?? ()  
#4  0x0000000b00000000 in ?? ()  
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Running sudo dmesg on pb2 showed no OOM or other errors at the OS level; stopping the other containers (to lower memory usage) was also tried, but the problem persisted.

Unable to proceed any further for now; what is the solution?

With help from the PolarDB developers, the problem has been located and solved: logindex was not enabled, so the RO node could not be promoted. This is not mentioned in the manual; hopefully the official documentation will cover it in the future. Enabling logindex only requires a few parameters, which have already been added to 《在Docker容器中用loop设备模拟共享存储》. If you follow along from there, you should not run into my problem.
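
For reference, these are the logindex-related parameters in question; the same values appear in the replica configuration in section 3.5 below:

polar_logindex_mem_size=64MB
polar_xlog_queue_buffers=64MB
polar_xlog_page_buffers=64MB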

A successful promote produces logs like the following:

$ pg_ctl promote -D ~/replica1
[PFSD_SDK INF Dec 24 09:56:35.922139][747]pfsd_umount_force 247: pbdname nvme1n1
[PFSD_SDK INF Dec 24 09:56:35.923984][747]chnl_connection_release_shm 1164: client umount return : deleted /var/run/pfsd//nvme1n1/747.pid
[PFSD_SDK INF Dec 24 09:56:35.924002][747]pfsd_umount_force 261: umount success for nvme1n1
waiting for server to promote...[PFSD_SDK INF Dec 24 09:56:36.036961][747]pfsd_umount_force 247: pbdname nvme1n1
[PFSD_SDK INF Dec 24 09:56:36.038059][747]chnl_connection_release_shm 1164: client umount return : deleted /var/run/pfsd//nvme1n1/747.pid
[PFSD_SDK INF Dec 24 09:56:36.038070][747]pfsd_umount_force 261: umount success for nvme1n1
 done
server promoted
postgres@895e6aa6ffd4:~$ psql
psql (PostgreSQL 15.10 (PolarDB 15.10.2.0 build d4f5477d debug) on aarch64-linux-gnu)
Type "help" for help.
The replication slots also switched over correctly:
postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase 
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 replica1  |        | physical  |        |          | f         | f      |            |      |              | 0/40940E40  |                     | reserved   |               | f
 standby1  |        | physical  |        |          | f         | f      |            |      |              | 0/40942F38  |                     | reserved   |               | f
(2 rows)
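
To double-check that the promoted node really accepts writes, a quick smoke test can be run on pb2; the table name below is a throwaway example, not from the original setup:

psql -tAc "select pg_is_in_recovery();"   # expect: f on the new primary
psql -c "create table smoke_test(id int); drop table smoke_test;"   # hypothetical write test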

3. "Reconfigure the p1 primary as a replica" OR "rebuild a new replica"

Reference: 《穷鬼玩PolarDB RAC一写多读集群系列 | 在Docker容器中用loop设备模拟共享存储》

For a brand-new compute node (container), the following steps from the article above would need to be performed in order:

  • 1. Deploy the PolarDB binaries. See step "9. Copy the compiled binaries to the HOME directories of pb1 and pb2 for easy invocation".
  • 2. Deploy the PFS binaries. The polardb_pg_devel:ubuntu22.04 image ships with pfs, so there is no need to install pfs again.
  • 3. Attach the shared block device. Steps 5.1 and 5.2 can be skipped here.
  • 4. Symlink the shared block device to nvme1n1. See step "5.3. Symlink the loop device to nvme1n1".
  • 5. Start pfsd. See step "7. Start pfsd on both pb1 and pb2"; -w is the number of worker processes, and more workers mean better IO performance, but it is set to 1 here due to container resource limits.
  • 6. Deploy an additional replica node. See step "15. Deploy the replica node" and the steps that follow.

This article keeps using the pb1 container for the demo. Steps 1-5 are already done in the pb1 container, so only step 6 (deploying a new replica node) is required.

3.1. Clean up pb1's old local data directory

# Enter the pb1 container
docker exec -ti pb1 bash    
  
rm -rf $HOME/primary

3.2. Initialize the Replica node's local directory from the data directory on shared storage, using the polar-initdb.sh script. Since I want pb1 to act as the primary node again in later experiments, the directory keeps the name "primary".

mkdir -m 0700 $HOME/primary    
sudo /home/postgres/tmp_polardb_pg_15_base/bin/polar-initdb.sh $HOME/primary/ /nvme1n1/shared_data/ replica
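
Optionally, you can confirm that the shared data directory is visible through pfs before continuing. This is a sanity-check sketch; the invocation assumes the same "disk" cluster name used by polar_storage_cluster_name later in this article:

sudo pfs -C disk ls /nvme1n1/shared_data/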

3.3. Fix the permissions and ownership of the replica data directory, all subdirectories, and files

sudo chmod -R 700 $HOME/primary    
sudo chown -R postgres:postgres  $HOME/primary

3.4. Generate the initial configuration files

# Run initdb into a throwaway directory just to obtain template *.conf files
initdb -D /tmp/primary
# Copy the generated configuration files into the replica's local directory
cp /tmp/primary/*.conf $HOME/primary/
# The temporary data directory is no longer needed
rm -rf /tmp/primary

3.5. Edit the pb1 replica node's configuration files

Configure postgresql.conf

echo "    
port=5432    
polar_hostid=1   # 注意  
polar_enable_shared_storage_mode=on    
polar_disk_name='nvme1n1'    
polar_datadir='/nvme1n1/shared_data/'    
polar_vfs.localfs_mode=off    
shared_preload_libraries='\$libdir/polar_vfs,\$libdir/polar_worker'    
polar_storage_cluster_name='disk'    
logging_collector=on    
log_line_prefix='%p\t%r\t%u\t%m\t'    
log_directory='pg_log'    
listen_addresses='0.0.0.0'    
max_connections=200  
# 下面几个参数解决replica不能promote的问题, 因为RO依赖logindex.  
polar_logindex_mem_size=64MB  
polar_xlog_queue_buffers=64MB  
polar_xlog_page_buffers=64MB  
# 使用pfs时可以关掉 full page write 和 polar_has_partial_write , 否则请打开这两   
full_page_writes = off  
polar_has_partial_write = off
polar_resource_manager.enable_resource_manager=off
    
# replication    
primary_slot_name='replica1'    
primary_conninfo='host=172.17.0.3 port=5432 user=postgres dbname=postgres application_name=replica1'    # 注意    
" >> ~/primary/postgresql.conf

Configure pg_hba.conf

echo "      
host all all 0.0.0.0/0 md5      
host  replication postgres  172.17.0.0/16 trust      
" >> ~/primary/pg_hba.conf

3.6. Mark the node so that it starts in Replica mode:

touch $HOME/primary/replica.signal

3.7. Start the Replica node:

pg_ctl start -D $HOME/primary
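
Once it is up, the node should report that it is in recovery, i.e. running as a read-only replica; a minimal check:

psql -tAc "select pg_is_in_recovery();"   # expect: t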

3.8. On pb2, confirm that the replica's replication slot is connected normally

postgres=# select * from pg_stat_replication;    
-[ RECORD 1 ]----+------------------------------  
pid              | 844  
usesysid         | 10  
usename          | postgres  
application_name | replica1  
client_addr      | 172.17.0.2  
client_hostname  |   
client_port      | 57928  
backend_start    | 2024-12-24 10:06:13.679898+08  
backend_xmin     |   
state            | streaming  
sent_lsn         | 0/40943100  
write_lsn        | 0/40943100  
flush_lsn        | 0/40943100  
replay_lsn       | 0/40943100  
write_lag        |   
flush_lag        |   
replay_lag       |   
sync_priority    | 0  
sync_state       | async  
reply_time       | 2024-12-24 10:06:43.729921+08

The switchover is now complete.

4. Switchover again

Since I want pb1 to be the primary node again for later experiments, I now flip the roles once more: stop pb2, promote pb1, and rebuild pb2 as a replica.

4.1. Stop the pb2 RW node

pg_ctl stop -m fast -D ~/replica1

4.2. Promote pb1 to the RW node

pg_ctl promote -D ~/primary
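
After the promote returns, pb1 should be out of recovery and writable again; the same quick check applies:

psql -tAc "select pg_is_in_recovery();"   # expect: f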

4.3. Rebuild pb2 as a replica

Omitted here. See step 3 of this article, as well as section "15. Deploy the replica node" in the article referenced above.

Because pb2 was previously the RO node and everything else is already configured, switching pb2 back to an RO node is actually simple and only takes the two steps below. In engineering practice, however, modules tend to be reused, so the standard add-a-replica procedure is usually applied uniformly.

touch $HOME/replica1/replica.signal  
pg_ctl start -D ~/replica1
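
As a final check, the replica should show up as streaming on pb1 (now the primary again), mirroring the pg_stat_replication query from step 3.8:

psql -c "select application_name, state from pg_stat_replication;"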

References

《穷鬼玩PolarDB RAC一写多读集群系列 | 在Docker容器中用loop设备模拟共享存储》

《穷鬼玩PolarDB RAC一写多读集群系列 | 如何搭建PolarDB容灾(standby)节点》

《穷鬼玩PolarDB RAC一写多读集群系列 | 共享存储在线扩容》

https://apsaradb.github.io/PolarDB-for-PostgreSQL/zh/operation/ro-online-promote.html
