1. Starting and Stopping Pgpool
Before using Pgpool, confirm the following:
- Each node (server1, server2, server3) has the following files under $HOME/pgpool/etc:
| File name | Description |
| --- | --- |
| pgpool_node_id | Node ID file |
| pcp.conf | Password file for the pcp commands |
| pool_hba.conf | Pgpool client authentication configuration file |
| pgpool.conf | Pgpool configuration file |
| pool_passwd | Pgpool client password file |
| escalation.sh | Floating (virtual) IP switchover script |
| failover.sh | Failover script |
| follow_primary.sh | Post-failover follow-up script |
- Each node (server1, server2, server3) has the following files under $HOME:
| File name | Description |
| --- | --- |
| .pcppass | pcp password file |
| .pgpass | PostgreSQL password file |
| .pgpoolkey | Pgpool client password key file |
- Each node (server1, server2, server3) has the following files under $PG_DATA:
| File name | Description |
| --- | --- |
| recovery_1st_stage | First-stage online recovery script |
| pgpool_remote_start | Script that starts the database after recovery completes |
Check that all of the files above exist, that their contents are correct, and that the script files have execute permission. Also check that the directories referenced in pgpool.conf (for example the log directory and the archive directory) have already been created.
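A minimal shell sketch of that pre-flight check is shown below. The file names come from the tables above; the $PG_DATA default used here is the data directory that appears later in this document and is otherwise an assumption.

```bash
#!/bin/bash
# Pre-flight check: verify the required files exist and the scripts are executable.
PGPOOL_ETC="$HOME/pgpool/etc"
PG_DATA="${PG_DATA:-/data/pgsql16/data}"   # assumed default; adjust to your data directory

for f in "$PGPOOL_ETC"/pgpool_node_id "$PGPOOL_ETC"/pcp.conf "$PGPOOL_ETC"/pool_hba.conf \
         "$PGPOOL_ETC"/pgpool.conf "$PGPOOL_ETC"/pool_passwd \
         "$HOME"/.pcppass "$HOME"/.pgpass "$HOME"/.pgpoolkey; do
    [ -f "$f" ] || echo "MISSING: $f"
done

for s in "$PGPOOL_ETC"/escalation.sh "$PGPOOL_ETC"/failover.sh "$PGPOOL_ETC"/follow_primary.sh \
         "$PG_DATA"/recovery_1st_stage "$PG_DATA"/pgpool_remote_start; do
    [ -x "$s" ] || echo "MISSING OR NOT EXECUTABLE: $s"
done
```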
```
[server1]$ pgpool --help
pgpool-II version 4.4.4 (nurikoboshi),
  A generic connection pool/replication/load balance server for PostgreSQL

Usage:
  pgpool [ -c] [ -f CONFIG_FILE ] [ -F PCP_CONFIG_FILE ] [ -a HBA_CONFIG_FILE ]
         [ -n ] [ -D ] [ -d ]
  pgpool [ -f CONFIG_FILE ] [ -F PCP_CONFIG_FILE ] [ -a HBA_CONFIG_FILE ]
         [ -m SHUTDOWN-MODE ] stop
  pgpool [ -f CONFIG_FILE ] [ -F PCP_CONFIG_FILE ] [ -a HBA_CONFIG_FILE ] reload

Common options:
  -a, --hba-file=HBA_CONFIG_FILE
                      Set the path to the pool_hba.conf configuration file
                      (default: /home/postgres/pgpool/etc/pool_hba.conf)
  -f, --config-file=CONFIG_FILE
                      Set the path to the pgpool.conf configuration file
                      (default: /home/postgres/pgpool/etc/pgpool.conf)
  -k, --key-file=KEY_FILE
                      Set the path to the pgpool key file
                      (default: /home/postgres/.pgpoolkey)
                      can be over ridden by PGPOOLKEYFILE environment variable
  -F, --pcp-file=PCP_CONFIG_FILE
                      Set the path to the pcp.conf configuration file
                      (default: /home/postgres/pgpool/etc/pcp.conf)
  -h, --help          Print this help

Start options:
  -C, --clear-oidmaps Clear query cache oidmaps when memqcache_method is memcached
                      (If shmem, discards whenever pgpool starts.)
  -n, --dont-detach   Don't run in daemon mode, does not detach control tty
  -x, --debug-assertions   Turns on various assertion checks, This is a debugging aid
  -D, --discard-status Discard pgpool_status file and do not restore previous status
  -d, --debug         Debug mode

Stop options:
  -m, --mode=SHUTDOWN-MODE
                      Can be "smart", "fast", or "immediate"

Shutdown modes are:
  smart       quit after all clients have disconnected
  fast        quit directly, with proper shutdown
  immediate   the same mode as fast
```
1.1 Starting

```
pgpool -n &
```
1.2 Stopping

```
pgpool -m {smart | fast | immediate} stop
```
2. Usage
First, start the PostgreSQL database on the primary node (server1) only; do not start PostgreSQL on the other two nodes. Then start Pgpool on all three nodes.
```
[server1]$ su - postgres
[server1]$ pg_ctl -D /data/pgsql16/data -l logfile start

# Run on each of the three nodes:
$ pgpool -n &
```
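If passwordless SSH is set up between the nodes for the postgres user, the same startup sequence can also be driven from a single node. This is only a convenience sketch; the hostnames and data directory come from this setup, and pgpool.log is an arbitrary log file name.

```bash
# Start PostgreSQL on the primary, then Pgpool on all three nodes.
# Assumes passwordless SSH as the postgres user between the nodes.
ssh postgres@server1 'pg_ctl -D /data/pgsql16/data -l logfile start'

for node in server1 server2 server3; do
    ssh postgres@"$node" 'nohup pgpool -n > pgpool.log 2>&1 < /dev/null &'
done
```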
2.1 Setting Up the Standby Nodes
Set up the standby nodes via Pgpool's online recovery feature. This process calls the recovery_1st_stage and pgpool_remote_start scripts in $PG_DATA to synchronize the primary's data to the standbys using streaming replication, and starts the standbys once synchronization completes.
```
# pcp_recovery_node -h 192.168.0.5 -p 9898 -U pgpool -n 1
Password:
pcp_recovery_node -- Command Successful

# pcp_recovery_node -h 192.168.0.5 -p 9898 -U pgpool -n 2
Password:
pcp_recovery_node -- Command Successful
```
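For reference, recovery_1st_stage is usually a short shell script placed in the primary's data directory and executed there; it rebuilds the target node from a base backup. The sketch below is a heavily simplified illustration modeled on the sample script shipped with Pgpool-II 4.x, not the exact script used here; the argument order and the repl replication user should be verified against your own script.

```bash
#!/bin/bash
# Heavily simplified sketch of a recovery_1st_stage script (verify the argument
# order against the sample shipped with your Pgpool-II version and your own script).
set -o errexit

PRIMARY_PGDATA="$1"   # data directory of the primary (where this script runs)
DEST_HOST="$2"        # host of the node being recovered
DEST_PGDATA="$3"      # data directory on the node being recovered
PRIMARY_PORT="$4"     # PostgreSQL port of the primary

# Wipe the target data directory and rebuild it from the primary over streaming
# replication; -R writes standby.signal and primary_conninfo (PostgreSQL 12+).
ssh postgres@"$DEST_HOST" "
    rm -rf $DEST_PGDATA &&
    pg_basebackup -h $(hostname) -p $PRIMARY_PORT -U repl -D $DEST_PGDATA -X stream -R
"
```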
After pcp_recovery_node completes successfully, check that the standby nodes have joined the cluster correctly:
```
$ psql -h 192.168.0.5 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
 node_id | hostname | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | server1  | 5432 | up     | up        | 0.333333  | primary | primary | 0          | false             | 0                 |                   |                        | 2023-10-23 07:00:57
 1       | server2  | 5432 | up     | up        | 0.333333  | standby | standby | 0          | true              | 0                 | streaming         | async                  | 2023-10-23 07:00:57
 2       | server3  | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-10-23 07:00:57
(3 rows)
```
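With all three nodes attached, read-only queries sent through Pgpool on port 9999 are load-balanced across the backends. A quick way to observe this is to issue a few SELECTs and watch the select_cnt counter grow on the node marked load_balance_node = true (a simple illustration):

```bash
# Issue a few read-only queries through Pgpool, then check the per-node counters.
for i in $(seq 1 5); do
    psql -h 192.168.0.5 -p 9999 -U pgpool postgres -c "SELECT 1" > /dev/null
done
psql -h 192.168.0.5 -p 9999 -U pgpool postgres -c "show pool_nodes"
```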
2.2 Watchdog Switchover Test
- Run the pcp_watchdog_info command to see which node is the LEADER.
```
$ pcp_watchdog_info -h 192.168.0.5 -p 9898 -U pgpool
Password:
3 3 YES server1:9999 Linux server1 server1

server1:9999 Linux server1 server1 9999 9000 4 LEADER 0 MEMBER   # The Pgpool-II server started first became "LEADER".
server2:9999 Linux server2 server2 9999 9000 7 STANDBY 0 MEMBER  # runs as standby
server3:9999 Linux server3 server3 9999 9000 7 STANDBY 0 MEMBER  # runs as standby
```
- Stop Pgpool on the leader. The remaining Pgpool nodes vote to elect a new leader; in this example, server2 becomes the new leader.
```
[server1]$ pgpool -m fast stop

$ pcp_watchdog_info -p 9898 -h 192.168.0.5 -U pgpool
Password:
3 3 YES server2:9999 Linux server2 server2

server2:9999 Linux server2 server2 9999 9000 4 LEADER 0 MEMBER    # server2 is promoted to LEADER
server1:9999 Linux server1 server1 9999 9000 10 SHUTDOWN 0 MEMBER # server1 is stopped
server3:9999 Linux server3 server3 9999 9000 7 STANDBY 0 MEMBER   # server3 runs as STANDBY
```
- Restart the Pgpool that was just stopped. Once it is up, that node rejoins as a standby.
```
[server1]$ pgpool -n &

[server1]# pcp_watchdog_info -p 9898 -h 192.168.0.5 -U pgpool
Password:
3 3 YES server2:9999 Linux server2 server2

server2:9999 Linux server2 server2 9999 9000 4 LEADER 0 MEMBER
server1:9999 Linux server1 server1 9999 9000 7 STANDBY 0 MEMBER
server3:9999 Linux server3 server3 9999 9000 7 STANDBY 0 MEMBER
```
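The escalation.sh listed in the file tables is what ties this leader election to the floating IP (192.168.0.5): in the Pgpool-II sample configuration it runs on the node that has just become LEADER, before the virtual IP is brought up there, and makes sure no other node still holds that address. Below is a minimal sketch along those lines; the interface name (enp0s8) is an assumption and must match the if_up_cmd/if_down_cmd settings in pgpool.conf.

```bash
#!/bin/bash
# Minimal sketch of an escalation.sh: when this node becomes the watchdog LEADER,
# make sure the virtual IP is not still configured on any other node.
# The VIP, node names, and interface name below are assumptions for this setup.
set -o nounset

VIP=192.168.0.5
DEVICE=enp0s8                      # assumed NIC name; must match pgpool.conf
NODES="server1 server2 server3"

for node in $NODES; do
    [ "$node" = "$(hostname)" ] && continue
    ssh -o ConnectTimeout=5 postgres@"$node" \
        "sudo ip addr del $VIP/24 dev $DEVICE" 2>/dev/null
done
exit 0
```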
2.3 Failover Test
Check the current primary node; server1 is currently the primary:
```
$ psql -h 192.168.0.5 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
 node_id | hostname | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | server1  | 5432 | up     | up        | 0.333333  | primary | primary | 0          | false             | 0                 |                   |                        | 2023-10-23 07:08:14
 1       | server2  | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-10-23 07:08:14
 2       | server3  | 5432 | up     | up        | 0.333333  | standby | standby | 0          | true              | 0                 | streaming         | async                  | 2023-10-23 07:08:14
(3 rows)
```
Stop PostgreSQL on server1:
```
[server1]$ pg_ctl -D /data/pgsql16/data -m immediate stop
```
After server1 is stopped, failover is triggered and server2 becomes the new primary.
```
$ psql -h 192.168.0.5 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
 node_id | hostname | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | server1  | 5432 | down   | down      | 0.333333  | standby | unknown | 0          | false             | 0                 |                   |                        | 2023-10-23 07:10:01
 1       | server2  | 5432 | up     | up        | 0.333333  | primary | primary | 0          | false             | 0                 |                   |                        | 2023-10-23 07:10:01
 2       | server3  | 5432 | up     | up        | 0.333333  | standby | standby | 0          | true              | 0                 | streaming         | async                  | 2023-10-23 07:10:03
(3 rows)
```
At this point, server2 is the primary and server3 is a standby:
```
[server3]$ psql -h server3 -p 5432 -U pgpool postgres -c "select pg_is_in_recovery()"
 pg_is_in_recovery
-------------------
 t

[server2]$ psql -h server2 -p 5432 -U pgpool postgres -c "select pg_is_in_recovery()"
 pg_is_in_recovery
-------------------
 f

[server2]# psql -h server2 -p 5432 -U pgpool postgres -c "select * from pg_stat_replication" -x
-[ RECORD 1 ]----+------------------------------
pid              | 7198
usesysid         | 16385
usename          | repl
application_name | server3
client_addr      | 192.168.0.3
client_hostname  |
client_port      | 40916
backend_start    | 2023-10-23 11:27:30.39248+08
backend_xmin     |
state            | streaming
sent_lsn         | 0/12000260
write_lsn        | 0/12000260
flush_lsn        | 0/12000260
replay_lsn       | 0/12000260
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2023-10-23 11:27:35.077533+08
```
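The promotion itself is performed by the failover.sh script listed in the file tables: Pgpool invokes it (via failover_command) when it detects that a backend is down, and the script promotes one of the standbys. The sketch below is a heavily simplified illustration; the actual argument order depends on how failover_command is defined in pgpool.conf, and the real sample script shipped with Pgpool-II does considerably more.

```bash
#!/bin/bash
# Heavily simplified sketch of a failover.sh script. The argument order depends
# on the failover_command definition in pgpool.conf; adjust accordingly.
set -o errexit

FAILED_NODE_ID="$1"        # id of the node that went down
FAILED_NODE_HOST="$2"      # its hostname
NEW_MAIN_NODE_HOST="$3"    # host of the standby chosen as the new primary
OLD_PRIMARY_NODE_ID="$4"   # id of the node that was primary before the failure
PGDATA=/data/pgsql16/data

# Only promote when the failed node was the primary; if a standby failed,
# there is nothing to promote.
if [ "$FAILED_NODE_ID" -ne "$OLD_PRIMARY_NODE_ID" ]; then
    exit 0
fi

ssh postgres@"$NEW_MAIN_NODE_HOST" "pg_ctl -D $PGDATA -w promote"
```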
2.4 Online Recovery
Use Pgpool's online recovery feature to bring server1 back:
```
$ pcp_recovery_node -h 192.168.0.5 -p 9898 -U pgpool -n 0
Password:
pcp_recovery_node -- Command Successful
```
After the recovery completes, check the status; server1 is now a standby:
```
# psql -h 192.168.0.5 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
 node_id | hostname | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | server1  | 5432 | up     | up        | 0.333333  | standby | standby | 0          | true              | 0                 | streaming         | async                  | 2023-10-22 09:42:27
 1       | server2  | 5432 | up     | up        | 0.333333  | primary | primary | 0          | false             | 0                 |                   |                        | 2023-10-22 09:42:27
 2       | server3  | 5432 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2023-10-22 09:42:27
(3 rows)
```
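As a final sanity check, pg_stat_replication on the current primary (server2) should now list both server1 and server3 as streaming standbys (application_name is typically the standby's hostname when the recovery script sets it that way):

```
[server2]$ psql -h server2 -p 5432 -U pgpool postgres \
    -c "select application_name, state, sync_state from pg_stat_replication"
```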