# This script is run by failover_command.set-o xtrace # Special values:# 1) %d = failed node id# 2) %h = failed node hostname# 3) %p = failed node port number# 4) %D = failed node database cluster path# 5) %m = new main node id# 6) %H = new main node hostname# 7) %M = old main node id# 8) %P = old primary node hostname# 9) %r = new main port number# 10) %R = new main database cluster path# 11) %N = old primary node hostname# 12) %S = old primary node port number# 13) %% = '%' characterFAILED_NODE_ID="$1"FAILED_NODE_HOST="$2"FAILED_NODE_PORT="$3"FAILED_NODE_PGDATA="$4"NEW_MAIN_NODE_ID="$5"NEW_MAIN_NODE_HOST="$6"OLD_MAIN_NODE_ID="$7"OLD_PRIMARY_NODE_ID="$8"NEW_MAIN_NODE_PORT="$9"NEW_MAIN_NODE_PGDATA="${10}"OLD_PRIMARY_NODE_HOST="${11}"OLD_PRIMARY_NODE_PORT="${12}"# PostgreSQL目录PGHOME=/usr/local/pgsql16 # 复制槽名称REPL_SLOT_NAME=${FAILED_NODE_HOST//[-.]/_}# 启动PostgreSQL服务的用户名POSTGRESQL_STARTUP_USER=postgres # ssh免密码登录密钥文件名SSH_KEY_FILE=id_rsa SSH_OPTIONS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/${SSH_KEY_FILE}"echo failover.sh: start: failed_node_id=$FAILED_NODE_IDfailed_host=$FAILED_NODE_HOST \ old_primary_node_id=$OLD_PRIMARY_NODE_IDnew_main_node_id=$NEW_MAIN_NODE_IDnew_main_host=$NEW_MAIN_NODE_HOST## If there's no main node anymore, skip failover.if [ $NEW_MAIN_NODE_ID-lt0 ]; thenecho failover.sh: All nodes are down. Skipping failover. exit0fi## 检查是否可以免密登录指定节点ssh-T${SSH_OPTIONS}${POSTGRESQL_STARTUP_USER}@${NEW_MAIN_NODE_HOST}ls /tmp > /dev/null if [ $?-ne0 ]; thenecho failover.sh: passwordless SSH to ${POSTGRESQL_STARTUP_USER}@${NEW_MAIN_NODE_HOST} failed. Please setup passwordless SSH. exit1fi## 如果down机的是从节点,跳过if [ $OLD_PRIMARY_NODE_ID !="-1"-a$FAILED_NODE_ID !=$OLD_PRIMARY_NODE_ID ]; then# If Standby node is down, drop replication slot.${PGHOME}/bin/psql -h${OLD_PRIMARY_NODE_HOST}-p${OLD_PRIMARY_NODE_PORT} postgres \ -c"SELECT pg_drop_replication_slot('${REPL_SLOT_NAME}');" >/dev/null 2>&1 if [ $?-ne0 ]; thenecho ERROR: failover.sh: drop replication slot \"${REPL_SLOT_NAME}\" failed. You may need to drop replication slot manually. fiecho failover.sh: end: standby node is down. Skipping failover. exit0fi## 如果down机的是主节点,则提升指定从节点为主节点## 新的从节点由Pgpool投票选出echo failover.sh: primary node is down, promote new_main_node_id=$NEW_MAIN_NODE_ID on ${NEW_MAIN_NODE_HOST}. ## ssh到新的主节点并执行pg_ctl -D $PG_HOME promote提升从节点ssh-T${SSH_OPTIONS}${POSTGRESQL_STARTUP_USER}@${NEW_MAIN_NODE_HOST}${PGHOME}/bin/pg_ctl -D${NEW_MAIN_NODE_PGDATA}-w promote ## 提升失败退出,返回1if [ $?-ne0 ]; thenecho ERROR: failover.sh: promote failed exit1fi## 提示提升成功并退出,返回0echo failover.sh: end: new_main_node_id=$NEW_MAIN_NODE_ID on ${NEW_MAIN_NODE_HOST} was successfully promoted to primary exit0
脚本相对比较好理解,本脚本是故障转移的第一步。Pgpool-II检测到主库down机,则投票选举出新的主服务器,然后ssh到新主机上调用pg_ctl promote提升从节点。该脚本总共有12个参数:
# 1) %d = 失效节点ID(对应该主机PGPOOL_HOME/etc/pgpool_node_id文件里的ID)
# 2) %h = 失效节点名
# 3) %p = 失效节点端口号
# 4) %D = 失效节点PG_DATA目录
# 5) %m = 新主节点ID(对应该PGPOOL_HOME/etc/pgpool_node_id文件里的ID)
# 6) %H = 新主节点主机名
# 7) %M = 旧主节点ID
# 8) %P = 旧主节点主机名
# 9) %r = 新主节点端口号
# 10) %R = 新主节点PG_DATA目录
# 11) %N = 旧主节点主机名
# 12) %S = 旧主节点端口号
注意:这里失效节点并不一定就是旧主节点,所以要分别设置参数。