keepalived modify notify.c & vrrp.c to enforce waiting notify script execute success-阿里云开发者社区

开发者社区> 云计算> 正文

keepalived modify notify.c & vrrp.c to enforce waiting notify script execute success

简介:
我们在设计数据库HA的时候, 为了防止主备同时读写一份数据, 或者主备同时对外提供服务. 备机在切换成主机前, 需要有一个fence操作, 需先将主节点fence掉, 然后再激活备库为主库.
例如我们这边用的一个HA脚本 : 
https://raw.githubusercontent.com/digoal/sky_postgresql_cluster/master/INSTALL.txt
或者RHEL的HA套件, 都有类似的FENCE操作.
但是keepalived目前没有切换前的自定义脚本, 只有切换后的自定义脚本(notify), 并且不等待脚本执行.
如图 : 
keepalived modify notify.c  vrrp.c to enforce waiting notify script execute success - 德哥@Digoal - PostgreSQL research
keepalived如果用于数据库的HA切换还需要完善一下.
本文将讲解一下, 如何修改keepalived代码, 来实现这方面的功能, 即备切换成主之前, 必须先等待fence主节点.
最终目的如图 : 
keepalived modify notify.c  vrrp.c to enforce waiting notify script execute success - 德哥@Digoal - PostgreSQL research

首先要了解一下keepalived notify的机制 : 
顺序是这样的 : 起vip, 起静态路由, fork进程调用notify脚本(并且不等待). 
注意调用notify脚本在起VIP之后.
我们的目的是把它放到最前面, 并且需要等待notify脚本调用完成再起VIP.

keepalived有几项配置和notify有关, 即当路由器的角色转换后, 调用用户配置的脚本.
doc/keepalived.conf.SYNOPSIS
    notify_master <STRING>|<QUOTED-STRING> # Script to run during MASTER transit
    notify_backup <STRING>|<QUOTED-STRING> # Script to run during BACKUP transit
    notify_fault <STRING>|<QUOTED-STRING>  # Script to run during FAULT transit
    notify <STRING>|<QUOTED-STRING>        # Script to run during ANY state transit (1)
    smtp_alert           # Send email notif during state transit
}

(1) The "notify" script is called AFTER the corresponding notify_* script has
    been called, and is given exactly 4 arguments (the whole string is interpreted
    as a litteral filename so don't add parameters!):

    $1 = A string indicating whether it's a "GROUP" or an "INSTANCE"
    $2 = The name of said group or instance
    $3 = The state it's transitioning to ("MASTER", "BACKUP" or "FAULT")
    $4 = The priority value

    $1 and $3 are ALWAYS sent in uppercase, and the possible strings sent are the
    same ones listed above ("GROUP"/"INSTANCE", "MASTER"/"BACKUP"/"FAULT").

当路由器在进入对应的角色后, 会调用对应的脚本如notify_master, notify_backup, notify_fault.
当以上脚本调用完后, 再调用notify脚本.
相关代码举例 : 
进入master角色的代码
keepalived/vrrp/vrrp.c
/* MASTER state processing */
int
vrrp_state_master_tx(vrrp_t * vrrp, const int prio)
{
        int ret = 0;

        if (!VRRP_VIP_ISSET(vrrp)) {
                log_message(LOG_INFO, "VRRP_Instance(%s) Entering MASTER STATE"
                                    , vrrp->iname);
                vrrp_state_become_master(vrrp);   // 进入master角色的操作对应的函数
                ret = 1;
        } else if (vrrp->garp_refresh && timer_cmp(time_now, vrrp->garp_refresh_timer) > 0) {
                vrrp_send_link_update(vrrp);
                vrrp->garp_refresh_timer = timer_add_long(time_now, vrrp->garp_refresh);
        }

        vrrp_send_adv(vrrp,
                      (prio == VRRP_PRIO_OWNER) ? VRRP_PRIO_OWNER :
                                                  vrrp->effective_priority);
        return ret;
}

当进入master角色后, 在启用虚拟IP后, 会调用notify脚本.
/* becoming master */
void
vrrp_state_become_master(vrrp_t * vrrp)
{
        /* add the ip addresses */  // 先启用虚拟IP
        if (!LIST_ISEMPTY(vrrp->vip))
                vrrp_handle_ipaddress(vrrp, IPADDRESS_ADD, VRRP_VIP_TYPE);
        if (!LIST_ISEMPTY(vrrp->evip))
                vrrp_handle_ipaddress(vrrp, IPADDRESS_ADD, VRRP_EVIP_TYPE);
        vrrp->vipset = 1;

        /* add virtual routes */  // 然后添加静态路由
        if (!LIST_ISEMPTY(vrrp->vroutes))
                vrrp_handle_iproutes(vrrp, IPROUTE_ADD);

        /* remotes neighbour update */  // 然后发送免费ARP报文
        vrrp_send_link_update(vrrp);

        /* set refresh timer */
        if (vrrp->garp_refresh) {
                vrrp->garp_refresh_timer = timer_add_long(time_now, vrrp->garp_refresh);
        }

        /* Check if notify is needed */  // 最后是调用自定义的notify脚本
        notify_instance_exec(vrrp, VRRP_STATE_MAST);

#ifdef _WITH_SNMP_  // 触发snmp
        vrrp_snmp_instance_trap(vrrp);
#endif
....


这里要注意的是, notify脚本调用时fork进程通过system函数来调用的, 并且主进程直接返回0, 不等待子进程执行完毕.
对应的代码 : 
lib/notify.c
/* perform a system call */
int
system_call(char *cmdline)
{
        int retval;

        retval = system(cmdline);

        if (retval == 127) {
                /* couldn't exec command */
                log_message(LOG_ALERT, "Couldn't exec command: %s", cmdline);
        } else if (retval == -1) {
                /* other error */
                log_message(LOG_ALERT, "Error exec-ing command: %s", cmdline);
        }

        return retval;
}
/* Execute external script/program */
int
notify_exec(char *cmd)
{
        pid_t pid;
        int ret;

        pid = fork();

        /* In case of fork is error. */  
        if (pid < 0) {
                log_message(LOG_INFO, "Failed fork process");
                return -1;
        }

        /* In case of this is parent process */  // 主进程直接退出, 没有wait 子进程.
        if (pid)
                return 0;

        signal_handler_destroy();
        closeall(0);

        open("/dev/null", O_RDWR);

        ret = dup(0);
        if (ret < 0) {
                log_message(LOG_INFO, "dup(0) error");
        }

        ret = dup(0);
        if (ret < 0) {
                log_message(LOG_INFO, "dup(0) error");
        }

        system_call(cmd);

        exit(0);
}


所以, 为了达到目的, 我们需要改2个地方. 为了方便观察, 加几个日志输出.
# cd /opt/soft_bak/keepalived-1.2.13
1. 把调用notify脚本移到vip前面
# vi keepalived/vrrp/vrrp.c
修改函数, 把调用notify脚本从这个函数剥离出来
/* becoming master */
void
vrrp_state_become_master(vrrp_t * vrrp)
{
...
        /* Check if notify is needed */
        //notify_instance_exec(vrrp, VRRP_STATE_MAST);  //注释这里 

修改函数
/* MASTER state processing */
int
vrrp_state_master_tx(vrrp_t * vrrp, const int prio)
{
...
        // if (!VRRP_VIP_ISSET(vrrp)) {  // 改成如下, 即把notify脚本放到vrrp_state_become_master之前, 并且返回成功才会调用vrrp_state_become_master.
        if ((!VRRP_VIP_ISSET(vrrp)) && notify_instance_exec(vrrp, VRRP_STATE_MAST)) {
                log_message(LOG_INFO, "VRRP_Instance(%s) Entering MASTER STATE"
                                    , vrrp->iname);
                vrrp_state_become_master(vrrp);
                ret = 1;


2. 需要让主进程等待notify脚本执行完
# vi lib/notify.c
添加等待需要的头文件
#include <sys/types.h>
#include <sys/wait.h>

修改函数
/* Execute external script/program */
int
notify_exec(char *cmd)
{
        pid_t pid;
        int ret;
        pid_t wait_pid;
        int wait_ret;

        pid = fork();

        /* In case of fork is error. */
        if (pid < 0) {
                log_message(LOG_INFO, "Failed fork process");
                return -1;
        }

        /* child process */  // 子进程执行
        if (pid == 0) {

          signal_handler_destroy();
          closeall(0);

          open("/dev/null", O_RDWR);

          ret = dup(0);
          if (ret < 0) {
                log_message(LOG_INFO, "dup(0) error");
          }

          ret = dup(0);
          if (ret < 0) {
                log_message(LOG_INFO, "dup(0) error");
          }

          log_message(LOG_INFO, "notify executing.");  // 添加日志输出
          system_call(cmd);
          exit(0);
        }

        /* In case of this is parent process */  // 主进程等待
        else {
                log_message(LOG_INFO, "waiting notify execute success.");   // 添加日志输出
                wait_pid = waitpid(pid, &wait_ret, 0); 
                if (wait_pid == pid && WIFEXITED(wait_ret)) {
                  log_message(LOG_INFO, "notify execute successed.");    // 添加日志输出
                  return 0;
                }
                return -1;
        }
        
}


重新编译
# gmake && gmake install

测试
配置文件
[root@192_168_173_203 keepalived-1.2.13]# cd /opt/keepalived/etc/keepalived/
[root@192_168_173_203 keepalived]# cat keepalived.conf
! Configuration File for keepalived  !号或#号开头为注释

global_defs {      # 全局配置除了router_id, 其他都无所谓, 因为我这里的测试只是观察script weight对vrrp的影响.
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id DIGOAL_TEST1   # 随便配置, 因为这个不在VRRP包里面体现, 在VRRP包里体现的是虚拟路由器ID.
}

vrrp_script pos {   # 脚本配置 
    script "/root/pos.sh"
    interval 1
    weight 0   # 配置默认weight 0, 后面可以在instance中覆盖
    fall 1    # 从OK到KO需要1次检测失败.
    rise 1    # 从KO到OK需要1次检测成功.
}

vrrp_script nag {   # 脚本配置 
    script "/root/nag.sh"
    interval 1
    weight 0   # 配置默认weight 0, 后面可以在instance中覆盖
    fall 1    # 从OK到KO需要1次检测失败.
    rise 1    # 从KO到OK需要1次检测成功.
}

vrrp_script zero {   # 脚本配置 
    script "/root/zero.sh"
    interval 1
    weight 0   # 配置默认weight 0, 后面可以在instance中覆盖
    fall 1    # 从OK到KO需要1次检测失败.
    rise 1    # 从KO到OK需要1次检测成功.
}

vrrp_instance vi_1 {
    state MASTER    # 初始状态
    interface eth0
    virtual_router_id 51
    priority 100   # 初始优先级, 注意两个节点配置要不一样. 一高一低, 高的选举为master.
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 12345678  # 最大长度为8, 如果要修改最大长度, 
                            # 可参考 http://blog.163.com/digoal@126/blog/static/163877040201472123810816/
    }
    track_script {
        pos weight 3
        zero weight 0
        nag weight -5
    }
    unicast_peer {   # 使用单播代替组播发送vrrp心跳.
        192.168.173.203
        192.168.173.204
    }
    virtual_ipaddress {   # 虚拟地址配置, 和ifconfig兼容
        172.16.173.100/24 brd 172.16.173.255 dev eth0 scope link label eth0:1
    }
    debug   # 打开debug, 方便调试, 本例没有用到
    nopreempt   # 不主动竞争master, 这样的话优先级只用于决定第一次搭建HA时的主库节点, 发生failover后, 不会主动提升.
    notify_master "/root/notify.sh"
}


配置脚本
[root@192_168_173_203 keepalived]# vi /root/notify.sh 
#!/bin/bash

sleep 1000
exit 0

# chmod 555 /root/notify.sh


启动keepalived
# keepalived -f /opt/keepalived/etc/keepalived/keepalived.conf -D

观察日志 : 
# tail -f -n 1 /var/log/messages
Aug 26 10:13:31 192_168_173_203 Keepalived[14531]: Starting Keepalived v1.2.13 (08/19,2014)
Aug 26 10:13:31 192_168_173_203 Keepalived[14532]: Starting Healthcheck child process, pid=14533
Aug 26 10:13:31 192_168_173_203 Keepalived[14532]: Starting VRRP child process, pid=14534
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Netlink reflector reports IP 192.168.173.203 added
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Netlink reflector reports IP 192.168.173.156 added
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Netlink reflector reports IP fe80::f6ce:46ff:fe86:4234 added
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Netlink reflector reports IP 192.168.173.203 added
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Netlink reflector reports IP 192.168.173.156 added
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Registering Kernel netlink reflector
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Registering Kernel netlink command channel
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Netlink reflector reports IP fe80::f6ce:46ff:fe86:4234 added
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Registering gratuitous ARP shared channel
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Registering Kernel netlink reflector
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Registering Kernel netlink command channel
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Opening file '/opt/keepalived/etc/keepalived/keepalived.conf'.
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Opening file '/opt/keepalived/etc/keepalived/keepalived.conf'.
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Configuration is using : 7972 Bytes
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Configuration is using : 69370 Bytes
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: Using LinkWatch kernel netlink reflector...
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(10,11)]
Aug 26 10:13:31 192_168_173_203 Keepalived_healthcheckers[14533]: Using LinkWatch kernel netlink reflector...
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Script(nag) succeeded
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Script(zero) succeeded
Aug 26 10:13:31 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Script(pos) succeeded
Aug 26 10:13:32 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Instance(vi_1) Transition to MASTER STATE
Aug 26 10:13:33 192_168_173_203 Keepalived_vrrp[14534]: waiting notify execute success.
Aug 26 10:13:33 192_168_173_203 Keepalived_vrrp[14564]: notify executing.
... 1000秒后, 进入master, 并且启动VIP, 测试成功.
Aug 26 10:30:13 192_168_173_203 Keepalived_vrrp[14534]: notify execute successed.
Aug 26 10:30:13 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Instance(vi_1) Entering MASTER STATE
Aug 26 10:30:13 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Instance(vi_1) setting protocol VIPs.
Aug 26 10:30:13 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Instance(vi_1) Sending gratuitous ARPs on eth0 for 172.16.173.100
Aug 26 10:30:13 192_168_173_203 Keepalived_healthcheckers[14533]: Netlink reflector reports IP 172.16.173.100 added
Aug 26 10:30:18 192_168_173_203 Keepalived_vrrp[14534]: VRRP_Instance(vi_1) Sending gratuitous ARPs on eth0 for 172.16.173.100


我们只要把fence操作放在这个脚本里面, 就可以实现先fence, 再启VIP的过程.

[注意]
本文仅提供思路, 仅供测试, 用于生产请自行承担风险.

[参考]
1. lib/notify.c
2. keepalived/vrrp/vrrp.c
3. man 2 wait
4. man 3 system
5. doc/keepalived.conf.SYNOPSIS

版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。

分享:
云计算
使用钉钉扫一扫加入圈子
+ 订阅

时时分享云计算技术内容,助您降低 IT 成本,提升运维效率,使您更专注于核心业务创新。

其他文章