Loading [MathJax]/jax/output/HTML-CSS/jax.js

MySQL MHA高可用环境搭建

本文涉及的产品
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
云数据库 RDS MySQL,集群系列 2核4GB
推荐场景:
搭建个人博客
云数据库 RDS MySQL,高可用系列 2核4GB
简介:
原创作品,允许转载,转载时请务必以超链接形式标明文章  原始出处 、作者信息和本声明。否则将追究法律责任。 http://koumm.blog.51cto.com/703525/1769562

一、安装MHA基本环境

1. 安装MHA node

(1) 基本环境说明,本文参考互联网文章学习,搭建MHA与测试如下。

参考文档:http://www.cnblogs.com/xuanzhi201111/p/4231412.html

角色                IP地址            主机名    
=============================================    
Master              192.168.1.121     node1     
Slave               192.168.1.122     node2     
Slave               192.168.1.123     node3     
Monitor host        192.168.1.125     node5 


(2) 在node1,node2,node3,node5操作:

# vi /etc/hosts

192.168.1.121  node1   
192.168.1.122  node2    
192.168.1.123  node3    
192.168.1.125  node5

安装MHA node节点软件包:

# rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm    
# yum install perl-DBD-MySQL perl-CPAN -y    
# tar xf mha4mysql-node-0.56.tar.gz     
# cd mha4mysql-node-0.56    
# perl Makefile.PL        
# make && make install    
 

2. 安装MHA Manager

在node5管理节点上操作:注:MHA Manager主机也是需要安装MHA Node,MHA Manger

# yum install perl-DBD-MySQL perl-CPAN perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y   
# tar xf mha4mysql-manager-0.56.tar.gz    
# cd mha4mysql-manager-0.56    
# perl Makefile.PL    
# make && make install

#说明:安装的脚本程序都在/usr/local/bin/目录下。   
 

3. 节点间配置SSH登录无密码验证(MHA主机之间使用key登录)

在node5(Monitor):

# ssh-keygen -t rsa   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1    
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2    
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3

在node1(Master):

# ssh-keygen -t rsa   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2    
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3    
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5

在node2 (slave):

# ssh-keygen -t rsa   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1    
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3    
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5

在node3 (slave):

# ssh-keygen -t rsa   
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1    
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2    
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5


二、搭建主从复制环境

1. 主从复制环境配置过程

(1) mysql安装过程略,但是三节点要创建如下链接

node1(主),node2(主备从),node3(从)

注意:创建如下链接:   
ln -s /usr/local/mysql/bin/* /usr/local/bin/

node1 my.cnf

server-id              = 1   
binlog-format          = ROW    
log-bin                = master-bin    
log-bin-index          = master-bin.index    
log-slave-updates      = true    
relay_log_purge        = 0

node2 my.cnf

server-id              = 2   
binlog-format          = ROW    
log-bin                = master-bin    
log-bin-index          = master-bin.index    
log-slave-updates      = true    
relay_log_purge        = 0

node3 my.cnf

binlog-format          = ROW   
log-bin                = mysql-bin    
relay-log              = slave-relay-bin    
relay-log-index        = slave-relay-bin.index    
log-slave-updates      = true    
server-id              = 11    
skip-name-resolve    
relay_log_purge        = 0


(2) 在node1 (Master)上备份一份完整的数据:

# mysqldump -uroot -p123456 --master-data=2 --single-transaction -R --triggers -A > all.sql

其中--master-data=2代表备份时刻记录master的Binlog位置和Position。


(3) 在node1 (Master)上创建复制用户:

mysql> grant replication slave on *.* to 'repl'@'192.168.1.%' identified by '123456';    
mysql> flush privileges;


(4) 查看主库备份时的binlog名称和位置,MASTER_LOG_FILE和MASTER_LOG_POS:

# head -n 30 all.sql | grep 'CHANGE MASTER TO'

-- CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000004', MASTER_LOG_POS=120;


(5) 把备份复制到192.168.1.122和192.168.1.123

# scp all.sql 192.168.1.122:/root/   
# scp all.sql 192.168.1.123:/root/


(6) 分别在两台服务器上导入备份,执行复制相关命令

在node2、node3 主机上操作:

# mysql -uroot -p123456 < all.sql

mysql> stop slave;

CHANGE MASTER TO    
MASTER_HOST='192.168.1.121',    
MASTER_USER='repl',     
MASTER_PASSWORD='123456',    
MASTER_LOG_FILE='master-bin.000004',    
MASTER_LOG_POS=120;

mysql> start slave;   
mysql> show slave status\G


2. 创建MHA管理用户,在master上创建。

mysql> grant all privileges on *.* to 'root'@'192.168.1.%' identified  by '123456';    
mysql> flush  privileges;

 

三、配置Keepalived VIP

vip配置可以采用两种方式,一种通过keepalived的方式管理虚拟ip的浮动;一人是通过脚本方式,本文通过keepalived方式实现

1. 在node1(Master)与node2(备选主节点)安装keepalived。

# wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz    
# tar xf keepalived-1.2.12.tar.gz     
# cd keepalived-1.2.12    
# ./configure --prefix=/usr/local/keepalived    
# make &&  make install    
# cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/    
# cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/    
# mkdir /etc/keepalived    
# cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/    
# cp /usr/local/keepalived/sbin/keepalived /usr/sbin/


2. 配置keepalived的配置文件,在node1(master)上配置操作如下:

注:keepalived配置成backup->backup,即IP地址切换后,主起来后IP地址不切换,本文监控脚本由MHA提供,keepalived不提供对mysqld的监控。

# vi /etc/keepalived/keepalived.conf    
! Configuration File for keepalived

global_defs {   
     notification_email {    
     abc@163.com    
   }    
   notification_email_from dba@dbserver.com    
   smtp_server 127.0.0.1    
   smtp_connect_timeout 30    
   router_id MySQL-HA    
}

vrrp_instance VI_1 {   
    state BACKUP    
    interface eth0    
    virtual_router_id 51    
    priority 150    
    advert_int 1    
    nopreempt

    authentication {   
    auth_type PASS    
    auth_pass 1111    
    }

    virtual_ipaddress {   
        192.168.1.130    
    }    
}


3. 配置keepalived的配置文件,在node2(备用节点)上配置操作如下:

# vi /etc/keepalived/keepalived.conf    
! Configuration File for keepalived

global_defs {   
     notification_email {    
     abc@163.com    
   }    
   notification_email_from dba@dbserver.com    
   smtp_server 127.0.0.1    
   smtp_connect_timeout 30    
   router_id MySQL-HA    
}

vrrp_instance VI_1 {   
    state BACKUP    
    interface eth0    
    virtual_router_id 51    
    priority 120    
    advert_int 1    
    nopreempt

    authentication {   
    auth_type PASS    
    auth_pass 1111    
    }

    virtual_ipaddress {   
        192.168.1.130    
    }    
}


4. node1,node2启动keepalived服务

# service keepalived start   
# chkconfig keepalived on


5. node1查看VIP启动情况

[root@node1 ~]# ip a   
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN     
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00    
    inet 127.0.0.1/8 scope host lo    
    inet6 ::1/128 scope host     
       valid_lft forever preferred_lft forever    
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000    
    link/ether 00:0c:29:4e:53:71 brd ff:ff:ff:ff:ff:ff    
    inet 192.168.1.121/24 brd 192.168.1.255 scope global eth0    
    inet 192.168.1.130/32 scope global eth0    
    inet6 fe80::20c:29ff:fe4e:5371/64 scope link     
       valid_lft forever preferred_lft forever


四、配置MHA

1. monitor创建MHA的工作目录,并且创建相关配置文件(在软件包解压后的目录里面有样例配置文件)。

# mkdir -p /etc/masterha   
# mkdir -p /var/log/masterha/app1    
# cp mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha/

修改app1.cnf配置文件,修改后的文件内容如下:

# cat /etc/masterha/app1.cnf

[server default]   
manager_workdir=/var/log/masterha/app1    
manager_log=/var/log/masterha/app1/manager.log    
master_binlog_dir=/usr/local/mysql/data/    
master_ip_failover_script= /usr/local/bin/master_ip_failover    
master_ip_online_change_script= /usr/local/bin/master_ip_online_change     
password=123456     
user=root    
ping_interval=1     
remote_workdir=/tmp     
repl_password=123456    
repl_user=repl    
report_script=/usr/local/bin/send_report    
ssh_user=root  

[server1]   
hostname=192.168.1.121    
port=3306

[server2]   
hostname=192.168.1.122    
port=3306    
candidate_master=1    
check_repl_delay=0

[server3]   
hostname=192.168.1.123    
port=3306

说明:    
master_ip_failover_script= /usr/local/bin/master_ip_failover            #MHA自动切换执行的脚本,需要修改     
master_ip_online_change_script= /usr/local/bin/master_ip_online_change  #手动切换需要执行的脚本,需要修改     
report_script=/usr/local/bin/send_report                               #切换时发送邮件进行报告,需要修改


2. 设置relay log的清除方式(在每个slave节点上):

(1)在node2,node3 从节点上操作:

将relay_log_purge=0加入my.cnf配置文件,前面已经配置。

(2) 设置定期清理relay脚本(node2,node3上操作):

# cat purge_relay_log.sh    
#!/bin/bash    
user=root    
passwd=123456    
port=3306    
log_dir='/data/masterha/log'    
work_dir='/data'    
purge='/usr/local/bin/purge_relay_logs'

if [ ! -d logdir]thenmkdirlog_dir -p    
fi

purgeuser=user --password=passwddisablerelaylogpurgeport=port --workdir=workdir>>log_dir/purge_relay_logs.log 2>&1

配置定时计划任务

# crontab -e   
0 4 * * * /bin/bash /root/purge_relay_log.sh

 

3. 要想把keepalived服务引入MHA,我们只需要修改切换是触发的脚本文件master_ip_failover即可,在该脚本中添加在master发生宕机时对keepalived的处理。

(1) 编辑脚本/usr/local/bin/master_ip_failover,修改后如下:

# vi /usr/local/bin/master_ip_failover    
#!/usr/bin/env perl    
use strict;    
use warnings FATAL => 'all';

use Getopt::Long;

my (   
    command,ssh_user,        origmasterhost,orig_master_ip,    
    origmasterport,new_master_host, newmasterip,new_master_port    
);

my vip=192.168.1.130;myssh_start_vip = "/etc/init.d/keepalived start";    
my $ssh_stop_vip = "/etc/init.d/keepalived stop";

GetOptions(   
    'command=s'          => \$command,    
    'ssh_user=s'         => \$ssh_user,    
    'orig_master_host=s' => \$orig_master_host,    
    'orig_master_ip=s'   => \$orig_master_ip,    
    'orig_master_port=i' => \$orig_master_port,    
    'new_master_host=s'  => \$new_master_host,    
    'new_master_ip=s'    => \$new_master_ip,    
    'new_master_port=i'  => \$new_master_port,    
);

exit &main();   
sub main {    
    print "\n\nIN SCRIPT TEST====sshstopvip==ssh_start_vip===\n\n";    
    if ( commandeq"stop"||command eq "stopssh" ) {    
        my exit_code = 1;              eval {                  print "Disabling the VIP on old master: $orig_master_host \n";                  &stop_vip();                  $exit_code = 0;              };              if (@) {    
            warn "Got Error: @\n";exitexit_code;    
        }    
        exit exit_code;          }          elsif (command eq "start" ) {

        my exit_code = 10;             eval {                  print "Enabling the VIP - $vip on the new master - $new_master_host \n";                  &start_vip();                  $exit_code = 0;              };              if (@) {    
            warn @;exitexit_code;    
        }    
        exit exit_code;          }          elsif (command eq "status" ) {    
        print "Checking the Status of the script.. OK \n";    
        exit 0;    
    }    
    else {    
        &usage();    
        exit 1;    
    }    
}    
sub start_vip() {    
    `ssh sshuser\@new_master_host \" ssh_start_vip \"`;      }      # A simple system call that disable the VIP on the old_master      sub stop_vip() {          `sshssh_user\@origmasterhost\"ssh_stop_vip \"`;    
}

sub usage {   
    print    
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";    
}


(2) 编辑脚本master_ip_online_change,修改后如下:

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.   
#    
#  This program is free software; you can redistribute it and/or modify    
#  it under the terms of the GNU General Public License as published by    
#  the Free Software Foundation; either version 2 of the License, or    
#  (at your option) any later version.    
#    
#  This program is distributed in the hope that it will be useful,    
#  but WITHOUT ANY WARRANTY; without even the implied warranty of    
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the    
#  GNU General Public License for more details.    
#    
#  You should have received a copy of the GNU General Public License    
#   along with this program; if not, write to the Free Software    
#  Foundation, Inc.,    
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;   
use warnings FATAL => 'all';

use Getopt::Long;   
use MHA::DBHelper;    
use MHA::NodeUtil;    
use Time::HiRes qw( sleep gettimeofday tv_interval );    
use Data::Dumper;

my tstart;my_running_interval = 0.1;    
my (    
  command,orig_master_is_new_slave, origmasterhost,orig_master_ip,       origmasterport,orig_master_user,    
  origmasterpassword,orig_master_ssh_user,     newmasterhost,new_master_ip,        newmasterport,new_master_user,    
  newmasterpassword,new_master_ssh_user    
);    
my vip=192.168.1.130/24;mykey = '1';    
my sshstartvip="/sbin/ifconfigeth0:key vip";myssh_stop_vip = "/sbin/ifconfig eth0:keydown";myorig_master_ssh_port = 22;    
my $new_master_ssh_port = 22;    
GetOptions(    
  'command=s'                => \$command,    
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,    
  'orig_master_host=s'       => \$orig_master_host,    
  'orig_master_ip=s'         => \$orig_master_ip,    
  'orig_master_port=i'       => \$orig_master_port,    
  'orig_master_user=s'       => \$orig_master_user,    
  'orig_master_password=s'   => \$orig_master_password,    
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,    
  'new_master_host=s'        => \$new_master_host,    
  'new_master_ip=s'          => \$new_master_ip,    
  'new_master_port=i'        => \$new_master_port,    
  'new_master_user=s'        => \$new_master_user,    
  'new_master_password=s'    => \$new_master_password,    
  'new_master_ssh_user=s'    => \$new_master_ssh_user,    
  'orig_master_ssh_port=i'    => \$orig_master_ssh_port,    
  'new_master_ssh_port=i'    => \$new_master_ssh_port,    
);

exit &main();

sub current_time_us {   
  my ( sec,microsec ) = gettimeofday();    
  my curdate=localtime(sec);    
  return curdate."".sprintf("microsec );    
}

sub sleep_until {   
  my elapsed=tvinterval(_tstart);    
  if ( runninginterval>elapsed ) {    
    sleep( runningintervalelapsed );    
  }    
}

sub get_threads_util {   
  my dbh=shift;mymy_connection_id       = shift;    
  my runningtimethreshold=shift;mytype                   = shift;    
  runningtimethreshold=0unless(running_time_threshold);    
  type=0unless(type);    
  my @threads;

  my sth=dbh->prepare("SHOW PROCESSLIST");   
  $sth->execute();

  while ( my ref=sth->fetchrow_hashref() ) {   
    my id=ref->{Id};    
    my user=ref->{User};    
    my host=ref->{Host};    
    my command=ref->{Command};    
    my state=ref->{State};    
    my querytime=ref->{Time};    
    my info=ref->{Info};    
    info= s/\s(.?)\s/1/ifdefined(info);    
    next if ( myconnectionid==id );    
    next if ( defined(query_time) &&query_time < runningtimethreshold);nextif(defined(command)    && commandeq"BinlogDump");nextif(defined(user)       && usereq"systemuser");nextif(defined(command)    
      && command eq "Sleep"            && defined(query_time)    
      && $query_time >= 1 );

    if ( type >= 1 ) {           next if ( defined(command) && commandeq"Sleep");nextif(defined(command) && $command eq "Connect" );    
    }

    if ( type >= 2 ) {           next if ( defined(info) && info= m/select/i);nextif(defined(info) && $info =~ m/^show/i );    
    }

    push @threads, $ref;   
  }    
  return @threads;    
}

sub main {   
  if ( command eq "stop" ) {          ## Gracefully killing connections on the current master          # 1. Set read_only= 1 on the new master          # 2. DROP USER so that no app user can establish new connections          # 3. Set read_only= 1 on the current master          # 4. Kill current queries          # * Any database access failure will result in script die.          myexit_code = 1;    
    eval {    
      ## Setting read_only=1 on the new master (to avoid accident)    
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not   
      newmasterhandler>connect(new_master_ip, newmasterport,new_master_user, newmasterpassword,1);printcurrenttimeus()."Setreadonlyonthenewmaster..";new_master_handler->enable_read_only();    
      if ( newmasterhandler>isreadonly())print"ok.\n";elsedie"Failed!\n";new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens   
      my origmasterhandler=newMHA::DBHelper();orig_master_handler->connect( origmasterip,orig_master_port,    
        origmasteruser,orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand   
      orig_master_handler->disable_log_bin_local();            print current_time_us() . " Drpping app user on the orig master..\n";            #FIXME_xxx_drop_app_user(orig_master_handler);

      ## Waiting for N * 100 milliseconds so that current connections can exit   
      my timeuntilreadonly=15;_tstart = [gettimeofday];    
      my @threads = get_threads_util( origmasterhandler>dbh,orig_master_handler->{connection_id} );    
      while ( time_until_read_only > 0 &&#threads >= 0 ) {    
        if ( timeuntilreadonly_tstart = [gettimeofday];    
        timeuntilreadonly;@threads=getthreadsutil(orig_master_handler->{dbh},    
          $orig_master_handler->{connection_id} );    
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write   
      print current_time_us() . " Set read_only=1 on the orig master.. ";    
      origmasterhandler>enablereadonly();if(orig_master_handler->is_read_only() ) {    
        print "ok.\n";    
      }    
      else {    
        die "Failed!\n";    
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete   
      my timeuntilkillthreads=5;@threads=getthreadsutil(orig_master_handler->{dbh},    
        origmasterhandler>connectionid);while(time_until_kill_threads > 0 && #threads >= 0 ) {              if (time_until_kill_threads % 5 == 0 ) {    
          printf    
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",    
            current_time_us(), #threads + 1,time_until_kill_threads * 100;    
          if ( #threads < 5 ) {                  print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"                    foreach (@threads);                }              }              sleep_until();_tstart = [gettimeofday];    
        timeuntilkillthreads;@threads=getthreadsutil(orig_master_handler->{dbh},    
          $orig_master_handler->{connection_id} );    
      }

      ## Terminating all threads   
      print current_time_us() . " Killing all application threads..\n";    
      origmasterhandler>killthreads(@threads)if(#threads >= 0 );    
      print current_time_us() . " done.\n";    
      origmasterhandler>enablelogbinlocal();orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK   
      eval {    
      `ssh -porigmastersshportorig_master_ssh_user\@origmasterhost\"ssh_stop_vip \"`;    
        };    
        if (@)warn$@;exit_code = 0;    
    };    
    if (@)warn"GotError:$@\n";exit$exitcode;exitexit_code;    
  }    
  elsif ( $command eq "start" ) {    
    ## Activating master ip on the new master    
    # 1. Create app user with write privileges    
    # 2. Moving backup script if needed    
    # 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.   
# If exit code is 0 or 10, MHA does not abort    
    my exit_code = 10;          eval {            mynew_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not   
      newmasterhandler>connect(new_master_ip, newmasterport,new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master   
      newmasterhandler>disablelogbinlocal();printcurrenttimeus()."Setreadonly=0onthenewmaster.\n";new_master_handler->disable_read_only();

      ## Creating an app user on the new master   
      print current_time_us() . " Creating app user on the new master..\n";    
      #FIXME_xxx_create_app_user(newmasterhandler);new_master_handler->enable_log_bin_local();    
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc   
      `ssh -pnewmastersshportnew_master_ssh_user\@newmasterhost\"ssh_start_vip \"`;    
      exit_code = 0;          };          if (@) {    
      warn "Got Error: @\n";exitexit_code;    
    }    
    exit exit_code;        }        elsif (command eq "status" ) {

    # do nothing   
    exit 0;    
  }    
  else {    
    &usage();    
    exit 1;    
  }    
}

sub usage {   
  print    
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";    
  die;    
}


(3) 编辑脚本send_report,修改后如下:

#!/usr/bin/perl

use strict;   
use warnings FATAL => 'all';    
use Mail::Sender;    
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded   
my ( deadmasterhost,new_master_host, newslavehosts,subject, body);mysmtp='smtp.163.com';    
my mailfrom=xxxx;mymail_user='xxxxx';    
my mailpass=xxxxx;mymail_to=['xxxx','xxxx'];    
GetOptions(    
  'orig_master_host=s' => \$dead_master_host,    
  'new_master_host=s'  => \$new_master_host,    
  'new_slave_hosts=s'  => \$new_slave_hosts,    
  'subject=s'          => \$subject,    
  'body=s'             => \$body,    
);

mailToContacts(smtp,mail_from,mailuser,mail_pass,mailto,subject,$body);

sub mailToContacts {   
    my ( smtp,mail_from, user,passwd, mailto,subject, msg)=@;openmyDEBUG, "> /tmp/monitormail.log"    
        or die "Can't open the debug      file:!\n";mysender = new Mail::Sender {    
        ctype       => 'text/plain; charset=utf-8',    
        encoding    => 'utf-8',    
        smtp        => smtp,from=>mail_from,    
        auth        => 'LOGIN',    
        TLS_allowed => '0',    
        authid      => user,authpwd=>passwd,    
        to          => mailto,subject=>subject,    
        debug       => $DEBUG    
    };

    sender>MailMsg(msg=>$msg,debug=>$DEBUG)orprintMail::Sender::Error;    
    return 1;    
}

# Do whatever you want here

exit 0;

 

五、MHA的日常管理

1. 检查SSH配置(node5 Monitor 监控节点上操作),如下:

# masterha_check_ssh --conf=/etc/masterha/app1.cnf

Sun May  1 22:05:12 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.   
Sun May  1 22:05:12 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..    
Sun May  1 22:05:12 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..    
Sun May  1 22:05:12 2016 - [info] Starting SSH connection tests..    
Sun May  1 22:05:14 2016 - [debug]     
Sun May  1 22:05:12 2016 - [debug]  Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.122(192.168.1.122:22)..    
Sun May  1 22:05:13 2016 - [debug]   ok.    
Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.123(192.168.1.123:22)..    
Sun May  1 22:05:13 2016 - [debug]   ok.    
Sun May  1 22:05:14 2016 - [debug]     
Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.121(192.168.1.121:22)..    
Sun May  1 22:05:13 2016 - [debug]   ok.    
Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.123(192.168.1.123:22)..    
Sun May  1 22:05:14 2016 - [debug]   ok.    
Sun May  1 22:05:14 2016 - [debug]     
Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.121(192.168.1.121:22)..    
Sun May  1 22:05:14 2016 - [debug]   ok.    
Sun May  1 22:05:14 2016 - [debug]  Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.122(192.168.1.122:22)..    
Sun May  1 22:05:14 2016 - [debug]   ok.    
Sun May  1 22:05:14 2016 - [info] All SSH connection tests passed successfully.


2. 检查整个复制环境状况(node5 监控节点上操作),如下:

# masterha_check_repl --conf=/etc/masterha/app1.cnf

Sun May  1 22:46:44 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.   
Sun May  1 22:46:44 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..    
Sun May  1 22:46:44 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..    
Sun May  1 22:46:44 2016 - [info] MHA::MasterMonitor version 0.56.    
Sun May  1 22:46:45 2016 - [info] GTID failover mode = 0    
Sun May  1 22:46:45 2016 - [info] Dead Servers:    
Sun May  1 22:46:45 2016 - [info] Alive Servers:    
Sun May  1 22:46:45 2016 - [info]   192.168.1.121(192.168.1.121:3306)    
Sun May  1 22:46:45 2016 - [info]   192.168.1.122(192.168.1.122:3306)    
Sun May  1 22:46:45 2016 - [info]   192.168.1.123(192.168.1.123:3306)    
Sun May  1 22:46:45 2016 - [info] Alive Slaves:    
Sun May  1 22:46:45 2016 - [info]   192.168.1.122(192.168.1.122:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled    
Sun May  1 22:46:45 2016 - [info]     Replicating from 192.168.1.121(192.168.1.121:3306)    
Sun May  1 22:46:45 2016 - [info]     Primary candidate for the new Master (candidate_master is set)    
Sun May  1 22:46:45 2016 - [info]   192.168.1.123(192.168.1.123:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled    
Sun May  1 22:46:45 2016 - [info]     Replicating from 192.168.1.121(192.168.1.121:3306)    
Sun May  1 22:46:45 2016 - [info] Current Alive Master: 192.168.1.121(192.168.1.121:3306)    
Sun May  1 22:46:45 2016 - [info] Checking slave configurations..    
Sun May  1 22:46:45 2016 - [info]  read_only=1 is not set on slave 192.168.1.122(192.168.1.122:3306).    
Sun May  1 22:46:45 2016 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.122(192.168.1.122:3306).    
Sun May  1 22:46:45 2016 - [info]  read_only=1 is not set on slave 192.168.1.123(192.168.1.123:3306).    
Sun May  1 22:46:45 2016 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.123(192.168.1.123:3306).    
Sun May  1 22:46:45 2016 - [info] Checking replication filtering settings..    
Sun May  1 22:46:45 2016 - [info]  binlog_do_db= , binlog_ignore_db=     
Sun May  1 22:46:45 2016 - [info]  Replication filtering check ok.    
Sun May  1 22:46:45 2016 - [info] GTID (with auto-pos) is not supported    
Sun May  1 22:46:45 2016 - [info] Starting SSH connection tests..    
Sun May  1 22:46:46 2016 - [info] All SSH connection tests passed successfully.    
Sun May  1 22:46:46 2016 - [info] Checking MHA Node version..    
Sun May  1 22:46:47 2016 - [info]  Version check ok.    
Sun May  1 22:46:47 2016 - [info] Checking SSH publickey authentication settings on the current master..    
Sun May  1 22:46:47 2016 - [info] HealthCheck: SSH to 192.168.1.121 is reachable.    
Sun May  1 22:46:47 2016 - [info] Master MHA Node version is 0.56.    
Sun May  1 22:46:47 2016 - [info] Checking recovery script configurations on 192.168.1.121(192.168.1.121:3306)..    
Sun May  1 22:46:47 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data/ --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000008     
Sun May  1 22:46:47 2016 - [info]   Connecting to root@192.168.1.121(192.168.1.121:22)..     
  Creating /tmp if not exists..    ok.    
  Checking output directory is accessible or not..    
   ok.    
  Binlog found at /usr/local/mysql/data/, up to master-bin.000008    
Sun May  1 22:46:48 2016 - [info] Binlog setting check done.    
Sun May  1 22:46:48 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..    
Sun May  1 22:46:48 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.122 --slave_ip=192.168.1.122 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx    
Sun May  1 22:46:48 2016 - [info]   Connecting to root@192.168.1.122(192.168.1.122:22)..     
  Checking slave recovery environment settings..    
    Opening /usr/local/mysql/data/relay-log.info ... ok.    
    Relay log found at /usr/local/mysql/data, up to node2-relay-bin.000002    
    Temporary relay log file is /usr/local/mysql/data/node2-relay-bin.000002    
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.    
done.    
    Testing mysqlbinlog output.. done.    
    Cleaning up test file(s).. done.    
Sun May  1 22:46:48 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.123 --slave_ip=192.168.1.123 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx    
Sun May  1 22:46:48 2016 - [info]   Connecting to root@192.168.1.123(192.168.1.123:22)..     
  Checking slave recovery environment settings..    
    Opening /usr/local/mysql/data/relay-log.info ... ok.    
    Relay log found at /usr/local/mysql/data, up to slave-relay-bin.000012    
    Temporary relay log file is /usr/local/mysql/data/slave-relay-bin.000012    
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.    
done.    
    Testing mysqlbinlog output.. done.    
    Cleaning up test file(s).. done.    
Sun May  1 22:46:48 2016 - [info] Slaves settings check done.    
Sun May  1 22:46:48 2016 - [info]     
192.168.1.121(192.168.1.121:3306) (current master)    
+--192.168.1.122(192.168.1.122:3306)    
+--192.168.1.123(192.168.1.123:3306)

Sun May  1 22:46:48 2016 - [info] Checking replication health on 192.168.1.122..   
Sun May  1 22:46:48 2016 - [info]  ok.    
Sun May  1 22:46:48 2016 - [info] Checking replication health on 192.168.1.123..    
Sun May  1 22:46:48 2016 - [info]  ok.    
Sun May  1 22:46:48 2016 - [info] Checking master_ip_failover_script status:    
Sun May  1 22:46:48 2016 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.121 --orig_master_ip=192.168.1.121 --orig_master_port=3306

IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Checking the Status of the script.. OK    
Sun May  1 22:46:48 2016 - [info]  OK.    
Sun May  1 22:46:48 2016 - [warning] shutdown_script is not defined.    
Sun May  1 22:46:48 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.   
[root@node5 masterha]#


3. 开启MHA Manager监控(node5操作)如下:

# mkdir -p  /var/log/masterha/app1/   
# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 & 

参数说明:   
--remove_dead_master_conf     #该参数代表当发生主从切换后,老的主库的ip将会从配置文件中移除。    
--manger_log                  #日志存放位置    
--ignore_last_failover        #在缺省情况下,如果MHA检测到连续发生宕机,会生成app1.failover.complete文件,会造成MHA管理进程无法启动。


4. 查看MHA Manager监控是否正常:

# masterha_check_status --conf=/etc/masterha/app1.cnf   
app1 (pid:2480) is running(0:PING_OK), master:192.168.1.121


5. 查看启动日志(node5操作)如下:

# tail -n20 /var/log/masterha/app1/manager.log


6. 关闭MHA Manage监控:

(1) 关闭

# masterha_stop --conf=/etc/masterha/app1.cnf

(2) 启动

# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1


六、MHA Failover切换

1. 自动Failover切换

(1) 模拟master mysql关闭   
(2) VIP将会切换到node2    
(3) /etc/masterha/app1.cnf中将原主服务器配置文件清掉。    
(4) masterha_manager监控进程会自动退出关闭,并在/var/log/masterha/app1下生成app1.failover.complete文件,manager.log会记录全过程,从服务器会自动从新的主服务器复制。    
(5) 原主服务器mysqld启动的,需要清掉/var/log/masterha/app1下生成app1.failover.complete文件,添加node1配置文件到/etc/masterha/app1.cnf,通过manager.log中的记录的故障点,重新同步主服务器,成为从节点。


2. 手动Failover切换

(1) 先停MHA Manager进程。   
masterha_stop --conf=/etc/masterha/app1.cnf    
(2) 停掉master mysqld    
(3) 手动切换,在Manager主机上操作如下:    
# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.1.122 --dead_master_port=3306 --new_master_host=192.168.1.121 --new_master_port=3306 --ignore_last_failover

通过观察日志可以观察切换全过程。   
(4) 如上节方式恢复节点为从服务器。

3.  正常运行情况下切换(Master正在运行)

等补充。

4. 小结

通过对MMM,MHA的环境搭建测试,MHA由于采用复制架构,原理简单,在一些对数据要求比较高的环境,为了保证可靠性,最好与半同步结合使用。

本文出自 “koumm的linux技术博客” 博客,请务必保留此出处http://koumm.blog.51cto.com/703525/1769562

相关实践学习
如何快速连接云数据库RDS MySQL
本场景介绍如何通过阿里云数据管理服务DMS快速连接云数据库RDS MySQL,然后进行数据表的CRUD操作。
全面了解阿里云能为你做什么
阿里云在全球各地部署高效节能的绿色数据中心,利用清洁计算为万物互联的新世界提供源源不断的能源动力,目前开服的区域包括中国(华北、华东、华南、香港)、新加坡、美国(美东、美西)、欧洲、中东、澳大利亚、日本。目前阿里云的产品涵盖弹性计算、数据库、存储与CDN、分析与搜索、云通信、网络、管理与监控、应用服务、互联网中间件、移动服务、视频服务等。通过本课程,来了解阿里云能够为你的业务带来哪些帮助 &nbsp; &nbsp; 相关的阿里云产品:云服务器ECS 云服务器 ECS(Elastic Compute Service)是一种弹性可伸缩的计算服务,助您降低 IT 成本,提升运维效率,使您更专注于核心业务创新。产品详情: https://www.aliyun.com/product/ecs
目录
打赏
0
1
0
1
270
分享
相关文章
Mysql高可用架构方案
本文阐述了Mysql高可用架构方案,介绍了 主从模式,MHA模式,MMM模式,MGR模式 方案的实现方式,没有哪个方案是完美的,开发人员在选择何种方案应用到项目中也没有标准答案,合适的才是最好的。
470 3
Mysql高可用架构方案
云数据库:从零到一,构建高可用MySQL集群
在互联网时代,数据成为企业核心资产,传统单机数据库难以满足高并发、高可用需求。云数据库通过弹性扩展、分布式架构等优势解决了这些问题,但也面临数据安全和性能优化挑战。本文介绍了如何从零开始构建高可用MySQL集群,涵盖选择云服务提供商、创建实例、配置高可用架构、数据备份恢复及性能优化等内容,并通过电商平台案例展示了具体应用。
介绍几种 MySQL 官方高可用方案
MySQL 官方提供了多种高可用部署方案,从最基础的主从复制到组复制再到 InnoDB Cluster 等等。本篇文章以 MySQL 8.0 版本为准,介绍下不同高可用方案架构原理及使用场景。
1875 3
介绍几种 MySQL 官方高可用方案
orchestrator搭建mysql高可用
orchestrator搭建mysql高可用
107 0
【MySQL】Orchestrator最简单的 mysql 高可用方案最细细细细~
【MySQL】Orchestrator最简单的 mysql 高可用方案最细细细细~
MySQL调优秘籍曝光!从索引到事务,全方位解锁高可用秘诀,让你的数据库性能飞起来!
【8月更文挑战第6天】MySQL是顶级关系型数据库之一,其性能直接影响应用的高可用性与用户体验。本文聚焦MySQL的高性能调优,从索引设计到事务管理,逐一解析。介绍如何构建高效索引,如联合索引`CREATE INDEX idx_order_customer ON orders(order_id, customer_id);`,以及索引覆盖查询等技术。
108 0
MySQL高可用架构设计:从主从复制到分布式集群
MySQL高可用性涉及主从复制、半同步复制和Group/InnoDB Cluster。主从复制通过二进制日志同步数据,保证故障时可切换。半同步复制确保事务在至少一个从服务器确认后才提交。Group Replication是多主复制,支持自动故障切换。InnoDB Cluster是8.0的集成解决方案,简化集群管理。使用这些技术能提升数据库的稳定性和可靠性。
779 2
目录
目录
AI助理

你好,我是AI助理

可以解答问题、推荐解决方案等