MySQL MHA
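Architecture: MHA consists of two parts, MHA Manager (the management node) and MHA Node (the data node). MHA Node runs on every MySQL server. MHA Manager periodically probes the master in the cluster; when the master fails, it automatically promotes the slave with the most recent data to be the new master and then repoints all the other slaves to the new master.
Caveat: during automatic failover, MHA tries to save the binary logs from the crashed master to lose as little data as possible. However, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save those binary logs; it can only fail over, and the most recent data may be lost.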
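How it works, in summary:
1. Save the binary log events (binlog events) from the crashed master;
2. Identify the slave with the most recent updates;
3. Apply the differential relay logs to the other slaves;
4. Apply the binlog events saved from the master;
5. Promote one slave to be the new master;
6. Point the other slaves at the new master and resume replication.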
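Step 2 comes down to comparing how far each slave has read in the master's binlog. The shell sketch below illustrates only that comparison. It is a hypothetical helper, not part of MHA, and it assumes its input is one "host master_log_file read_master_log_pos" line per slave (for example the Master_Log_File and Read_Master_Log_Pos fields of SHOW SLAVE STATUS). MHA's own Perl code also takes candidate_master, no_master and replication delay into account.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of MHA.
# Input lines (assumed format): "host master_log_file read_master_log_pos".
# Prints the host that has read furthest into the master's binlog.
pick_latest_slave() {
  awk '{
    # "mysql-bin.000002" -> 2, so file numbers compare numerically first.
    n = split($2, parts, ".")
    seq = parts[n] + 0
    pos = $3 + 0
    if (best == "" || seq > best_seq || (seq == best_seq && pos > best_pos)) {
      best = $1; best_seq = seq; best_pos = pos
    }
  }
  END { if (best != "") print best }'
}
```

For example, the input lines "slave1 mysql-bin.000002 1229" and "slave2 mysql-bin.000003 154" select slave2, because a higher binlog file number wins before positions are compared.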
1. Install MySQL:
1.1 Add MySQL to PATH
vim /etc/profile
export PATH=$PATH:/usr/local/mysql/bin
source /etc/profile
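1.2 Extract the tarball under /usr/local, then copy it and my.cnf to the slaves
tar -xf mysql-5.7.22-linux-glibc2.12-x86_64.tar.gz -C /usr/local
mv /usr/local/mysql-5.7.22-linux-glibc2.12-x86_64 /usr/local/mysql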
scp -r /usr/local/mysql slave1:/usr/local/mysql
scp -r /usr/local/mysql slave2:/usr/local/mysql
scp /etc/my.cnf slave1:/etc/
scp /etc/my.cnf slave2:/etc/
The server-id in my.cnf must be unique on every node.
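Because the same /etc/my.cnf is copied to every node, server-id has to be changed on each copy. The sketch below shows one way to do that. It assumes the last octet of each host's IP is unique (128, 129, 130 here) and that my.cnf can be as small as the file it prints. The socket path /tmp/mysql3306.sock, the mysql-bin binlog base name and the relay log path come from later steps in this document; everything else in the file is an assumption, not the author's actual config. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical: derive server-id from the last IP octet
# (assumes one MySQL instance per host, all hosts in one /24).
server_id_from_ip() {
  echo "${1##*.}"
}

# Hypothetical: print a minimal my.cnf for this tutorial's layout.
# datadir/basedir come from step 1.3, the socket from 3.6, the relay log
# path from 3.5; the remaining settings are assumptions.
render_my_cnf() {
  local server_id=$1
  cat <<EOF
[client]
socket=/tmp/mysql3306.sock

[mysqld]
user=mysql
port=3306
basedir=/usr/local/mysql
datadir=/home/mysql3306/mysql3306
socket=/tmp/mysql3306.sock
server-id=${server_id}
log-bin=mysql-bin
relay-log=/home/mysql3306/logs/mysql-relay
EOF
}
```

On slave1, for instance: render_my_cnf "$(server_id_from_ip 192.168.111.129)" > /etc/my.cnf. Keeping log-bin enabled on the slaves matters here, because any of them may become the new master.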
1.3 Create the mysql user and directories, set ownership, initialize and start MySQL (run on all 3 nodes)
useradd mysql
mkdir -p /home/mysql3306/{mysql3306,logs}
chown mysql:mysql /home/mysql3306 -R
chown mysql:mysql /usr/local/mysql -R
mysqld --defaults-file=/etc/my.cnf --initialize-insecure --datadir=/home/mysql3306/mysql3306 --basedir=/usr/local/mysql --user=mysql
mysqld_safe --user=mysql &
2. Configure master-slave replication
2.1 On the master, create the replication account and grant it to the slaves:
grant REPLICATION CLIENT,REPLICATION SLAVE on *.* to rep@'192.168.111.129' identified by '123456';
grant REPLICATION CLIENT,REPLICATION SLAVE on *.* to rep@'192.168.111.130' identified by '123456';
flush privileges;
2.2 Check the master status to get the current binlog file and position:
mysql> show master status;
2.3 On slave1 and slave2, point replication at the master (use the File and Position values returned in step 2.2):
change master to master_host='192.168.111.128',master_user='rep',master_password='123456', master_log_file='mysql-bin.000002',master_log_pos=1229,MASTER_PORT=3306;
flush privileges;
start slave;
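Copying the file and position from 2.2 into 2.3 by hand is error-prone. Here is a minimal sketch, assuming the rep/123456 account from 2.1, port 3306, and one line of tab-separated output from mysql -N -e "show master status" (File, Position, ...) on stdin. It prints the matching CHANGE MASTER TO statement. change_master_sql is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build CHANGE MASTER TO from one line of
# `mysql -N -e "show master status"` output (tab-separated File, Position, ...).
# Credentials rep/123456 and port 3306 are the ones used in this tutorial.
change_master_sql() {
  local master_host=$1 file pos
  IFS=$'\t' read -r file pos _ || return 1
  [ -n "$pos" ] || return 1   # no binlog position: binary logging is probably off
  printf "CHANGE MASTER TO master_host='%s', master_port=3306, master_user='rep', master_password='123456', master_log_file='%s', master_log_pos=%s;\n" \
    "$master_host" "$file" "$pos"
}
```

Usage on the master: mysql -uroot -N -e "show master status" | change_master_sql 192.168.111.128 prints the statement, which is then run on each slave before start slave.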
2.4 Check the replication status on the slaves:
show slave status\G
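In that output the fields to look at are Slave_IO_Running and Slave_SQL_Running (both must be Yes) and Seconds_Behind_Master. The hypothetical helper below checks exactly those three fields in show slave status\G output read on stdin. It is a sketch for scripting the check, not an MHA tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `show slave status\G` output on stdin, print the
# thread states and lag, and exit 0 only if both replication threads are running.
check_slave_status() {
  awk -F': ' '
    { gsub(/^[ \t]+/, "", $1) }          # strip the \G alignment padding
    $1 == "Slave_IO_Running"      { io = $2 }
    $1 == "Slave_SQL_Running"     { sql = $2 }
    $1 == "Seconds_Behind_Master" { lag = $2 }
    END {
      printf "IO=%s SQL=%s lag=%s\n", io, sql, lag
      exit !(io == "Yes" && sql == "Yes")
    }'
}
```

Usage on a slave: mysql -uroot -e 'show slave status\G' | check_slave_status && echo "replication OK".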
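2.5 Set read_only on both slaves. The slaves serve read traffic; the setting is not written into the config file because any slave may be promoted to master at any time.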
mysql -uroot -e "set global read_only=1"
2.6 On all nodes, create the monitoring user that the manager needs:
grant all privileges on *.* to 'rep'@'192.168.111.%' identified by '123456';
3. Set up MHA
3.1 Configure time synchronization and passwordless SSH between all nodes in the cluster
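For the SSH part, a minimal sketch, assuming root is the SSH user and the three IPs used in this tutorial. setup_ssh_trust is a hypothetical name, and it only prints the commands unless DRY_RUN=0 is set. Time synchronization (for example with chronyd or ntpd) is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: give root passwordless SSH to every node.
# Assumptions: SSH user root, hosts below. DRY_RUN defaults to 1 (print only).
HOSTS=(192.168.111.128 192.168.111.129 192.168.111.130)

setup_ssh_trust() {
  local run=echo h
  if [ "${DRY_RUN:-1}" = "0" ]; then run=; fi
  # Create a key pair without a passphrase if this host has none yet.
  [ -f ~/.ssh/id_rsa ] || $run ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
  for h in "${HOSTS[@]}"; do
    # Prompts once for each host's root password, then key login works.
    $run ssh-copy-id -i ~/.ssh/id_rsa.pub "root@$h"
  done
}
```

Run it on every node, the manager included (for example DRY_RUN=0 setup_ssh_trust), because MHA nodes also copy relay logs between each other over SSH during recovery.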
3.2 Install MHA Node (on every server, including the manager host)
yum install -y perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-DBD-MySQL perl-devel perl-CPAN
mkdir -p /etc/mha ## create the installation directory
tar -xf mha4mysql-node-0.57.tar.gz -C /etc/mha/
mv /etc/mha/mha4mysql-node-0.57 /etc/mha/node
cd /etc/mha/node
perl Makefile.PL
make && make install
3.3 Install MHA Manager (on the manager node)
yum install -y perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
tar -xf mha4mysql-manager-0.57.tar.gz -C /etc/mha/
mv /etc/mha/mha4mysql-manager-0.57 /etc/mha/manager
cd /etc/mha/manager
perl Makefile.PL
make && make install
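3.4 Configure MHA
Edit the manager configuration file. Keep comments on their own lines: MHA reads this file with Config::Tiny, which does not strip comments written after a value.
mkdir /etc/mha/app1 ## create the manager working directory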
cp /etc/mha/manager/samples/conf/app1.cnf /etc/mha/
vim /etc/mha/app1.cnf
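[server default]
# MHA working directory
manager_workdir=/etc/mha/app1
# MHA log file
manager_log=/etc/mha/app1/manager.log
# binlog directory on the MHA Node side, i.e. the MySQL data directory
master_binlog_dir=/home/mysql3306/mysql3306
# where binlogs are saved on the remote MySQL hosts during a switch
remote_workdir=/etc/mha/app1
# VIP switch script for automatic failover
master_ip_failover_script=/etc/mha/master_ip_failover
# VIP switch script for manual switchover
master_ip_online_change_script=/etc/mha/master_ip_online_change
# alert script run after a switch
report_script=/etc/mha/send_report
# monitoring user and password
user=rep
password=123456
# replication user and password
repl_user=rep
repl_password=123456
# interval between MHA manager health checks (1 second)
ping_interval=1
# when MHA detects a problem with the master, the manager also tries to reach the master through the hosts given with -s
secondary_check_script=masterha_secondary_check -s slave1 -s master --user=rep --master_host=master --master_ip=192.168.111.128 --master_port=3306
[server1]
hostname=192.168.111.128
port=3306
ssh_port=22
[server2]
hostname=192.168.111.129
port=3306
# candidate master: if the master fails, this host is promoted even if it is not the slave with the newest events in the cluster
candidate_master=1
ssh_port=22
# By default MHA will not choose a slave that is more than 100MB of relay logs behind the master as the new master.
# With 0, MHA ignores replication delay when choosing the new master. This is very useful for a host with
# candidate_master=1, because it guarantees that the candidate becomes the new master during the switch.
check_repl_delay=0
[server3]
hostname=192.168.111.130
port=3306
# never promote this host to master
no_master=1
ssh_port=22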
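Before starting the manager, it helps to confirm which host ended up with which role. The sketch below is a hypothetical helper that reads the config on stdin and prints each [serverN] hostname as candidate_master, no_master or eligible. It only understands key=value lines and whole-line # comments, which is why it fits the comment layout used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize the [serverN] sections of an MHA config.
# Reads the config on stdin; whole-line # comments are skipped.
mha_server_roles() {
  awk -F= '
    function flush() { if (sect != "" && host != "") print host, role }
    /^[ \t]*#/ { next }
    NF >= 2 { gsub(/^[ \t]+|[ \t]+$/, "", $1); gsub(/^[ \t]+|[ \t]+$/, "", $2) }
    /^\[server[0-9]+\]/ { flush(); sect = $0; host = ""; role = "eligible"; next }
    /^\[/ { flush(); sect = ""; next }     # [server default] or any other section
    sect != "" && $1 == "hostname" { host = $2 }
    sect != "" && $1 == "candidate_master" && $2 == "1" { role = "candidate_master" }
    sect != "" && $1 == "no_master" && $2 == "1" { role = "no_master" }
    END { flush() }'
}
```

Usage: mha_server_roles < /etc/mha/app1.cnf. With the file above it should list 192.168.111.129 as candidate_master and 192.168.111.130 as no_master.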
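3.5 Set the relay log purge mode on the slaves and create hard links
During an MHA switch, recovering the slaves depends on their relay logs, but by default MySQL deletes a relay log automatically once the slave has applied it. Turn relay_log_purge OFF and purge manually instead (also add relay_log_purge=0 to my.cnf so the setting survives a restart):
mysql -uroot -p123456 -e "set global relay_log_purge=0"
Deleting large relay logs periodically can cause replication delay, so create hard links to the relay log files. On Linux, when MySQL later purges a relay log that still has another hard link, it only removes a directory entry, which is fast; the data is freed when the last link is deleted.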
mkdir /home/mysql3306/logs1
ln /home/mysql3306/logs/mysql-relay* /home/mysql3306/logs1
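The hard-link trick relies on ordinary Linux file semantics: a file's data is released only when its last link is removed. The throwaway demo below shows this with a small temporary file. The file name only imitates a relay log; nothing here touches the real /home/mysql3306/logs.

```shell
#!/usr/bin/env bash
# Demo in a temporary directory: two names, one inode.
demo_hard_link() {
  local dir
  dir=$(mktemp -d)
  printf 'relay log data\n' > "$dir/mysql-relay.000001"
  mkdir "$dir/logs1"
  ln "$dir/mysql-relay.000001" "$dir/logs1/"
  # Link count is now 2: both names point to the same data.
  ls -l "$dir/mysql-relay.000001" | awk '{print $2}'
  # Removing the original name only drops one link; the data is still readable.
  rm "$dir/mysql-relay.000001"
  cat "$dir/logs1/mysql-relay.000001"
  rm -rf "$dir"
}
```

It prints the link count 2 followed by the file contents, which survived the removal of the original name.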
3.6 Write a script that purges relay logs periodically and run it from cron (on slave1 and slave2)
vim /etc/mha/purge_relay_log.sh
#!/bin/bash
user=root
passwd=123456
port=3306
log_dir='/home/mysql3306/logs/'
work_dir='/home/mysql3306/logs1'
purge='/usr/local/bin/purge_relay_logs'
if [ ! -d $log_dir ]
then
mkdir $log_dir -p
fi
$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --host=localhost --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
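Parameter notes:
--workdir: the directory in which purge_relay_logs creates hard links to the relay logs. It must be on the same filesystem as the relay logs, because hard links cannot span filesystems.
--disable_relay_log_purge: by default, if relay_log_purge=1, the script purges nothing and simply exits. With this option, when relay_log_purge=1 the script sets relay_log_purge to 0, purges the relay logs, and leaves the parameter OFF at the end.
A few details to watch here:
purge_relay_logs looks for the socket file at /var/lib/mysql/mysql.sock, while this instance uses /tmp/mysql3306.sock, so create a symlink (create /var/lib/mysql first if it does not exist):
ln -s /tmp/mysql3306.sock /var/lib/mysql/mysql.sock
purge_relay_logs connects with --user=root --host=localhost; if that account does not have the required privileges, grant them first.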
Once that is in place, test it first:
purge_relay_logs --user=root --host=localhost --port=3306 --password=123456 --disable_relay_log_purge --workdir=/home/mysql3306/logs1
If the output contains this line, the test passed: 2018-07-04 05:22:21: All relay log purging operations succeeded.
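Once the purge runs from cron, nobody sees that message. Here is a minimal sketch for checking it, assuming the success line is the last line the most recent run appended to the log. purge_succeeded is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the last line of the purge log is the
# success message (assumes each run ends with that line when it succeeds).
purge_succeeded() {
  local log=$1
  [ -r "$log" ] || return 2   # log missing or unreadable
  tail -n 1 "$log" | grep -q 'All relay log purging operations succeeded\.'
}
```

For the script in 3.6, which appends to purge_relay_logs.log under its log_dir: purge_succeeded /home/mysql3306/logs/purge_relay_logs.log || echo "relay log purge failed".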
Add the cron job:
crontab -e
0 0 */3 * * /bin/bash /etc/mha/purge_relay_log.sh
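In this entry, */3 in the day-of-month field steps through the range 1-31; it does not mean "every 72 hours". The small hypothetical helper below lists the days a step value fires on. It shows that with */3 the job runs on the 31st and then again on the 1st of the next month.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a day-of-month step "*/N" into the days cron fires on.
cron_dom_days() {
  local step=$1 d out=""
  [ "$step" -gt 0 ] || return 1
  for ((d = 1; d <= 31; d += step)); do
    out+="${out:+ }$d"   # space-separate after the first day
  done
  echo "$out"
}
```

cron_dom_days 3 prints 1 4 7 10 13 16 19 22 25 28 31.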
3.7 Create the automatic failover script
vim /etc/mha/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.111.111/24';
my $key = '0';
# The NIC name must match the hosts; this tutorial's hosts use ens33 (see 3.12)
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip "
. "--orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
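The script above moves the VIP with ifconfig and does not announce the move. On hosts without net-tools, the same step can be done with iproute2, and a gratuitous ARP (arping from iputils) helps clients and switches update their ARP caches. The sketch below only builds those commands as text. It is an alternative, not what the Perl script runs, and the NIC ens33 and VIP 192.168.111.111/24 are assumptions matching this tutorial. vip_commands is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print iproute2/arping equivalents of the script's
# ifconfig commands. Defaults (ens33, 192.168.111.111/24) are assumptions.
vip_commands() {
  local action=$1 nic=${2:-ens33} vip=${3:-192.168.111.111/24}
  case $action in
    start)
      echo "ip addr add $vip dev $nic label $nic:0"
      # Gratuitous ARP so neighbours learn the VIP's new MAC address.
      echo "arping -q -U -c 3 -I $nic ${vip%/*}"
      ;;
    stop)
      echo "ip addr del $vip dev $nic"
      ;;
    *)
      echo "usage: vip_commands start|stop [nic] [vip]" >&2
      return 1
      ;;
  esac
}
```

For example, ssh root@192.168.111.129 "$(vip_commands start)" would run both start commands on the new master.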
3.8 Create the manual switchover script
vim /etc/mha/master_ip_online_change
#!/usr/bin/env perl
use strict;
use warnings FATAL =>'all';
use Getopt::Long;
my $vip = '192.168.111.111/24'; # Virtual IP
my $key = "0";
# The NIC name must match the hosts; this tutorial's hosts use ens33 (see 3.12)
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
my $exit_code = 0;
my (
$command, $orig_master_is_new_slave, $orig_master_host,
$orig_master_ip, $orig_master_port, $orig_master_user,
$orig_master_password, $orig_master_ssh_user, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password, $new_master_ssh_user,
);
GetOptions(
'command=s' => \$command,
'orig_master_is_new_slave' => \$orig_master_is_new_slave,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_user=s' => \$orig_master_user,
'orig_master_password=s' => \$orig_master_password,
'orig_master_ssh_user=s' => \$orig_master_ssh_user,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
'new_master_ssh_user=s' => \$new_master_ssh_user,
);
exit &main();
sub main {
#print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
print "\n\n\n***************************************************************\n";
print "Disabling the VIP - $vip on old master: $orig_master_host\n";
print "***************************************************************\n\n\n\n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "\n\n\n***************************************************************\n";
print "Enabling the VIP - $vip on new master: $new_master_host \n";
print "***************************************************************\n\n\n\n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print "Usage: master_ip_online_change --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip "
. "--orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
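MHA runs master_ip_online_change_script when a switchover is started with masterha_master_switch while the master is still alive. The sketch below prints such a command for this cluster. --conf, --master_state=alive, --new_master_host, --new_master_port and --orig_master_is_new_slave are masterha_master_switch options, but confirm them against your MHA version. The manager must not be running when you do this. online_switch_cmd is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) an online switchover command.
# Host defaults are this tutorial's; verify the flags with your MHA version.
online_switch_cmd() {
  local new_master=${1:-192.168.111.129}
  echo masterha_master_switch \
    --conf=/etc/mha/app1.cnf \
    --master_state=alive \
    "--new_master_host=$new_master" \
    --new_master_port=3306 \
    --orig_master_is_new_slave
}
```

With --orig_master_is_new_slave, the old master is turned into a slave of the new one instead of being left out of replication.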
3.9 Write the alert script for failover events
vim /etc/mha/send_report
#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;
#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.163.com';
my $mail_from='xxxxx@163.com';
my $mail_user='xxxxx@163.com';
my $mail_pass='xxxxx';
my $mail_to=['xxxxx@139.com'];
GetOptions(
'orig_master_host=s' => \$dead_master_host,
'new_master_host=s' => \$new_master_host,
'new_slave_hosts=s' => \$new_slave_hosts,
'subject=s' => \$subject,
'body=s' => \$body,
);
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
sub mailToContacts {
my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
open my $DEBUG, "> /tmp/monitormail.log"
or die "Can't open the debug file:$!\n";
my $sender = new Mail::Sender {
ctype => 'text/plain; charset=utf-8',
encoding => 'utf-8',
smtp => $smtp,
from => $mail_from,
auth => 'LOGIN',
TLS_allowed => '0',
authid => $user,
authpwd => $passwd,
to => $mail_to,
subject => $subject,
debug => $DEBUG
};
$sender->MailMsg(
{ msg => $msg,
debug => $DEBUG
}
) or print $Mail::Sender::Error;
return 1;
}
# Do whatever you want here
exit 0;
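Settings in the script that you need to change:
my $smtp='smtp.163.com'; ## address of your SMTP provider, usually smtp.qq.com, smtp.163.com, smtp.139.com, etc.
my $mail_from='xxxxx@163.com'; ## sender mailbox
my $mail_user='xxxxx@163.com'; ## same as above
my $mail_pass='xxxxx'; ## mailbox authorization code; when you enable POP3/SMTP for the mailbox you are usually asked to set one
my $mail_to=['xxxxx@139.com']; ## recipient mailbox; 139.com is China Mobile's mail service, which can forward mail as SMS, so alerts arrive directly as text messages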
Make the scripts executable:
chmod +x /etc/mha/master_ip_failover
chmod +x /etc/mha/master_ip_online_change
chmod +x /etc/mha/send_report
3.10 Check SSH connectivity from the manager
/etc/mha/manager/bin/masterha_check_ssh --conf=/etc/mha/app1.cnf
3.11 Check the replication status from the manager
On all nodes, create symlinks so that MHA can find mysqlbinlog and mysql:
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
/etc/mha/manager/bin/masterha_check_repl --conf=/etc/mha/app1.cnf
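Both checks end with a one-line verdict. Assuming MHA prints its usual success lines, "All SSH connection tests passed successfully." and "MySQL Replication Health is OK.", the hypothetical helper below turns that verdict into an exit status for scripts. Compare it with your own output before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the output on stdin contains one of the
# success lines assumed above for masterha_check_ssh / masterha_check_repl.
mha_check_ok() {
  grep -Eq 'All SSH connection tests passed successfully\.|MySQL Replication Health is OK\.'
}
```

Usage: /etc/mha/manager/bin/masterha_check_repl --conf=/etc/mha/app1.cnf 2>&1 | mha_check_ok && echo "ready to start the manager". The 2>&1 makes sure messages on stderr are captured too.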
3.12 Add the VIP on the master
ifconfig ens33:0 192.168.111.111/24
3.13 Start MHA on the manager node
nohup /etc/mha/manager/bin/masterha_manager --conf=/etc/mha/app1.cnf --ignore_last_failover >/tmp/mha_manager.log < /dev/null 2>&1 &
3.14 Check the MHA status
/etc/mha/manager/bin/masterha_check_status --conf=/etc/mha/app1.cnf
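When monitoring is running, the status line names the current master. A minimal sketch, assuming the line has the form "app1 (pid:2480) is running(0:PING_OK), master:192.168.111.128" (what MHA typically prints; check against your own output). It extracts the master address and prints nothing when the manager is stopped. mha_current_master is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the master IP out of masterha_check_status output
# (assumed format "... is running(0:PING_OK), master:IP").
mha_current_master() {
  sed -n 's/.*master:\([0-9.]*\).*/\1/p'
}
```

Usage: /etc/mha/manager/bin/masterha_check_status --conf=/etc/mha/app1.cnf | mha_current_master. This is handy after the tests below to confirm which host MHA now treats as master.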
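3.15 Testing
Experiment 1: automatic failover
1. On slave1, stop the I/O thread to simulate replication lag:
stop slave io_thread;
2. Load a table into the master (the larger the better; 100,000+ rows recommended).
Meanwhile, slave2 keeps replicating the data.
3. Start the I/O thread on slave1 again:
start slave io_thread;
4. Stop MySQL on the master.
In this lab we use pkill mysql (never do this in production).
5. Check the manager log; it shows that the master has changed:
tail -300f /etc/mha/app1/manager.log
6. On the new master, the data that was lagging behind has also been applied.
7. Check whether the VIP has floated over to slave1.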
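Step 7 can be scripted as well. The hypothetical helper below reads ip -o -4 addr show output on stdin and succeeds if the VIP (192.168.111.111 by default, as in this tutorial) is among the host's addresses. It assumes the usual one-line-per-address format, where the fourth field is address/prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the VIP appears in `ip -o -4 addr show`
# output on stdin (4th field is "address/prefix").
vip_is_here() {
  local vip=${1:-192.168.111.111}
  awk -v vip="$vip" '{ split($4, a, "/"); if (a[1] == vip) found = 1 }
    END { exit !found }'
}
```

Usage: ssh root@192.168.111.129 'ip -o -4 addr show' | vip_is_here && echo "VIP is on slave1".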
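Experiment 2: manual failover
Note: the MHA manager must not be running when you perform a manual failover; otherwise the manager will crash.
1. Stop the manager and the master's MySQL:
/etc/mha/manager/bin/masterha_stop --conf=/etc/mha/app1.cnf
In this lab we use pkill mysql (never do this in production).
2. Run the master_ip_online_change script on the manager.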