8. Start the Manager
- In MHA, detecting the master/slave working state and performing the switchover are handled by the Manager node. Once MHA is installed and the checks pass, you can start and stop the Manager as needed.
```
# The Manager is started with the masterha_manager command; once started it should be left running in the background.
[root@manager mha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf \
--remove_dead_master_conf --ignore_last_failover < /dev/null > \
/var/log/masterha/app1/manager.log 2>&1 &
[1] 1743
```
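To take the Manager down again (for maintenance, for example), MHA also ships a masterha_stop command; a minimal example against the same configuration file:

```bash
# Gracefully stop the running Manager for this application
[root@manager ~]# masterha_stop --conf=/etc/masterha/app1.cnf
```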
- Check the Manager's working state
```
# Once the MHA Manager starts monitoring, it prints nothing as long as everything is healthy. Use masterha_check_status to check the Manager's state.
[root@manager app1]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:14175) is running(0:PING_OK), master:192.168.100.211    # "is running" means the Manager is working
```
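Because a healthy Manager stays silent, it also helps to follow its log file (the path comes from the nohup redirection above) to see what it is doing:

```bash
# Follow the Manager log in real time; press Ctrl+C to stop watching
[root@manager ~]# tail -f /var/log/masterha/app1/manager.log
```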
- Configure a VIP in the MHA environment
After MHA performs a failover, the services that connect to MySQL have no way of knowing that the replication environment switched masters, nor which node the new master is. By configuring a VIP, all applications connect to the VIP instead, and when MySQL fails over, the VIP automatically floats to the new master node.

Configure the VIP on the master node:

```
[root@master ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:65:e8:85 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.211/24 brd 192.168.100.255 scope global ens32
       valid_lft forever preferred_lft forever
    inet6 fe80::e029:444:1ec0:7c78/64 scope link
       valid_lft forever preferred_lft forever
    inet6 fe80::7ab:dbe:2aec:32fa/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::34f4:cad:16ae:5b4d/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
[root@master ~]# ifconfig ens32:1 192.168.100.188 netmask 255.255.255.0 up
[root@master ~]# ifconfig ens32:1
ens32:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.100.188  netmask 255.255.255.0  broadcast 192.168.100.255
        ether 00:0c:29:65:e8:85  txqueuelen 1000  (Ethernet)
```

On the manager, modify the MHA configuration file so it supports the VIP:

```
[root@manager ~]# vim /etc/masterha/app1.cnf
[server default]
master_ip_failover_script=/usr/bin/master_ip_failover    # add this line right below [server default]
# save and exit
```
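Once the VIP is up, client applications should connect to 192.168.100.188 rather than to any fixed node address. A minimal connectivity check, assuming a hypothetical application account app_user that you have created with suitable grants:

```bash
# Connect through the VIP instead of a node IP; app_user is a placeholder for your own application account
[root@manager ~]# mysql -h 192.168.100.188 -P 3306 -u app_user -p -e "SELECT @@hostname;"
```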
- Write the VIP switchover script on the manager node
```
[root@manager ~]# vim /usr/bin/master_ip_failover    # the script path configured in app1.cnf
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '192.168.100.188/24';    # remember to change this to your own VIP
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig ens32:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
# save and exit
[root@manager ~]# chmod +x /usr/bin/master_ip_failover    # make the script executable
```
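Before depending on it in a real failover, the script can be exercised by hand with the same arguments MHA passes in; a sketch, assuming root SSH access and the host IPs used in this environment (the start command really runs ifconfig on slave1 over SSH, so only try it in a test window):

```bash
# Status branch: just prints "Checking the Status of the script.. OK"
[root@manager ~]# /usr/bin/master_ip_failover --command=status

# Simulate promoting slave1: brings the VIP up on 192.168.100.212 via SSH
[root@manager ~]# /usr/bin/master_ip_failover --command=start --ssh_user=root \
  --orig_master_host=192.168.100.211 --orig_master_ip=192.168.100.211 --orig_master_port=3306 \
  --new_master_host=192.168.100.212 --new_master_ip=192.168.100.212 --new_master_port=3306
```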
- Run the replication check on the manager again
```
[root@manager ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
......
MySQL Replication Health is OK.    # this must also report OK
```
- Start monitoring
```
[root@manager ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf \
--remove_dead_master_conf --ignore_last_failover < /dev/null > \
/var/log/masterha/app1/manager.log 2>&1 &
[1] 14175
[root@manager ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:14175) is running(0:PING_OK), master:192.168.100.211
```
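Keep in mind that masterha_manager exits on its own once a failover has completed, so it has to be restarted (after repairing the topology) before it will watch the cluster again; a quick way to confirm the background process is still alive:

```bash
# The Manager should show up here while it is monitoring
[root@manager ~]# ps -ef | grep masterha_manager | grep -v grep
```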
9. Verify the failover
(1) Stop MySQL on the master

```
[root@master ~]# systemctl stop mysqld
```

(2) Check the log on the manager

```
[root@manager ~]# tail -n10 /var/log/masterha/app1/manager.log
Mon Aug 2 06:10:00 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.211' (111))
Mon Aug 2 06:10:00 2021 - [warning] Connection failed 4 time(s)..
Mon Aug 2 06:10:00 2021 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Mon Aug 2 06:10:01 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.211' (111))
Mon Aug 2 06:10:01 2021 - [warning] Connection failed 1 time(s)..
Mon Aug 2 06:10:01 2021 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 192.168.100.212 -s 192.168.100.213 --user=root --master_host=192.168.100.211 --master_ip=192.168.100.211 --master_port=3306 --master_user=root --master_password=123123 --ping_type=SELECT
Mon Aug 2 06:10:01 2021 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data/ --output_file=/tmp/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin-master
sh: /usr/local/bin/masterha_secondary_check: No such file or directory
Mon Aug 2 06:10:01 2021 - [error][/usr/share/perl5/vendor_perl/MHA/HealthCheck.pm, ln412] Got unknown error from /usr/local/bin/masterha_secondary_check -s 192.168.100.212 -s 192.168.100.213 --user=root --master_host=192.168.100.211 --master_ip=192.168.100.211 --master_port=3306 --master_user=root --master_password=123123 --ping_type=SELECT. exit.
Mon Aug 2 06:10:01 2021 - [info] HealthCheck: SSH to 192.168.100.211 is reachable.
```

(3) Check on slave1. The failover happens only once, and when the old master comes back online it does not automatically rejoin the replication cluster.

```
[root@slave1 ~]# ifconfig ens32:1    # the VIP has floated over to slave1
ens32:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.100.188  netmask 255.255.255.0  broadcast 192.168.100.255
        ether 00:0c:29:7d:7d:68  txqueuelen 1000  (Ethernet)
```
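To confirm the surviving slave followed the switch, check its replication source; a sketch, assuming slave2 is 192.168.100.213 and slave1 (192.168.100.212) was the promoted candidate:

```bash
# Master_Host should now point at the promoted node, and both replication threads should be running
[root@slave2 ~]# mysql -uroot -p -e "SHOW SLAVE STATUS\G" | grep -E 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
```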
III. Resolving MHA errors
- Confirm the required dependencies are installed
```
yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
```
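A quick way to confirm the Perl modules those packages provide can actually be loaded (each perl -M... -e1 is silent on success and prints an error when the module is missing):

```bash
# Silent output means the module is installed and loadable
perl -MConfig::Tiny -e1
perl -MLog::Dispatch -e1
perl -MParallel::ForkManager -e1
perl -MTime::HiRes -e1
```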
- Error 1
Running `masterha_check_repl --conf=/etc/masterha/app1.cnf` fails with:

```
Testing mysql connection and privileges..sh: mysql: command not found
mysql command failed with rc 127:0! at /usr/bin/apply_diff_relay_logs line 375
```

Fix: `ln -s /usr/local/mysql/bin/mysql /usr/bin`
- Error 2
Running `masterha_check_repl --conf=/etc/masterha/app1.cnf` fails with:

```
Can't exec "mysqlbinlog": No such file or directory at /usr/local/perl5/MHA/BinlogManager.pm line 99.
```

Fix: run `which mysqlbinlog` on each node to locate the binary, for example:

```
[localhost~]$ which mysqlbinlog
/usr/local/mysql/bin/mysqlbinlog
```

then link it into the default PATH: `ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog`
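After creating the symlinks from errors 1 and 2, a small check (assuming /usr/bin is on the non-interactive PATH MHA uses) confirms both binaries now resolve:

```bash
# Both names should resolve to the symlinks created above (or wherever your PATH finds them)
which mysql mysqlbinlog
```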
- Error 3
Running `masterha_check_ssh --conf=/etc/masterha/app1.cnf` fails with:

```
Connection via SSH from root@192.168.17.199 to root@192.168.17.200 ... Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)
[error] [/usr/local/share/perl5/MHA/SSHCheck.pm, ln163]
```

Fix: this is usually a public-key problem. Delete the entries for the affected IPs from /root/.ssh/known_hosts, then regenerate and redistribute the keys.
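A sketch of that cleanup, assuming the failing pair is 192.168.17.199 → 192.168.17.200 as in the error above:

```bash
# Drop the stale known_hosts entry, push the public key again, then re-test the connection
ssh-keygen -R 192.168.17.200
ssh-copy-id root@192.168.17.200
ssh root@192.168.17.200 hostname
```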
- Error 4
Running `masterha_check_repl --conf=/etc/masterha/app1.cnf` fails with:

```
Sun Nov 20 20:10:59 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000001
Sun Nov 20 20:10:59 2016 - [info] Connecting to root@172.18.3.180(172.18.3.180)..
Failed to save binary log: Binlog not found from /data/mysqllog/3306! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.
 at /usr/bin/save_binary_logs line 117.
    eval {...} called at /usr/bin/save_binary_logs line 66
    main::main() called at /usr/bin/save_binary_logs line 62
Sun Nov 20 20:10:59 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln154] Master setting check failed!
Sun Nov 20 20:10:59 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln367] Master configuration failed.
Sun Nov 20 20:10:59 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48.
Sun Nov 20 20:10:59 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Sun Nov 20 20:10:59 2016 - [info] Got exit code 1 (Not master dead).
```

Fix: the binlog directory configured in /etc/masterha/app1.cnf (master_binlog_dir) must point to the directory where the master actually writes its binary logs.
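To find the right directory, you can ask MySQL on the master itself (log_bin_basename exists on MySQL 5.6 and later); a sketch:

```bash
# The directory part of log_bin_basename is what master_binlog_dir should be set to
[root@master ~]# mysql -uroot -p -e "SHOW VARIABLES LIKE 'log_bin_basename';"

# Then, in /etc/masterha/app1.cnf (the value below is just this environment's example path):
# master_binlog_dir=/usr/local/mysql/data
```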
- Error 5
Running `masterha_master_switch --conf=/etc/masterha/app1.cnf` for an online master switch fails with:

```
Mon Nov 21 11:11:40 2016 - [info] MHA::MasterRotate version 0.55.
Mon Nov 21 11:11:40 2016 - [info] Starting online master switch..
Mon Nov 21 11:11:40 2016 - [info]
Mon Nov 21 11:11:40 2016 - [info] * Phase 1: Configuration Check Phase..
Mon Nov 21 11:11:40 2016 - [info]
Mon Nov 21 11:11:40 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Nov 21 11:11:40 2016 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Mon Nov 21 11:11:40 2016 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Mon Nov 21 11:11:40 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln604] There are 2 non-slave servers! MHA manages at most one non-slave server. Check configurations.
Mon Nov 21 11:11:40 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/share/perl5/vendor_perl/MHA/MasterRotate.pm line 85.
```

Fix: manually reconfigure the repaired old master as a slave of the new master, as in the sketch below.
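A sketch of that manual step, assuming the promoted master is 192.168.100.212, GTID replication is enabled (so MASTER_AUTO_POSITION can be used; otherwise take the binlog file and position from the new master), and a hypothetical replication account repl:

```bash
# On the repaired old master: point it at the promoted node and start replicating again
[root@master ~]# mysql -uroot -p <<'SQL'
CHANGE MASTER TO
  MASTER_HOST='192.168.100.212',
  MASTER_PORT=3306,
  MASTER_USER='repl',
  MASTER_PASSWORD='replpass',
  MASTER_AUTO_POSITION=1;
START SLAVE;
SQL
```

Because the Manager was started with --remove_dead_master_conf, the old master's [serverN] block also has to be added back to /etc/masterha/app1.cnf before restarting masterha_manager.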