Two errors when switching masters with MHA (masterha_master_switch line 53)

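While testing manual failover and online master switching with MHA, I ran into two odd problems. Whenever the command was invoked with IP addresses, the test failed, with either "Detected dead master xxx does not match with specified dead master" or "xxx is not alive". Both errors and their fix are described below.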

 

1. MHA configuration file
[root@vdbsrv4 ~]# more /etc/masterha/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log

user=mha
password=xxx
ssh_user=root
repl_user=repl  
repl_password=repl  
ping_interval=1
shutdown_script=""
master_ip_online_change_script=""
report_script=""
#master_ip_failover_script=/usr/bin/master_ip_failover
master_ip_failover_script=/tmp/master_ip_failover
 
[server1]
hostname=vdbsrv1
master_binlog_dir=/data/mysqldata

[server2]
hostname=vdbsrv2
master_binlog_dir=/data/mysqldata

[server3]
hostname=vdbsrv3
master_binlog_dir=/data/mysqldata/
#candidate_master=1

 

2. Error during manual failover
[root@vdbsrv4 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.1.6 \
> --dead_master_port=3306 --new_master_host=192.168.1.8 --new_master_port=3306 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.1.6.
Wed Apr 21 09:08:30 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Apr 21 09:08:30 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Apr 21 09:08:30 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Apr 21 09:08:30 2015 - [info] MHA::MasterFailover version 0.56.
Wed Apr 21 09:08:30 2015 - [info] Starting master failover.
Wed Apr 21 09:08:30 2015 - [info]
Wed Apr 21 09:08:30 2015 - [info] * Phase 1: Configuration Check Phase..
Wed Apr 21 09:08:30 2015 - [info]
Wed Apr 21 09:08:31 2015 - [info] GTID failover mode = 0
Wed Apr 21 09:08:31 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterFailover.pm, ln2083] Detected dead master vdbsrv1(192.168.1.6:3306)
   does not match with specified dead master 192.168.1.6(192.168.1.6:3306)!
Wed Apr 21 09:08:31 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterFailover.pm, ln2151]
   Got ERROR:  at /usr/bin/masterha_master_switch line 53
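
In the message above, the IP and port are identical on both sides (192.168.1.6:3306); only the host label differs (vdbsrv1 from the config versus 192.168.1.6 from the command line). That suggests the check compares the host name as a literal string as well. The sketch below is a hypothetical model of such a check, written only to illustrate the mismatch. It is not MHA's actual code, and the function name same_master is made up for this example.

```shell
#!/bin/sh
# Hypothetical model of the dead-master check (NOT taken from MHA's source).
# Arguments: detected host, ip, port, then specified host, ip, port.
same_master() {
    # All three fields must match; the host is compared as a plain string,
    # so "vdbsrv1" and "192.168.1.6" differ even if they resolve to each other.
    [ "$1" = "$4" ] && [ "$2" = "$5" ] && [ "$3" = "$6" ]
}

# Values taken from the log above: host differs, ip and port match.
if same_master vdbsrv1 192.168.1.6 3306 192.168.1.6 192.168.1.6 3306; then
    echo "match"
else
    echo "mismatch"
fi
```

Under this assumption, passing --dead_master_host=vdbsrv1 (the value written in the config) would make all three fields agree.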

 

3. Error during online switch
[root@vdbsrv4 ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.1.8 \
> --orig_master_is_new_slave --running_updates_limit=10000
Tue Apr 21 11:50:14 2015 - [info] MHA::MasterRotate version 0.56.
Tue Apr 21 11:50:14 2015 - [info] Starting online master switch..
Tue Apr 21 11:50:14 2015 - [info]
Tue Apr 21 11:50:14 2015 - [info] * Phase 1: Configuration Check Phase..
Tue Apr 21 11:50:14 2015 - [info]
Tue Apr 21 11:50:14 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr 21 11:50:14 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Apr 21 11:50:14 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Apr 21 11:50:14 2015 - [info] GTID failover mode = 0
Tue Apr 21 11:50:14 2015 - [info] Current Alive Master: vdbsrv1(192.168.1.6:3306)
Tue Apr 21 11:50:14 2015 - [info] Alive Slaves:
Tue Apr 21 11:50:14 2015 - [info]   vdbsrv2(192.168.1.7:3306)  Version=5.6.22-log (oldest major version between slaves) log-bin:enabled
Tue Apr 21 11:50:14 2015 - [info]     Replicating from 192.168.1.6(192.168.1.6:3306)
Tue Apr 21 11:50:14 2015 - [info]   vdbsrv3(192.168.1.8:3306)  Version=5.6.22-log (oldest major version between slaves) log-bin:enabled
Tue Apr 21 11:50:14 2015 - [info]     Replicating from 192.168.1.6(192.168.1.6:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on vdbsrv1(192.168.1.6:3306)? (YES/no): yes
Tue Apr 21 11:50:41 2015 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Apr 21 11:50:41 2015 - [info]  ok.
Tue Apr 21 11:50:41 2015 - [info] Checking MHA is not monitoring or doing failover..
Tue Apr 21 11:50:41 2015 - [info] Checking replication health on vdbsrv2..
Tue Apr 21 11:50:41 2015 - [info]  ok.
Tue Apr 21 11:50:41 2015 - [info] Checking replication health on vdbsrv3..
Tue Apr 21 11:50:41 2015 - [info]  ok.
Tue Apr 21 11:50:41 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterRotate.pm, ln228] 192.168.1.8 is not alive!
Tue Apr 21 11:50:41 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterRotate.pm, ln613] Failed to get new master!
Tue Apr 21 11:50:41 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterRotate.pm, ln652] Got ERROR:  at /usr/bin/masterha_master_switch line 53

 

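4. Solution
      Replacing the IP addresses with host names resolved both problems; the successful runs are not shown here.

       According to the official documentation, the parameter is --dead_master_host=(hostname), so it expects the host name, not an IP address.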

       If these parameters are not set, --dead_master_ip will be the result of gethostbyname(dead_master_host), and --dead_master_port will be 3306.

       Addendum (2015-05-22): if the configuration file itself uses hostname=<IP address>, then switching with the IP address also works.
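
Both failures come down to the host argument not being written the same way as the hostname= entry in the config, so a small pre-flight check can catch the problem before masterha_master_switch runs. The sketch below is one way to do that, assuming the config uses plain hostname=value lines as in section 1. The helper names mha_hostnames and mha_host_in_conf are made up for this example and are not part of MHA.

```shell
#!/bin/sh
# Print every hostname= value found in an MHA config file.
# $1: path to the config, e.g. /etc/masterha/app1.cnf
mha_hostnames() {
    # Commented-out lines (starting with #) do not match the pattern.
    sed -n 's/^[[:space:]]*hostname[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$1"
}

# Succeed only if $2 is spelled exactly like one of the hostname= entries.
# $1: path to the config, $2: value you plan to pass on the command line
mha_host_in_conf() {
    mha_hostnames "$1" | grep -Fqx -- "$2"
}

# Usage example (commented out; flags are the ones used in section 3):
# conf=/etc/masterha/app1.cnf
# if mha_host_in_conf "$conf" vdbsrv3; then
#     masterha_master_switch --conf="$conf" --master_state=alive \
#         --new_master_host=vdbsrv3 --orig_master_is_new_slave \
#         --running_updates_limit=10000
# else
#     echo "vdbsrv3 is not written that way in $conf" >&2
# fi
```

With the config from section 1, the check passes for vdbsrv3 and fails for 192.168.1.8. If the config used hostname=192.168.1.8 instead, the result would be reversed, which matches the addendum above.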

