Verifying Database Access Through the VIPs
Once the MMM agent service is running on all four MySQL database instances and the MMM monitor service has been started on the monitor node, we try each VIP from the monitor node to see whether it connects to a MySQL database normally. The results are as follows:
```
root@monitor:/etc/mysql-mmm# mmm_control show
  master1(172.20.0.11) master/ONLINE. Roles: reader(172.20.0.111), writer(172.20.0.100)
  master2(172.20.0.21) master/ONLINE. Roles: reader(172.20.0.122)
  slave1(172.20.0.12) slave/ONLINE. Roles: reader(172.20.0.211)
  slave2(172.20.0.22) slave/ONLINE. Roles: reader(172.20.0.222)

root@monitor:/etc/mysql-mmm# mysql -uroot -proot -h172.20.0.100 -e 'select @@hostname'
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+
| @@hostname    |
+---------------+
| master1.mysql |
+---------------+
root@monitor:/etc/mysql-mmm# mysql -uroot -proot -h172.20.0.111 -e 'select @@hostname'
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+
| @@hostname    |
+---------------+
| master1.mysql |
+---------------+
root@monitor:/etc/mysql-mmm# mysql -uroot -proot -h172.20.0.122 -e 'select @@hostname'
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+
| @@hostname    |
+---------------+
| master2.mysql |
+---------------+
root@monitor:/etc/mysql-mmm# mysql -uroot -proot -h172.20.0.211 -e 'select @@hostname'
mysql: [Warning] Using a password on the command line interface can be insecure.
+--------------+
| @@hostname   |
+--------------+
| slave1.mysql |
+--------------+
root@monitor:/etc/mysql-mmm# mysql -uroot -proot -h172.20.0.222 -e 'select @@hostname'
mysql: [Warning] Using a password on the command line interface can be insecure.
+--------------+
| @@hostname   |
+--------------+
| slave2.mysql |
+--------------+
root@monitor:/etc/mysql-mmm#
```
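The same check can be scripted so it is easy to rerun after each failover test. A small sketch using the credentials and VIPs from above:

```bash
#!/bin/bash
# Ask every MMM VIP which backend host answers the query.
for vip in 172.20.0.100 172.20.0.111 172.20.0.122 172.20.0.211 172.20.0.222; do
    host=$(mysql -uroot -proot -h"$vip" -N -e 'select @@hostname' 2>/dev/null)
    echo "$vip -> ${host:-unreachable}"
done
```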
Verifying MMM High Availability
We now stop the MySQL service on the master1 node to verify that the replication source is switched from master1 to master2, and that the monitor node can still reach MySQL through the writer VIP.
Stop the MySQL service by running the following command on the master1 node:
```
root@master1:~/Net-ARP-1.0.11# /etc/init.d/mysql stop
............[info] MySQL Community Server 5.7.31 is stopped.
root@master1:~/Net-ARP-1.0.11#
```
Next, on the monitor node, we check the status that MMM reports for each node: master1 has gone offline, and the writer VIP has already been migrated from master1 to master2, as shown below:
```
root@monitor:/etc/mysql-mmm# mmm_control show
# Warning: agent on host master1 is not reachable
  master1(172.20.0.11) master/HARD_OFFLINE. Roles:
  master2(172.20.0.21) master/ONLINE. Roles: reader(172.20.0.122), writer(172.20.0.100)
  slave1(172.20.0.12) slave/ONLINE. Roles: reader(172.20.0.111), reader(172.20.0.211)
  slave2(172.20.0.22) slave/ONLINE. Roles: reader(172.20.0.222)

root@monitor:/etc/mysql-mmm# mmm_control checks all
slave2   ping         [last change: 2021/02/23 16:19:44]  OK
slave2   mysql        [last change: 2021/02/23 16:19:44]  OK
slave2   rep_threads  [last change: 2021/02/23 16:19:44]  OK
slave2   rep_backlog  [last change: 2021/02/23 16:19:44]  OK: Backlog is null
master1  ping         [last change: 2021/02/23 18:44:57]  ERROR: Could not ping 172.20.0.11
master1  mysql        [last change: 2021/02/23 18:44:40]  ERROR: Connect error (host = 172.20.0.11:3306, user = mmm_monitor)! Can't connect to MySQL server on '172.20.0.11' (115)
master1  rep_threads  [last change: 2021/02/23 16:19:44]  OK
master1  rep_backlog  [last change: 2021/02/23 16:19:44]  OK: Backlog is null
slave1   ping         [last change: 2021/02/23 16:19:44]  OK
slave1   mysql        [last change: 2021/02/23 16:19:44]  OK
slave1   rep_threads  [last change: 2021/02/23 16:19:44]  OK
slave1   rep_backlog  [last change: 2021/02/23 16:19:44]  OK: Backlog is null
master2  ping         [last change: 2021/02/23 16:19:44]  OK
master2  mysql        [last change: 2021/02/23 16:19:44]  OK
master2  rep_threads  [last change: 2021/02/23 16:19:44]  OK
master2  rep_backlog  [last change: 2021/02/23 16:19:44]  OK: Backlog is null
root@monitor:/etc/mysql-mmm#
```
At this point the IP addresses on master2 look like this; the writer VIP 172.20.0.100 has been moved to the master2 host.
```
root@master2:~/Net-ARP-1.0.11# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN group default qlen 1000
    link/tunnel6 :: brd ::
12: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:14:00:15 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.20.0.21/24 brd 172.20.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 172.20.0.122/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 172.20.0.100/32 scope global eth0
       valid_lft forever preferred_lft forever
root@master2:~/Net-ARP-1.0.11#
```
On the monitor node, the MMM monitor log shows the following:
```
root@monitor:/etc/mysql-mmm# tail -20 /var/log/mysql-mmm/mmm_mond.log
2021/02/23 16:19:49  INFO Check 'mysql' on 'slave2' is ok!
2021/02/23 16:19:49  INFO Check 'mysql' on 'master1' is ok!
2021/02/23 16:19:49  INFO Check 'mysql' on 'slave1' is ok!
2021/02/23 16:19:49  INFO Check 'mysql' on 'master2' is ok!
2021/02/23 18:44:30  WARN Check 'rep_threads' on 'master1' is in unknown state! Message: UNKNOWN: Connect error (host = 172.20.0.11:3306, user = mmm_monitor)! Can't connect to MySQL server on '172.20.0.11' (115)
2021/02/23 18:44:30  WARN Check 'rep_backlog' on 'master1' is in unknown state! Message: UNKNOWN: Connect error (host = 172.20.0.11:3306, user = mmm_monitor)! Can't connect to MySQL server on '172.20.0.11' (115)
2021/02/23 18:44:40 ERROR Check 'mysql' on 'master1' has failed for 10 seconds! Message: ERROR: Connect error (host = 172.20.0.11:3306, user = mmm_monitor)! Can't connect to MySQL server on '172.20.0.11' (115)
2021/02/23 18:44:41 FATAL State of host 'master1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2021/02/23 18:44:41  INFO Removing all roles from host 'master1':
2021/02/23 18:44:41  INFO Removed role 'reader(172.20.0.111)' from host 'master1'
2021/02/23 18:44:41  INFO Removed role 'writer(172.20.0.100)' from host 'master1'
2021/02/23 18:44:41 FATAL Can't reach agent on host 'master1'
2021/02/23 18:44:41 ERROR Can't send offline status notification to 'master1' - killing it!
2021/02/23 18:44:41 FATAL Could not kill host 'master1' - there may be some duplicate ips now! (There's no binary configured for killing hosts.)
2021/02/23 18:44:41  INFO Orphaned role 'writer(172.20.0.100)' has been assigned to 'master2'
2021/02/23 18:44:41  INFO Orphaned role 'reader(172.20.0.111)' has been assigned to 'slave1'
2021/02/23 18:44:57 ERROR Check 'ping' on 'master1' has failed for 11 seconds! Message: ERROR: Could not ping 172.20.0.11
root@monitor:/etc/mysql-mmm#
```
On the monitor node we connect to the writer VIP 172.20.0.100 again and find that it now lands on master2, as shown below:
```
root@monitor:/etc/mysql-mmm# mysql -uroot -proot -h172.20.0.100 -e 'select @@hostname'
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+
| @@hostname    |
+---------------+
| master2.mysql |
+---------------+
root@monitor:/etc/mysql-mmm#
```
Next, let's check whether the replication configuration on slave1 and slave2 has automatically switched over to master2. Log in to slave1 and slave2 and run the following SQL statement on each:
```sql
select * from mysql.slave_master_info\G
```
The output of this query on both slaves shows that their replication source has switched to master2, and rows inserted on the new master node replicate to both slave nodes as expected.
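Since the original screenshots of those checks are not reproduced here, an equivalent command-line spot check might look like the sketch below; the mmm_test database and table are made-up names used purely for illustration:

```bash
# On slave1 and slave2: the replication source should now be master2 (172.20.0.21).
mysql -uroot -proot -e "show slave status\G" | grep -E 'Master_Host|Slave_IO_Running|Slave_SQL_Running'

# On the monitor node: write a row through the writer VIP (now served by master2) ...
mysql -uroot -proot -h172.20.0.100 -e "create database if not exists mmm_test;
  create table if not exists mmm_test.t1 (id int primary key, note varchar(32));
  insert into mmm_test.t1 values (1, 'after failover');"

# ... and read it back through the reader VIPs held by slave1 and slave2.
mysql -uroot -proot -h172.20.0.211 -e "select * from mmm_test.t1"
mysql -uroot -proot -h172.20.0.222 -e "select * from mmm_test.t1"
```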
Next we restart the master1 node and bring its MySQL service back:
```bash
# Start the master1 container
docker start mysql-ha-mmm-master1

# Enter the master1 container
docker exec -it mysql-ha-mmm-master1 /bin/bash

# Start the MMM agent service on master1
/etc/init.d/mysql-mmm-agent start
```
On the monitor node, check the status of each node:
```
# After master1 is back up, checking the node states on the monitor node
# no longer shows the "Warning: agent on host master1 is not reachable" warning.
root@monitor:/etc/mysql-mmm# mmm_control show
  master1(172.20.0.11) master/AWAITING_RECOVERY. Roles:
  master2(172.20.0.21) master/ONLINE. Roles: reader(172.20.0.122), writer(172.20.0.100)
  slave1(172.20.0.12) slave/ONLINE. Roles: reader(172.20.0.111), reader(172.20.0.211)
  slave2(172.20.0.22) slave/ONLINE. Roles: reader(172.20.0.222)

# Bring the master1 node back online
root@monitor:/etc/mysql-mmm# mmm_control set_online master1
OK: State of 'master1' changed to ONLINE. Now you can wait some time and check its new roles!

# Check the node states again: master1 is ONLINE, but it is now the standby master
# and has been assigned a reader VIP.
root@monitor:/etc/mysql-mmm# mmm_control show
  master1(172.20.0.11) master/ONLINE. Roles: reader(172.20.0.211)
  master2(172.20.0.21) master/ONLINE. Roles: reader(172.20.0.122), writer(172.20.0.100)
  slave1(172.20.0.12) slave/ONLINE. Roles: reader(172.20.0.111)
  slave2(172.20.0.22) slave/ONLINE. Roles: reader(172.20.0.222)
root@monitor:/etc/mysql-mmm#
```
Summary of the MMM high-availability verification results:
- After the master1 node goes down, master2 takes over all of master1's duties.
- The writer VIP automatically moves from master1 to master2.
- The two slaves, slave1 and slave2, automatically switch their replication source from master1 to master2.
- If master1 was also serving reads in addition to writes, i.e. it had been assigned a reader VIP, that reader VIP is not lost when master1 goes down; it is migrated to one of the remaining nodes.
- After master1 is restarted and brought back online from the monitor node with `mmm_control set_online master1`, it does not take any roles back from master2; instead it serves as a standby master in the cluster. Only when master2 goes down will master1 become the writable master again.
- As master1 and master2 fail and recover, the writer VIP switches back and forth between the two nodes; the writer role can also be moved back manually, as sketched below.
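If you would rather hand the writer role back to master1 after it recovers, instead of waiting for the next failover, mmm_control can move the role explicitly. A minimal sketch, run on the monitor node (assuming the MMM 2.x `move_role` subcommand is available in the installed version):

```bash
# Move the writer role (and with it the writer VIP 172.20.0.100) back to master1;
# MMM reassigns the VIP and re-points the slaves accordingly.
mmm_control move_role writer master1

# Confirm the new role assignment.
mmm_control show
```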
For this five-server high-availability cluster, if we are worried that the single MMM monitor node is itself a single point of failure, we can pick one more server, deploy a second monitor service on it, and use keepalived to make sure that at least one of the two monitor services is always reachable. That makes the setup more complete.
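What that could look like is sketched below. The second monitor host, the interface name eth0, the Debian-style package commands, and the monitor VIP 172.20.0.200 are all assumptions for illustration, not part of the environment built above; both monitor hosts would also need identical MMM monitor configuration so that whichever one holds the VIP makes the same decisions.

```bash
# On both monitor hosts: install keepalived and give the pair a shared VIP.
# The VIP stays on whichever host still has a healthy mmm_mond process.
apt-get install -y keepalived

cat > /etc/keepalived/keepalived.conf <<'EOF'
vrrp_script chk_mmm_monitor {
    script "/usr/bin/pgrep -f mmm_mond"   # node is healthy only if mmm_mond is running
    interval 2
    fall 2
}

vrrp_instance VI_MMM_MONITOR {
    state BACKUP            # let priority decide which node is active
    interface eth0          # assumed interface name
    virtual_router_id 51
    priority 100            # use a lower value (e.g. 90) on the second monitor host
    advert_int 1
    virtual_ipaddress {
        172.20.0.200/24     # assumed VIP for reaching the active monitor
    }
    track_script {
        chk_mmm_monitor
    }
}
EOF

service keepalived start
```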
Summary
That is the whole process of building the MMM high-availability architecture for MySQL. Along the way we built a MySQL replication cluster with two masters and two slaves, compiled and installed the MMM components on every node of that cluster and started the MMM agent service on each of them, and picked one more server to run the MMM monitor service.
Hopefully this is helpful; later posts will cover other ways of building high-availability architectures.