Stop keepalived on node 1:
[root@LB01 ~]# systemctl stop keepalived
Node 2 takes over the VIP:
[root@LB02 ~]# ip add | grep 10.0.0.3
    inet 10.0.0.3/32 scope global eth0
Check the MAC address again: the physical address now matches the MAC address of 10.0.0.3 on LB02.
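The MAC comparison can also be scripted from a Linux client. The following is a minimal sketch, assuming the client's neighbor table is read with ip neigh show; the helper name mac_for_ip is hypothetical and not part of the original setup.

```shell
#!/bin/sh
# mac_for_ip: read `ip neigh show` output on stdin and print the MAC
# address (the field after "lladdr") recorded for the given IP.
# The function name is hypothetical, chosen for this illustration.
mac_for_ip() {
    awk -v ip="$1" '$1 == ip {
        for (i = 2; i < NF; i++)
            if ($i == "lladdr") { print $(i + 1); exit }
    }'
}

# Possible use on the client, after sending some traffic to the VIP:
#   ip neigh show | mac_for_ip 10.0.0.3
# The result can then be compared with eth0's MAC on LB02:
#   cat /sys/class/net/eth0/address
```

Entries without a resolved MAC (for example in the FAILED state) carry no lladdr field, so the helper prints nothing for them.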
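Keepalived Split-Brain Failure
Split-brain occurs when, for some reason, the two keepalived servers cannot detect each other's heartbeat within the configured interval even though both servers are still working normally. Each node then assumes the other is down and claims the VIP.
I. Common causes
1. Network faults, such as a loose network cable on a server
2. A hardware failure that damages a server and crashes it
3. firewalld is running on both the master and the backup, which blocks the VRRP advertisements between them unless VRRP is explicitly allowed
II. Testing the split-brain failure
1. Start the firewall on both the master and the backup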
[root@LB01 ~]# systemctl start firewalld
[root@LB02 ~]# systemctl start firewalld
2. Revert the configuration files changed earlier (back to the preemptive setup)
[root@LB01 ~]# vim /etc/keepalived/keepalived.conf
global_defs {
    router_id LB01
}

vrrp_instance VI_1 {
    state MASTER
    #nopreempt
    interface eth0
    virtual_router_id 50
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.3
    }
}
[root@LB01 ~]# systemctl restart keepalived

[root@LB02 ~]# cat /etc/keepalived/keepalived.conf
global_defs {
    router_id LB02
}

vrrp_instance VI_1 {
    state BACKUP
    #nopreempt
    interface eth0
    virtual_router_id 50
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.3
    }
}
[root@LB02 ~]# systemctl restart keepalived
3. Inspect the traffic with a packet capture
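One way to carry out this step, sketched under assumptions: capture VRRP advertisements with tcpdump and list the distinct senders. In normal operation only the master advertises, so seeing two senders for the same virtual router suggests that both nodes believe they are master. The helper name vrrp_senders is hypothetical, and the parsing assumes tcpdump's usual "IP src > dst:" line layout.

```shell
#!/bin/sh
# vrrp_senders: read tcpdump text output on stdin and print each distinct
# source address that sent a packet (the field right after "IP").
# Hypothetical helper; assumes tcpdump's usual "IP src > dst:" layout.
vrrp_senders() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "IP") { print $(i + 1); break }
    }' | sort -u
}

# Possible use on either node (needs root; stops after 10 packets):
#   tcpdump -i eth0 -nn -c 10 vrrp | vrrp_senders
```

Two addresses in the output would be consistent with the split-brain state this test is meant to provoke; a single address would point to a healthy master.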
4. Check the IP addresses on LB01 and LB02: both hold 10.0.0.3
[root@LB01 ~]# ip add | grep 10.0.0.3
    inet 10.0.0.3/32 scope global eth0

[root@LB02 ~]# ip add | grep 10.0.0.3
    inet 10.0.0.3/32 scope global eth0
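III. Resolving split-brain
Approach: once split-brain has occurred, stopping keepalived on either one of the nodes resolves it, and a script can do this automatically. We treat the cluster as split-brained when ip add on both nodes shows 10.0.0.3. The script is written on LB01.
Set up passwordless SSH so that LB01 can easily read LB02's IP information: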
[root@LB01 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:+NyOCiY7aBX8nEPwGeNQHjTLY2EXPKU1o33LTBrm1zk root@LB01
The key's randomart image is:
+---[RSA 2048]----+
| oB.oo= |
| o+o*o= o |
| . =*+o.+ o |
| o.=..o B o . |
| = o So = E |
| . = o .. . |
| .o o . o . |
|...+ . o |
|. .. ... . |
+----[SHA256]-----+
[root@LB01 ~]#
[root@LB01 ~]# ssh-copy-id -i .ssh/id_rsa 10.0.0.6
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: ".ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@10.0.0.6's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh '10.0.0.6'"
and check to make sure that only the key(s) you wanted were added.

[root@LB01 ~]# ssh '10.0.0.6' ip add | grep 10.0.0.3 | wc -l    # passwordless login test
1
Write the script and run it:
[root@LB01 ~]# cat check_split_brain.sh
#!/bin/bash
# Count the lines mentioning the VIP on LB01 (local) and on LB02 (over SSH)
LB01_VIP_Number=$(ip add | grep 10.0.0.3 | wc -l)
LB02_VIP_Number=$(ssh '10.0.0.6' ip add | grep 10.0.0.3 | wc -l)
# Both nodes hold the VIP: split-brain, so stop keepalived on LB01
if [ "$LB01_VIP_Number" -eq 1 ] && [ "$LB02_VIP_Number" -eq 1 ]
then
    systemctl stop keepalived
fi
[root@LB01 ~]# sh check_split_brain.sh
[root@LB01 ~]# ip add | grep 10.0.0.3

[root@LB02 ~]# ip add | grep 10.0.0.3
    inet 10.0.0.3/32 scope global eth0
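A slightly stricter variant of check_split_brain.sh is sketched below. It rests on two assumptions that are not in the original script: a plain grep 10.0.0.3 would also match addresses such as 10.0.0.30, so the VIP is compared exactly; and the SSH call is given a timeout so the script cannot hang while the peer is unreachable. The function names count_vip and check_split_brain are hypothetical.

```shell
#!/bin/sh
VIP="10.0.0.3"      # virtual IP from the keepalived configuration
PEER="10.0.0.6"     # LB02, reached over passwordless SSH

# count_vip: read `ip -o -4 addr show` output on stdin and print how many
# addresses are exactly equal to the given IP (the prefix length is ignored).
count_vip() {
    awk -v vip="$1" '{ split($4, a, "/"); if (a[1] == vip) n++ } END { print n + 0 }'
}

# check_split_brain: stop the local keepalived when both nodes hold the VIP.
# If SSH fails, the peer count is 0 and nothing is stopped.
check_split_brain() {
    local_count=$(ip -o -4 addr show | count_vip "$VIP")
    peer_count=$(ssh -o BatchMode=yes -o ConnectTimeout=3 "$PEER" \
        ip -o -4 addr show | count_vip "$VIP")
    if [ "$local_count" -ge 1 ] && [ "$peer_count" -ge 1 ]; then
        systemctl stop keepalived
    fi
}
```

Calling check_split_brain at the end of the file, and running the file periodically (for example from cron), would give the same effect as running the original script by hand.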
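Keepalived and Nginx
Nginx listens on all IP addresses by default. When the VIP floats to a node, it is as if Nginx gained one more address (the VIP) to listen on, so clients that reach the VIP reach Nginx on that machine. If Nginx goes down, however, user requests fail, yet keepalived itself is still running and does not fail over. We therefore need a script that checks whether Nginx is alive and, if it is not and cannot be restarted, stops keepalived so that the VIP floats to the backup server automatically.
I. Write the script and make it executable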
[root@LB01 ~]# cat check_nginx.sh
#!/bin/bash
# Count running nginx processes
nginxpid=$(ps -C nginx --no-headers | wc -l)
if [ "$nginxpid" -eq 0 ]
then
    # Nginx is down: try to restart it, discarding the output
    systemctl restart nginx &>/dev/null
    if [ $? -ne 0 ]
    then
        # Restart failed: stop keepalived so the VIP moves to the backup
        systemctl stop keepalived
    fi
fi

[root@LB01 ~]# chmod +x check_nginx.sh
[root@LB01 ~]# ll check_nginx.sh
-rwxr-xr-x 1 root root 150 Apr 12 17:37 check_nginx.sh
II. Testing the script
[root@LB02 ~]# ip add|grep 10.0.0.3        # the VIP is not on LB02 at the moment

[root@LB01 ~]# ip add|grep 10.0.0.3        # the VIP is currently on LB01
    inet 10.0.0.3/32 scope global eth0
[root@LB01 ~]# systemctl stop nginx        # stop Nginx
[root@LB01 ~]# ip add|grep 10.0.0.3        # the VIP is still on LB01, because Nginx's state does not affect keepalived
    inet 10.0.0.3/32 scope global eth0
[root@LB01 ~]# vim /etc/nginx/nginx.conf   # break the Nginx config so it cannot restart (here "user" is deliberately misspelled), then see whether the VIP floats to LB02

ser nginx;

[root@LB01 ~]# sh check_nginx.sh           # run the script
[root@LB01 ~]# ip add|grep 10.0.0.3        # the VIP is no longer on LB01

[root@LB02 ~]# ip add | grep 10.0.0.3      # the VIP has floated to LB02
    inet 10.0.0.3/32 scope global eth0
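A process count only shows that Nginx is running, not that it is answering requests. As a hedged alternative to the ps check, the sketch below probes Nginx over HTTP with curl. The URL http://127.0.0.1/ and the function name nginx_http_state are assumptions for illustration; check_nginx.sh as tested here does not do this.

```shell
#!/bin/sh
# nginx_http_state: print "up" if the URL answers without an HTTP error
# status within 2 seconds, otherwise "down". Hypothetical helper.
nginx_http_state() {
    if curl -fsS -o /dev/null --max-time 2 "$1" 2>/dev/null; then
        echo up
    else
        echo down
    fi
}

# Possible use in place of the ps check, keeping the same failover action:
#   [ "$(nginx_http_state http://127.0.0.1/)" = "up" ] || systemctl stop keepalived
```

The 2-second limit is chosen so that the probe finishes quickly; a missing curl binary or a refused connection is also reported as "down".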
III. Call the script from the configuration file
[root@LB01 ~]# cat /etc/keepalived/keepalived.conf
global_defs {
    router_id LB01
}

# Run the script every 5 seconds; the script must finish within 5 seconds,
# otherwise it is interrupted and run again
vrrp_script check_nginx {
    script "/root/check_nginx.sh"
    interval 5
}

vrrp_instance VI_1 {
    state MASTER
    #nopreempt
    interface eth0
    virtual_router_id 50
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.3
    }
    # Call and run the script
    track_script {
        check_nginx
    }
}

Note: in preemptive mode, the script only needs to be called from the master's keepalived configuration. In non-preemptive mode, both servers need to use the script.
I'm koten, with 10 years of operations experience, and I keep sharing practical ops content. Thank you for reading and following!