红帽集群套件RHCS四部曲(测试篇)

本文涉及的产品
云数据库 RDS MySQL Serverless,0.5-2RCU 50GB
简介:

  集群配置完成后,如何知道集群已经配置成功了呢,下面我们就分情况测试RHCS提供的高可用集群和存储集群功能。

一、 高可用集群测试
 在前面的文章中,我们配置了四个节点的集群系统,每个节点的主机名分别是web1、web2、Mysql1、Mysql2,四个节点之间的关系是:web1和web2组成web集群,运行webserver服务,其中web1是主节点,正常状态下服务运行在此节点,web2是备用节点;Mysql1和Mysql2组成Mysql集群,运行mysqlserver服务,其中,Mysql1是主节点,正常状态下服务运行在此节点,Mysql2为备用节点。
下面分四种情况介绍当节点发生宕机时,集群是如何进行切换和工作的。

1 节点web2宕机时
宕机分为正常关机和异常宕机两种情况,下面分别说明。

(1) 节点web2正常关机
在节点web2上执行正常关机命令:
[root@web2 ~]#init 0
然后在web1节点查看/var/log/messages日志,输出信息如下:
Aug 24 00:57:09 web1 clurgmgrd[3321]: <notice> Member 1 shutting down 
Aug 24 00:57:17 web1 qdiskd[2778]: <info> Node 1 shutdown 
Aug 24 00:57:29 web1 openais[2755]: [TOTEM] The token was lost in the OPERATIONAL state. 
Aug 24 00:57:29 web1 openais[2755]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes). 
Aug 24 00:57:29 web1 openais[2755]: [TOTEM] Transmit multicast socket send buffer size (219136 bytes). 
Aug 24 00:57:29 web1 openais[2755]: [TOTEM] entering GATHER state from 2. 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] entering GATHER state from 0. 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] Creating commit token because I am the rep. 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] Saving state aru 73 high seq received 73 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] Storing new sequence id for ring bc8 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] entering COMMIT state. 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] entering RECOVERY state. 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] position [0] member 192.168.12.230: 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] previous ring seq 3012 rep 192.168.12.230 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] aru 73 high delivered 73 received flag 1 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] position [1] member 192.168.12.231: 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] previous ring seq 3012 rep 192.168.12.230 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] aru 73 high delivered 73 received flag 1 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] position [2] member 192.168.12.232: 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] previous ring seq 3012 rep 192.168.12.230 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] aru 73 high delivered 73 received flag 1 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] Did not need to originate any messages in recovery. 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] Sending initial ORF token 
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] New Configuration: 
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] Members Left: 
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 00:57:49 web1 kernel: dlm: closing connection to node 1

Aug 24 00:57:49 web1 openais[2755]: [CLM  ] Members Joined: 
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] New Configuration: 
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 00:57:49 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] Members Left: 
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] Members Joined: 
Aug 24 00:57:49 web1 openais[2755]: [SYNC ] This node is within the primary component and will provide service. 
Aug 24 00:57:49 web1 openais[2755]: [TOTEM] entering OPERATIONAL state. 
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.230 
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.231 
Aug 24 00:57:49 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.232 
Aug 24 00:57:49 web1 openais[2755]: [CPG  ] got joinlist message from node 3 
Aug 24 00:57:49 web1 openais[2755]: [CPG  ] got joinlist message from node 4 
Aug 24 00:57:49 web1 openais[2755]: [CPG  ] got joinlist message from node 2 
从输出日志可以看出,当web2节点正常关机后,qdiskd进程立刻检测到web2节点已经关闭,然后dlm锁进程正常关闭了从web2的连接,由于是正常关闭节点web2,所以RHCS认为整个集群系统没有发生异常,仅仅把节点web2从集群中隔离而已。请重点查看日志中斜体部分。
重新启动web2节点,然后在web1上继续观察日志信息:
Aug 24 01:10:50 web1 openais[2755]: [TOTEM] entering GATHER state from 11. 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] Creating commit token because I am the rep. 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] Saving state aru 2b high seq received 2b 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] Storing new sequence id for ring bcc 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] entering COMMIT state. 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] entering RECOVERY state. 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] position [0] member 192.168.12.230: 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] previous ring seq 3016 rep 192.168.12.230 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] aru 2b high delivered 2b received flag 1 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] position [1] member 192.168.12.231: 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] previous ring seq 3016 rep 192.168.12.230 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] aru 2b high delivered 2b received flag 1 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] position [2] member 192.168.12.232: 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] previous ring seq 3016 rep 192.168.12.230 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] aru 2b high delivered 2b received flag 1 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] position [3] member 192.168.12.240: 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] previous ring seq 3016 rep 192.168.12.240 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] aru c high delivered c received flag 1 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] Did not need to originate any messages in recovery. 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] Sending initial ORF token 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] New Configuration: 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] Members Left: 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] Members Joined: 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] New Configuration: 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] Members Left: 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] Members Joined: 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 01:10:51 web1 openais[2755]: [SYNC ] This node is within the primary component and will provide service. 
Aug 24 01:10:51 web1 openais[2755]: [TOTEM] entering OPERATIONAL state. 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.230 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.231 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.232 
Aug 24 01:10:51 web1 openais[2755]: [CLM  ] got nodejoin message 192.168.12.240 
Aug 24 01:10:51 web1 openais[2755]: [CPG  ] got joinlist message from node 3 
Aug 24 01:10:51 web1 openais[2755]: [CPG  ] got joinlist message from node 4 
Aug 24 01:10:51 web1 openais[2755]: [CPG  ] got joinlist message from node 2 
Aug 24 01:10:55 web1 kernel: dlm: connecting to 1
从输出可知,重新启动节点web2后,openais底层通信进程检测到web2节点已经激活,并且将web2重新加入集群中,请重点查看日志中斜体部分。

(2) 节点web2异常宕机
在节点web2上执行如下命令,让内核崩溃:
[root@web2 ~]#echo c>/proc/sysrq-trigger
然后在节点Mysql1上查看/var/log/messages日志,信息如下:
Aug 24 02:26:16 Mysql1 openais[2649]: [TOTEM] entering GATHER state from 12. 
Aug 24 02:26:28 Mysql1 qdiskd[2672]: <notice> Node 1 evicted 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] entering GATHER state from 11. 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] Saving state aru 78 high seq received 78 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] Storing new sequence id for ring bd0 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] entering COMMIT state. 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] entering RECOVERY state. 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] position [0] member 192.168.12.230: 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] previous ring seq 3020 rep 192.168.12.230 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] aru 78 high delivered 78 received flag 1 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] position [1] member 192.168.12.231: 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] previous ring seq 3020 rep 192.168.12.230 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] aru 78 high delivered 78 received flag 1 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] position [2] member 192.168.12.232: 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] previous ring seq 3020 rep 192.168.12.230 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] aru 78 high delivered 78 received flag 1 
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] Did not need to originate any messages in recovery. 
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] New Configuration: 
Aug 24 02:26:36 Mysql1 kernel: dlm: closing connection to node 1
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.230)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.231)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.232)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] Members Left: 
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.240)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] Members Joined: 
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] New Configuration: 
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.230)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.231)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ]   r(0) ip(192.168.12.232)  
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] Members Left: 
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] Members Joined: 
Aug 24 02:26:36 Mysql1 openais[2649]: [SYNC ] This node is within the primary component and will provide service. 
Aug 24 02:26:36 Mysql1 fenced[2688]: web2 not a cluster member after 0 sec post_fail_delay
Aug 24 02:26:36 Mysql1 openais[2649]: [TOTEM] entering OPERATIONAL state. 
Aug 24 02:26:36 Mysql1 fenced[2688]: fencing node "web2"
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] got nodejoin message 192.168.12.230 
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] got nodejoin message 192.168.12.231 
Aug 24 02:26:36 Mysql1 openais[2649]: [CLM  ] got nodejoin message 192.168.12.232 
Aug 24 02:26:36 Mysql1 openais[2649]: [CPG  ] got joinlist message from node 3 
Aug 24 02:26:36 Mysql1 openais[2649]: [CPG  ] got joinlist message from node 4 
Aug 24 02:26:36 Mysql1 openais[2649]: [CPG  ] got joinlist message from node 2
Aug 24 02:26:45 Mysql1 fenced[2688]: fence "web2" success
Aug 24 02:26:45 Mysql1 kernel: GFS2: fsid=mycluster:my-gfs2.2: jid=3: Trying to acquire journal lock...
Aug 24 02:26:45 Mysql1 kernel: GFS2: fsid=mycluster:my-gfs2.2: jid=3: Looking at journal...
Aug 24 02:26:45 Mysql1 kernel: GFS2: fsid=mycluster:my-gfs2.2: jid=3: Done

 从输出信息可知,qdiskd首先检测到web2出现异常,然后将它从集群中隔离,由于是异常宕机,所以RHCS为了保证集群资源的唯一性,必须重置web2节点,于是fenced进程启动, 在fenced进程没有返回成功信息之前,所有节点挂载的GFS2共享分区将无法使用,处于hung住的状态。直到fence成功。
此时,在节点web1上查看web2的集群状态:
[root@web1 ~]#clustat  -m web2
 Member Name ID Status
 ------ ----   ---- ------
 web2           1 Offline
 由输出可知,web2已经处于offline状态了。

2 节点Mysql2宕机时
节点Mysql2在正常关机和异常宕机时,RHCS的切换状态与上面讲述的web2节点情况一模一样,这么不在重复讲述。

3 节点web1宕机时
(1) 节点web1正常关机
[root@web1 ~]# init 0
然后在节点web2上查看/var/log/messages日志,信息如下:
Aug 24 02:06:13 web2 last message repeated 3 times
Aug 24 02:14:58 web2 clurgmgrd[3239]: <notice> Member 4 shutting down 
Aug 24 02:15:03 web2 clurgmgrd[3239]: <notice> Starting stopped service service:webserver 
Aug 24 02:15:05 web2 avahi-daemon[3110]: Registering new address record for 192.168.12.233 on eth0.
Aug 24 02:15:06 web2 in.rdiscd[4451]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
Aug 24 02:15:06 web2 in.rdiscd[4451]: Failed joining addresses 
Aug 24 02:15:07 web2 clurgmgrd[3239]: <notice> Service service:webserver started 
Aug 24 02:15:08 web2 qdiskd[2712]: <info> Node 4 shutdown 

Aug 24 02:15:21 web2 openais[2689]: [TOTEM] entering GATHER state from 12. 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] entering GATHER state from 11. 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] Saving state aru b7 high seq received b7 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] Storing new sequence id for ring bd8 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] entering COMMIT state. 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] entering RECOVERY state. 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] position [0] member 192.168.12.231: 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] previous ring seq 3028 rep 192.168.12.230 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] aru b7 high delivered b7 received flag 1 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] position [1] member 192.168.12.232: 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] previous ring seq 3028 rep 192.168.12.230 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] aru b7 high delivered b7 received flag 1 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] position [2] member 192.168.12.240: 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] previous ring seq 3028 rep 192.168.12.230 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] aru b7 high delivered b7 received flag 1 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] Did not need to originate any messages in recovery. 
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] New Configuration: 
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] Members Left: 
Aug 24 02:15:41 web2 kernel: dlm: closing connection to node 4
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.230)  

Aug 24 02:15:41 web2 openais[2689]: [CLM  ] Members Joined: 
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] New Configuration: 
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 02:15:41 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] Members Left: 
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] Members Joined: 
Aug 24 02:15:41 web2 openais[2689]: [SYNC ] This node is within the primary component and will provide service. 
Aug 24 02:15:41 web2 openais[2689]: [TOTEM] entering OPERATIONAL state. 
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.231 
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.232 
Aug 24 02:15:41 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.240 
Aug 24 02:15:41 web2 openais[2689]: [CPG  ] got joinlist message from node 2 
Aug 24 02:15:41 web2 openais[2689]: [CPG  ] got joinlist message from node 3 
Aug 24 02:15:41 web2 openais[2689]: [CPG  ] got joinlist message from node 1 
从输出日志可知,节点web1正常关机后,节点web1的服务和IP资源自动切换到了节点web2上,然后由qdiskd进程将节点web1从集群系统中隔离。由于web1节点是正常关闭,所以集群中GFS2共享文件系统可以正常读写,不受web1关闭的影响。
 此时,在web2查看节点web1的状态:
[root@web2 ~]# clustat  -m web1
 Member Name        ID    Status
 ------ ----              ----  ------
 web1                4   Offline
 从输出可知,web1节点已经处于offline状态了。
 接着,登录到节点web2,查看集群服务和IP资源是否正常切换,操作如下:
 [root@web2 ~]# clustat  -s webserver
 Service Name     Owner (Last)         State         
------- ----           ----- ------            -----         
service:webserver    web2             started 
[root@web2 ~]# ip addr show|grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 192.168.12.240/24 brd 192.168.12.255 scope global eth0
    inet 192.168.12.233/24 scope global secondary eth0
从输出可知,集群服务和IP地址已经成功切换到web2节点。

最后,重新启动节点web1,然后在节点web2查看/var/log/messages日志,信息如下:
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] entering GATHER state from 11. 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] Saving state aru 2b high seq received 2b 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] Storing new sequence id for ring bdc 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] entering COMMIT state. 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] entering RECOVERY state. 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] position [0] member 192.168.12.230: 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] previous ring seq 3028 rep 192.168.12.230 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] aru 0 high delivered 0 received flag 1 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] position [1] member 192.168.12.231: 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] previous ring seq 3032 rep 192.168.12.231 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] aru 2b high delivered 2b received flag 1 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] position [2] member 192.168.12.232: 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] previous ring seq 3032 rep 192.168.12.231 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] aru 2b high delivered 2b received flag 1 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] position [3] member 192.168.12.240: 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] previous ring seq 3032 rep 192.168.12.231 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] aru 2b high delivered 2b received flag 1 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] Did not need to originate any messages in recovery. 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] New Configuration: 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] Members Left: 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] Members Joined: 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] New Configuration: 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] Members Left: 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] Members Joined: 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 02:42:36 web2 openais[2689]: [SYNC ] This node is within the primary component and will provide service. 
Aug 24 02:42:36 web2 openais[2689]: [TOTEM] entering OPERATIONAL state. 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.230 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.231 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.232 
Aug 24 02:42:36 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.240 
Aug 24 02:42:36 web2 openais[2689]: [CPG  ] got joinlist message from node 3 
Aug 24 02:42:36 web2 openais[2689]: [CPG  ] got joinlist message from node 1 
Aug 24 02:42:36 web2 openais[2689]: [CPG  ] got joinlist message from node 2 
Aug 24 02:42:40 web2 kernel: dlm: got connection from 4
Aug 24 02:43:06 web2 clurgmgrd[3239]: <notice> Relocating service:webserver to better node web1 
Aug 24 02:43:06 web2 clurgmgrd[3239]: <notice> Stopping service service:webserver 
Aug 24 02:43:07 web2 avahi-daemon[3110]: Withdrawing address record for 192.168.12.233 on eth0.
Aug 24 02:43:17 web2 clurgmgrd[3239]: <notice> Service service:webserver is stopped 
从输出可知,节点web1在重新启动后,再次被加入到集群系统中,同时停止自身的服务以及释放IP资源。这个切换方式跟集群设置的Failover Domain策略有关,在创建的的失败转移域webserver-Failover中,没有加入“Do not fail back services in this domain”一项功能,也就是主节点在重新启动后,自动将服务切换回来。
此时在节点web1查看/var/log/messages日志,信息如下:
Aug 24 02:43:19 web1 clurgmgrd[3252]: <notice> stop on script "mysqlscript" returned 5 (program not installed) 
Aug 24 02:43:35 web1 clurgmgrd[3252]: <notice> Starting stopped service service:webserver 
Aug 24 02:43:37 web1 avahi-daemon[3126]: Registering new address record for 192.168.12.233 on eth0.
Aug 24 02:43:38 web1 in.rdiscd[4075]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
Aug 24 02:43:38 web1 in.rdiscd[4075]: Failed joining addresses 
Aug 24 02:43:39 web1 clurgmgrd[3252]: <notice> Service service:webserver started 
这个输出表明,web1在重启后,自动将集群服务和IP资源切换回来。

(2) 节点web1异常宕机
在节点web1上执行如下命令,让系统内核崩溃:
[root@web2 ~]#echo c>/proc/sysrq-trigger
然后在节点web2上查看/var/log/messages日志,信息如下:
Aug 24 02:59:57 web2 openais[2689]: [TOTEM] entering GATHER state from 12. 
Aug 24 03:00:10 web2 qdiskd[2712]: <notice> Node 4 evicted 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] entering GATHER state from 11. 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] Saving state aru 92 high seq received 92 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] Storing new sequence id for ring be0 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] entering COMMIT state. 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] entering RECOVERY state. 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] position [0] member 192.168.12.231: 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] previous ring seq 3036 rep 192.168.12.230 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] aru 92 high delivered 92 received flag 1 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] position [1] member 192.168.12.232: 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] previous ring seq 3036 rep 192.168.12.230 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] aru 92 high delivered 92 received flag 1 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] position [2] member 192.168.12.240: 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] previous ring seq 3036 rep 192.168.12.230 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] aru 92 high delivered 92 received flag 1 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] Did not need to originate any messages in recovery. 
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] New Configuration: 
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] Members Left: 
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 03:00:17 web2 kernel: dlm: closing connection to node 4

Aug 24 03:00:17 web2 openais[2689]: [CLM  ] Members Joined: 
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] New Configuration: 
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 03:00:17 web2 openais[2689]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] Members Left: 
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] Members Joined: 
Aug 24 03:00:17 web2 openais[2689]: [SYNC ] This node is within the primary component and will provide service. 
Aug 24 03:00:17 web2 openais[2689]: [TOTEM] entering OPERATIONAL state. 
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.231 
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.232 
Aug 24 03:00:17 web2 openais[2689]: [CLM  ] got nodejoin message 192.168.12.240 
Aug 24 03:00:17 web2 openais[2689]: [CPG  ] got joinlist message from node 2 
Aug 24 03:00:17 web2 fenced[2728]: web1 not a cluster member after 0 sec post_fail_delay
Aug 24 03:00:17 web2 openais[2689]: [CPG  ] got joinlist message from node 3 
Aug 24 03:00:17 web2 fenced[2728]: fencing node "web1"
Aug 24 03:00:17 web2 openais[2689]: [CPG  ] got joinlist message from node 1
Aug 24 03:00:55 web2 fenced[2728]: fence "web1" success
Aug 24 03:00:55 web2 kernel: GFS2: fsid=mycluster:my-gfs2.3: jid=0: Trying to acquire journal lock...
Aug 24 03:00:55 web2 kernel: GFS2: fsid=mycluster:my-gfs2.3: jid=0: Looking at journal...
Aug 24 03:00:55 web2 kernel: GFS2: fsid=mycluster:my-gfs2.3: jid=0: Done
Aug 24 03:00:55 web2 clurgmgrd[3239]: <notice> Taking over service service:webserver from down member web1 
Aug 24 03:00:55 web2 avahi-daemon[3110]: Registering new address record for 192.168.12.233 on eth0.
Aug 24 03:00:55 web2 clurgmgrd[3239]: <notice> Service service:webserver started
从输出日志可以看出,web1在异常宕机后,首先由qdiskd进程将失败节点从集群中隔离,然后qdiskd进程将结果返回给cman进程,cman进程接着去调用Fence进程,最后Fence进程根据事先设置好的Fence agent调用Fence设备将web1成功Fence掉,clurgmgrd进程在接到成功Fence的信息后,web2开始接管web1的服务和IP资源,同时释放dlm锁,GFS2文件系统可以正常读写。

4 节点Mysql1宕机时
节点Mysql1在正常关机和异常宕机时,RHCS的切换状态与上面讲述的web1节点宕机情况一模一样,这么不在重复演示。

5 四个节点任意三个宕机
 这里将web1、Mysql2、Mysql1依次异常宕机,然后在web2节点查看RHCS是如何进行切换动作的。
首先停止web1节点,查看/var/log/messages日志,信息如下:
Aug 24 18:57:55 web2 openais[2691]: [TOTEM] entering GATHER state from 12. 
Aug 24 18:58:14 web2 qdiskd[2714]: <notice> Writing eviction notice for node 4 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] entering GATHER state from 0. 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] Saving state aru 8e high seq received 8e 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] Storing new sequence id for ring c14 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] entering COMMIT state. 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] entering RECOVERY state. 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] position [0] member 192.168.12.231: 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] previous ring seq 3088 rep 192.168.12.230 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] aru 8e high delivered 8e received flag 1 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] position [1] member 192.168.12.232: 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] previous ring seq 3088 rep 192.168.12.230 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] aru 8e high delivered 8e received flag 1 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] position [2] member 192.168.12.240: 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] previous ring seq 3088 rep 192.168.12.230 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] aru 8e high delivered 8e received flag 1 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] Did not need to originate any messages in recovery. 
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] New Configuration: 
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 18:58:15 web2 kernel: dlm: closing connection to node 4
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] Members Left: 
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.230)  
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] Members Joined: 
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] CLM CONFIGURATION CHANGE 
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] New Configuration: 
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.231)  
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.232)  
Aug 24 18:58:15 web2 openais[2691]: [CLM  ]     r(0) ip(192.168.12.240)  
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] Members Left: 

Aug 24 18:58:15 web2 openais[2691]: [CLM  ] Members Joined: 
Aug 24 18:58:15 web2 openais[2691]: [SYNC ] This node is within the primary component and will provide service. 
Aug 24 18:58:15 web2 openais[2691]: [TOTEM] entering OPERATIONAL state. 
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] got nodejoin message 192.168.12.231 
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] got nodejoin message 192.168.12.232 
Aug 24 18:58:15 web2 openais[2691]: [CLM  ] got nodejoin message 192.168.12.240 
Aug 24 18:58:15 web2 fenced[2730]: web1 not a cluster member after 0 sec post_fail_delay
Aug 24 18:58:15 web2 openais[2691]: [CPG  ] got joinlist message from node 3 
Aug 24 18:58:15 web2 fenced[2730]: fencing node "web1"
Aug 24 18:58:15 web2 openais[2691]: [CPG  ] got joinlist message from node 1 
Aug 24 18:58:15 web2 openais[2691]: [CPG  ] got joinlist message from node 2 
Aug 24 18:58:17 web2 qdiskd[2714]: <notice> Node 4 evicted 

Aug 24 18:58:29 web2 fenced[2730]: fence "web1" success
Aug 24 18:58:29 web2 kernel: GFS2: fsid=mycluster:my-gfs2.1: jid=3: Trying to acquire journal lock...
Aug 24 18:58:29 web2 kernel: GFS2: fsid=mycluster:my-gfs2.1: jid=3: Looking at journal...
Aug 24 18:58:29 web2 kernel: GFS2: fsid=mycluster:my-gfs2.1: jid=3: Done
Aug 24 18:58:30 web2 clurgmgrd[3300]: <notice> Taking over service service:webserver from down member web1 
Aug 24 18:58:32 web2 avahi-daemon[3174]: Registering new address record for 192.168.12.233 on eth0.
Aug 24 18:58:33 web2 clurgmgrd[3300]: <notice> Service service:webserver started 

 通过观察日志可以发现,qdiskd进程会首先将节点web1从集群隔离,然后由fenced进程将web1成功fence掉,最后,web2才接管了web1的服务和IP资源。这里有个先后问题,也就是说,只有Fence成功,集群资源切换才会进行。
如果fenced进程没有成功将web1节点fence掉,那么RHCS会进入等待状态,此时集群系统GFS2共享存储也将变得不可读写,直到Fence进程返回成功信息,集群才开始进行资源切换,同时GFS2文件系统也将恢复读写。
此时,在web2节点通过cman_tool查看集群状态,信息如下:
[root@web2 ~]# cman_tool  status
Version: 6.2.0
Config Version: 40
Cluster Name: mycluster
Cluster Id: 56756
Cluster Member: Yes
Cluster Generation: 3092
Membership state: Cluster-Member
Nodes: 3
Expected votes: 6
Quorum device votes: 2
Total votes: 5
Quorum: 3  

Active subsystems: 9
Flags: Dirty 
Ports Bound: 0 177  
Node name: web2
Node ID: 1
Multicast addresses: 239.192.221.146 
Node addresses: 192.168.12.240 
 从输出可知,集群节点数变为3,同时Quorum值也有原来的4变为3,Quorum值是通过N/2整除加一得到的,其中N为所有集群节点的投票数加上表决磁盘投票值。其实也就在这里的“Total votes”值。
接着陆续将Mysql2、Mysql1异常宕机,宕机完成,在web2节点通过cman_tool查看集群状态,信息如下:

[root@web2 ~]# cman_tool  status
Version: 6.2.0
Config Version: 40
Cluster Name: mycluster
Cluster Id: 56756
Cluster Member: Yes
Cluster Generation: 3100
Membership state: Cluster-Member
Nodes: 1
Expected votes: 6
Quorum device votes: 2
Total votes: 3
Quorum: 2  
Active subsystems: 9

Flags: Dirty 
Ports Bound: 0 177  
Node name: web2
Node ID: 1
Multicast addresses: 239.192.221.146 
Node addresses: 192.168.12.240 
从输出可知,整个集群系统节点数已经变为一个,但是集群系统仍然可用,GFS2仍然可以正常读写,这些都是Qdisk的功能,一个RHCS集群中,如果没有表决磁盘的话,
仅剩一个节点的集群是不能工作的。

二、存储集群测试
 GFS2文件系统是RHCS集群中的共享存储,在集群各个节点挂载GFS2共享文件系统,然后在任意节点对共享分区进行数据的读写操作,例如:
在节点web2上创建一个文件ixdba:
[root@web2 gfs2]# echo "This is GFS2 Files test" >/gfs2/ixdba
然后在节点web1查看此文件
[root@web1 gfs2]# more /gfs2/ixdba
This is GFS2 Files test
可以看到,写操作正常。
         接着测试同时读写文件,首先在节点web1上编辑文件ixdba,然后同时在节点web2也编辑ixdba,此时会提示该文件locked。
         到这里为止,RHCS所有功能点都已经测试完成了,在测试集群功能时,观察日志信息是很有必要的,通过观察日志输出,可以知道RHCS更加详细的切换过程,如果切换失败,也会在日志中输出,然后根据错误信息,就可以很容易的定位问题。
         目前,RHCS已经广泛应用在企业的各个领域,与其它HA软件相比,RHCS可能显得比较复杂,管理和维护会比较困难,但是RHCS的可靠性和稳定性是毋庸置疑的,随着人们对业务实时性和稳定性需求的不断提升,RHCS的应用毕将越来越广泛。

(待续)













本文转自南非蚂蚁51CTO博客,原文链接: http://blog.51cto.com/ixdba/599020,如需转载请自行联系原作者




相关实践学习
基于CentOS快速搭建LAMP环境
本教程介绍如何搭建LAMP环境,其中LAMP分别代表Linux、Apache、MySQL和PHP。
全面了解阿里云能为你做什么
阿里云在全球各地部署高效节能的绿色数据中心,利用清洁计算为万物互联的新世界提供源源不断的能源动力,目前开服的区域包括中国(华北、华东、华南、香港)、新加坡、美国(美东、美西)、欧洲、中东、澳大利亚、日本。目前阿里云的产品涵盖弹性计算、数据库、存储与CDN、分析与搜索、云通信、网络、管理与监控、应用服务、互联网中间件、移动服务、视频服务等。通过本课程,来了解阿里云能够为你的业务带来哪些帮助 &nbsp; &nbsp; 相关的阿里云产品:云服务器ECS 云服务器 ECS(Elastic Compute Service)是一种弹性可伸缩的计算服务,助您降低 IT 成本,提升运维效率,使您更专注于核心业务创新。产品详情: https://www.aliyun.com/product/ecs
相关文章
|
6月前
|
存储 Oracle 关系型数据库
HBase集群环境搭建与测试(上)
HBase集群环境搭建与测试
125 0
|
1月前
|
分布式计算 资源调度 Hadoop
Hadoop集群基本测试
Hadoop集群基本测试
26 0
|
4月前
|
NoSQL 测试技术 Redis
Redis【性能 02】Redis-5.0.14伪集群和Docker集群搭建及延迟和性能测试(均无法提升性能)
Redis【性能 02】Redis-5.0.14伪集群和Docker集群搭建及延迟和性能测试(均无法提升性能)
151 0
|
4月前
|
分布式计算 Hadoop 数据安全/隐私保护
HDFS--HA部署安装:修改配置文件 测试集群工作状态的一些指令
HDFS--HA部署安装:修改配置文件 测试集群工作状态的一些指令
44 0
|
4月前
|
Kubernetes Cloud Native 应用服务中间件
云原生|kubernetes|k8s集群测试时的一些基本操作
云原生|kubernetes|k8s集群测试时的一些基本操作
70 0
|
4月前
|
Kubernetes 测试技术 开发工具
云效我标签只有测试环境:但我其实对应了两个k8s集群(测试A,测试B)环境,这种情况怎么处理呢?
云效我标签只有测试环境:但我其实对应了两个k8s集群(测试A,测试B)环境,这种情况怎么处理呢?
117 1
|
5月前
|
存储 Kubernetes API
kubernetes集群测试方案及工具?
kubernetes集群测试方案及工具?
85 1
|
6月前
|
Java C++ Spring
Spring Boot - 用JUnit 5构建完美的Spring Boot测试套件
Spring Boot - 用JUnit 5构建完美的Spring Boot测试套件
65 0
|
6月前
|
分布式计算 Hadoop Linux
HBase集群环境搭建与测试(下)
HBase集群环境搭建与测试
81 0
|
6月前
|
分布式计算 资源调度 Hadoop
Spark on Yarn集群模式搭建及测试
Spark on Yarn集群模式搭建及测试
156 0

热门文章

最新文章