In a previous article (https://yq.aliyun.com/articles/711825?spm=a2c4e.11155435.0.0.1cd03312XDrTxu) we went through part of how Redis Cluster works. In this article we verify three behaviours hands-on: failover, how the cluster is declared fail, and master protection (replica migration).
1. Failover verification
192.168.0.119:8002> cluster nodes
255e7458f32bc45601d8797676e27b647362a74b 192.168.0.119:8003@18003 master - 0 1565081996000 21 connected 0-5460
a830ae87669d8314223184c66a46f5395c21ce12 192.168.0.120:8004@18004 slave 635aa426f39865951de9345e4bbfa1423c334e33 0 1565081993650 16 connected
ff42d2fc4c9d607e643169e8749976371a56cad0 192.168.0.118:8000@18000 slave 255e7458f32bc45601d8797676e27b647362a74b 0 1565081993000 21 connected
a6e1cb7ba768f86cb5269b055db9ad307f8299ae 192.168.0.119:8002@18002 myself,master - 0 1565081994000 13 connected 5461-10922
4446675c36711658dcf54d64ae7b72977e06e631 192.168.0.120:8006@18006 slave 635aa426f39865951de9345e4bbfa1423c334e33 0 1565081996677 21 connected
526c9a107b5df51fdd932d268bf290f47f3fb467 192.168.0.120:8005@18005 slave a6e1cb7ba768f86cb5269b055db9ad307f8299ae 0 1565081994658 13 connected
635aa426f39865951de9345e4bbfa1423c334e33 192.168.0.118:8001@18001 master - 0 1565081993000 16 connected 10923-16383
As shown above, 8001 has two replicas (8004 and 8006), 8002 has one replica (8005), and 8003 has one replica (8000). Now we kill the 8001 process:
[root@zj1 ~]# ps -ef | grep redis
root 101711 1 0 15:53 ? 00:00:06 ./redis-server 192.168.0.118:8001 [cluster]
root 101888 1 0 16:09 ? 00:00:04 ./redis-server 192.168.0.118:8000 [cluster]
root 102687 102645 0 17:02 pts/3 00:00:00 grep --color=auto redis
[root@zj1 ~]# kill -9 101711
[root@zj1 ~]# ps -ef | grep redis
root 101888 1 0 16:09 ? 00:00:04 ./redis-server 192.168.0.118:8000 [cluster]
root 102689 102645 0 17:02 pts/3 00:00:00 grep --color=auto redis
192.168.0.119:8002> cluster nodes
255e7458f32bc45601d8797676e27b647362a74b 192.168.0.119:8003@18003 master - 0 1565082155961 21 connected 0-5460
a830ae87669d8314223184c66a46f5395c21ce12 192.168.0.120:8004@18004 master - 0 1565082157978 22 connected 10923-16383
ff42d2fc4c9d607e643169e8749976371a56cad0 192.168.0.118:8000@18000 slave 255e7458f32bc45601d8797676e27b647362a74b 0 1565082157000 21 connected
a6e1cb7ba768f86cb5269b055db9ad307f8299ae 192.168.0.119:8002@18002 myself,master - 0 1565082154000 13 connected 5461-10922
4446675c36711658dcf54d64ae7b72977e06e631 192.168.0.120:8006@18006 slave a830ae87669d8314223184c66a46f5395c21ce12 0 1565082157000 22 connected
526c9a107b5df51fdd932d268bf290f47f3fb467 192.168.0.120:8005@18005 slave a6e1cb7ba768f86cb5269b055db9ad307f8299ae 0 1565082156970 13 connected
635aa426f39865951de9345e4bbfa1423c334e33 192.168.0.118:8001@18001 master,fail - 1565082136987 1565082135000 16 disconnected
At this point 8001 is marked as fail, 8004 has been promoted to master for its slot range, and 8006 now replicates 8004.
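Besides cluster nodes, the overall state can be confirmed with cluster info (a quick check sketched here; the field values are illustrative, not captured from this test):
192.168.0.119:8002> cluster info
# expect cluster_state:ok, cluster_slots_assigned:16384, cluster_known_nodes:7, cluster_size:3
# during the failover window cluster_state briefly reports fail, until the new master claims the slots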
192.168.0.119:8002> cluster nodes
255e7458f32bc45601d8797676e27b647362a74b 192.168.0.119:8003@18003 master - 0 1565082339000 21 connected 0-5460
a830ae87669d8314223184c66a46f5395c21ce12 192.168.0.120:8004@18004 master - 0 1565082340000 22 connected 10923-16383
ff42d2fc4c9d607e643169e8749976371a56cad0 192.168.0.118:8000@18000 slave 255e7458f32bc45601d8797676e27b647362a74b 0 1565082339716 21 connected
a6e1cb7ba768f86cb5269b055db9ad307f8299ae 192.168.0.119:8002@18002 myself,master - 0 1565082339000 13 connected 5461-10922
4446675c36711658dcf54d64ae7b72977e06e631 192.168.0.120:8006@18006 slave a830ae87669d8314223184c66a46f5395c21ce12 0 1565082339000 22 connected
526c9a107b5df51fdd932d268bf290f47f3fb467 192.168.0.120:8005@18005 slave a6e1cb7ba768f86cb5269b055db9ad307f8299ae 0 1565082340722 13 connected
635aa426f39865951de9345e4bbfa1423c334e33 192.168.0.118:8001@18001 slave a830ae87669d8314223184c66a46f5395c21ce12 0 1565082341729 22 connected
After 8001 is brought back up, it rejoins as a replica of 8004 and does not try to reclaim the mastership.
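For reference, bringing 8001 back is simply starting the process again with its own configuration (the path below is a placeholder for this test environment):
[root@zj1 ~]# cd /usr/local/redis/bin      # hypothetical install path
[root@zj1 bin]# ./redis-server redis.conf  # 8001's cluster config; it reloads its cluster config file, rejoins the cluster and syncs from 8004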
Now let's analyse the Redis Cluster failover flow from the logs:
# Log of the new master, 8004
95036:S 06 Aug 2019 17:02:33.210 * Connecting to MASTER 192.168.0.118:8001
95036:S 06 Aug 2019 17:02:33.210 * MASTER <-> REPLICA sync started
95036:S 06 Aug 2019 17:02:33.210 # Error condition on socket for SYNC: Connection refused
# Cannot connect to the old master
95036:S 06 Aug 2019 17:02:34.049 * FAIL message received from a6e1cb7ba768f86cb5269b055db9ad307f8299ae about 635aa426f39865951de9345e4bbfa1423c334e33
# Received a FAIL message from node 8002 about node 8001
95036:S 06 Aug 2019 17:02:34.049 # Cluster state changed: fail
# Cluster state: fail
95036:S 06 Aug 2019 17:02:34.116 # Start of election delayed for 698 milliseconds (rank #0, offset 5586).
# Election start delayed by 698 ms (rank #0, offset 5586)
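For context, the Redis Cluster specification computes this delay as roughly DELAY = 500 ms fixed + a random 0-500 ms + REPLICA_RANK * 1000 ms, where rank 0 is the replica with the most up-to-date replication offset; both 8004 and 8006 report rank #0 and offset 5586 here, so their elections start at almost the same time.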
95036:S 06 Aug 2019 17:02:34.216 * Connecting to MASTER 192.168.0.118:8001
95036:S 06 Aug 2019 17:02:34.216 * MASTER <-> REPLICA sync started
95036:S 06 Aug 2019 17:02:34.216 # Error condition on socket for SYNC: Connection refused
# Connection to the old master refused again
95036:S 06 Aug 2019 17:02:34.821 # Starting a failover election for epoch 22.
# Starts a failover election for epoch 22
95036:S 06 Aug 2019 17:02:34.825 # Failover election won: I'm the new master.
# Wins the election and becomes the new master
95036:S 06 Aug 2019 17:02:34.825 # configEpoch set to 22 after successful failover
# configEpoch is set to 22 after the successful failover
95036:M 06 Aug 2019 17:02:34.825 # Setting secondary replication ID to 016aa00270452293b7d508d9a398ad35f017fcd3, valid up to offset: 5587. New replication ID is 8ff9afd63a115c2539002134ede6cfaf3e250840
# Updates the replication IDs; they can be checked with INFO replication
95036:M 06 Aug 2019 17:02:34.825 * Discarding previously cached master state.
# Discards the previously cached master state
95036:M 06 Aug 2019 17:02:34.825 # Cluster state changed: ok
# Cluster state: ok
95036:M 06 Aug 2019 17:02:34.826 # Failover auth denied to 4446675c36711658dcf54d64ae7b72977e06e631: slot 10923 epoch (22) > reqEpoch (16)
# Denies 8006's failover auth request: slot 10923 already has config epoch 22, newer than the epoch in the request (16), i.e. 8004 has already won
95036:M 06 Aug 2019 17:02:35.431 * Replica 192.168.0.120:8006 asks for synchronization
# 8006 asks for synchronization
95036:M 06 Aug 2019 17:02:35.431 * Partial resynchronization request from 192.168.0.120:8006 accepted. Sending 0 bytes of backlog starting from offset 5587.
# Accepts 8006's partial resynchronization request and sends the backlog starting from offset 5587
95036:M 06 Aug 2019 17:05:39.157 * Clear FAIL state for node 635aa426f39865951de9345e4bbfa1423c334e33: master without slots is reachable again.
# Clears the FAIL state of the old master 8001
95036:M 06 Aug 2019 17:05:40.090 * Replica 192.168.0.118:8001 asks for synchronization
95036:M 06 Aug 2019 17:05:40.090 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '524d80f9c935b843a30be1e0b776353032846fd0', my replication IDs are '8ff9afd63a115c2539002134ede6cfaf3e250840' and '016aa00270452293b7d508d9a398ad35f017fcd3')
# 8001's partial resync request is rejected because its replication ID does not match
95036:M 06 Aug 2019 17:05:40.090 * Starting BGSAVE for SYNC with target: disk
# Starts BGSAVE for a full sync
95036:M 06 Aug 2019 17:05:40.091 * Background saving started by pid 95110
# The background save runs in a forked child process (pid 95110)
95110:C 06 Aug 2019 17:05:40.094 * DB saved on disk
95110:C 06 Aug 2019 17:05:40.094 * RDB: 4 MB of memory used by copy-on-write
# RDB snapshot details
95036:M 06 Aug 2019 17:05:40.164 * Background saving terminated with success
# Background save finished successfully
95036:M 06 Aug 2019 17:05:40.164 * Synchronization with replica 192.168.0.118:8001 succeeded
# Synchronization with 8001 succeeded
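To see the replication IDs mentioned in this log, INFO replication can be run on the new master (only the relevant fields are sketched; the two IDs are the ones from the log line above):
192.168.0.120:8004> info replication
# role:master
# master_replid:8ff9afd63a115c2539002134ede6cfaf3e250840    <- new replication ID generated at failover
# master_replid2:016aa00270452293b7d508d9a398ad35f017fcd3   <- previous ID, kept so replicas of the old lineage can still attempt a partial resync
# master_repl_offset:...                                     current replication offset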
# Log of the other candidate replica, 8006
94265:S 06 Aug 2019 17:02:33.410 * Connecting to MASTER 192.168.0.118:8001
94265:S 06 Aug 2019 17:02:33.410 * MASTER <-> REPLICA sync started
94265:S 06 Aug 2019 17:02:33.410 # Error condition on socket for SYNC: Connection refused
# Connection to the old master refused
94265:S 06 Aug 2019 17:02:34.048 * FAIL message received from a6e1cb7ba768f86cb5269b055db9ad307f8299ae about 635aa426f39865951de9345e4bbfa1423c334e33
# Received a FAIL message from node 8002 about node 8001
94265:S 06 Aug 2019 17:02:34.048 # Cluster state changed: fail
# Cluster state: fail
94265:S 06 Aug 2019 17:02:34.115 # Start of election delayed for 625 milliseconds (rank #0, offset 5586).
# Election start delayed by 625 ms (rank #0, offset 5586)
94265:S 06 Aug 2019 17:02:34.420 * Connecting to MASTER 192.168.0.118:8001
94265:S 06 Aug 2019 17:02:34.420 * MASTER <-> REPLICA sync started
94265:S 06 Aug 2019 17:02:34.420 # Error condition on socket for SYNC: Connection refused
# Connection to the old master refused again
94265:S 06 Aug 2019 17:02:34.824 # Starting a failover election for epoch 23.
# Starts a failover election for epoch 23
94265:S 06 Aug 2019 17:02:34.826 # Configuration change detected. Reconfiguring myself as a replica of a830ae87669d8314223184c66a46f5395c21ce12
# Detects the configuration change and reconfigures itself as a replica of 8004
94265:S 06 Aug 2019 17:02:34.826 # Cluster state changed: ok
# Cluster state: ok
94265:S 06 Aug 2019 17:02:35.429 * Connecting to MASTER 192.168.0.120:8004
94265:S 06 Aug 2019 17:02:35.429 * MASTER <-> REPLICA sync started
94265:S 06 Aug 2019 17:02:35.430 * Non blocking connect for SYNC fired the event.
94265:S 06 Aug 2019 17:02:35.430 * Master replied to PING, replication can continue...
# Connects to the new master; PING succeeds, so replication can continue
94265:S 06 Aug 2019 17:02:35.430 * Trying a partial resynchronization (request 016aa00270452293b7d508d9a398ad35f017fcd3:5587).
# Tries a partial resynchronization
94265:S 06 Aug 2019 17:02:35.431 * Successful partial resynchronization with master.
# Partial resync succeeded
94265:S 06 Aug 2019 17:02:35.431 # Master replication ID changed to 8ff9afd63a115c2539002134ede6cfaf3e250840
# The master's replication ID has changed
94265:S 06 Aug 2019 17:02:35.431 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
# The master accepted the partial resynchronization
94265:S 06 Aug 2019 17:05:39.156 * Clear FAIL state for node 635aa426f39865951de9345e4bbfa1423c334e33: master without slots is reachable again.
# Clears 8001's FAIL state: a master without slots is reachable again (it will be demoted to a replica)
# Log of the old master, 8001 (after restart)
102731:M 06 Aug 2019 17:05:39.076 * Reading RDB preamble from AOF file...
# Reads the RDB preamble at the head of the AOF file
102731:M 06 Aug 2019 17:05:39.076 * Reading the remaining AOF tail...
# Reads the remaining AOF tail
102731:M 06 Aug 2019 17:05:39.076 * DB loaded from append only file: 0.000 seconds
# Load time
102731:M 06 Aug 2019 17:05:39.076 * Ready to accept connections
# Ready to accept connections
102731:M 06 Aug 2019 17:05:39.078 # Configuration change detected. Reconfiguring myself as a replica of a830ae87669d8314223184c66a46f5395c21ce12
# Detects the configuration change and reconfigures itself as a replica of 8004
102731:S 06 Aug 2019 17:05:39.078 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
# Before turning into a replica, it synthesizes a cached master from its own parameters, hoping that a partial transfer will be enough to sync with the new master
102731:S 06 Aug 2019 17:05:39.078 # Cluster state changed: ok
# Cluster state: ok
102731:S 06 Aug 2019 17:05:40.088 * Connecting to MASTER 192.168.0.120:8004
102731:S 06 Aug 2019 17:05:40.088 * MASTER <-> REPLICA sync started
102731:S 06 Aug 2019 17:05:40.088 * Non blocking connect for SYNC fired the event.
102731:S 06 Aug 2019 17:05:40.089 * Master replied to PING, replication can continue...
102731:S 06 Aug 2019 17:05:40.089 * Trying a partial resynchronization (request 524d80f9c935b843a30be1e0b776353032846fd0:1).
102731:S 06 Aug 2019 17:05:40.092 * Full resync from master: 8ff9afd63a115c2539002134ede6cfaf3e250840:5838
102731:S 06 Aug 2019 17:05:40.092 * Discarding previously cached master state.
# Connects to the new master; the partial resync is not accepted, so a full resync is performed
102731:S 06 Aug 2019 17:05:40.164 * MASTER <-> REPLICA sync: receiving 176 bytes from master
102731:S 06 Aug 2019 17:05:40.164 * MASTER <-> REPLICA sync: Flushing old data
102731:S 06 Aug 2019 17:05:40.164 * MASTER <-> REPLICA sync: Loading DB in memory
102731:S 06 Aug 2019 17:05:40.164 * MASTER <-> REPLICA sync: Finished with success
# Full sync finished successfully
102731:S 06 Aug 2019 17:05:40.165 * Background append only file rewriting started by pid 102743
102731:S 06 Aug 2019 17:05:40.202 * AOF rewrite child asks to stop sending diffs.
102743:C 06 Aug 2019 17:05:40.202 * Parent agreed to stop sending diffs. Finalizing AOF...
102743:C 06 Aug 2019 17:05:40.202 * Concatenating 0.00 MB of AOF diff received from parent.
102743:C 06 Aug 2019 17:05:40.202 * SYNC append only file rewrite performed
102743:C 06 Aug 2019 17:05:40.203 * AOF rewrite: 4 MB of memory used by copy-on-write
102731:S 06 Aug 2019 17:05:40.289 * Background AOF rewrite terminated with success
102731:S 06 Aug 2019 17:05:40.289 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
102731:S 06 Aug 2019 17:05:40.289 * Background AOF rewrite finished successfully
# The lines above are the AOF rewrite triggered after the full sync
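The "RDB preamble" messages come from the mixed RDB+AOF persistence format. The relevant redis.conf settings look roughly like this (common defaults shown for illustration, not copied from this environment):
appendonly yes               # enable AOF persistence
aof-use-rdb-preamble yes     # an AOF rewrite stores an RDB snapshot as the file header, followed by the incremental AOF tail
appendfsync everysec         # fsync the AOF once per second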
# Log of the remaining master, 8002
25810:M 06 Aug 2019 17:02:34.046 * Marking node 635aa426f39865951de9345e4bbfa1423c334e33 as failing (quorum reached).
# Marks node 8001 as failing (quorum reached)
25810:M 06 Aug 2019 17:02:34.046 # Cluster state changed: fail
# Cluster state: fail
25810:M 06 Aug 2019 17:02:34.822 # Failover auth granted to a830ae87669d8314223184c66a46f5395c21ce12 for epoch 22
# Grants failover authorization to 8004 for epoch 22
25810:M 06 Aug 2019 17:02:34.825 # Failover auth denied to 4446675c36711658dcf54d64ae7b72977e06e631: can't vote about this master before 29997 milliseconds
# Denies 8006's vote request: having just voted for 8004, this master will not vote again for a replica of the same former master for another 29997 ms (roughly 2 x cluster-node-timeout)
25810:M 06 Aug 2019 17:02:34.826 # Cluster state changed: ok
# Cluster state: ok
25810:M 06 Aug 2019 17:05:39.112 * Clear FAIL state for node 635aa426f39865951de9345e4bbfa1423c334e33: master without slots is reachable again.
# Clears the FAIL state of the old master 8001
# Log of the remaining replica, 8005
94103:S 06 Aug 2019 17:02:34.048 * FAIL message received from a6e1cb7ba768f86cb5269b055db9ad307f8299ae about 635aa426f39865951de9345e4bbfa1423c334e33
# Received a FAIL message from node 8002 about node 8001
94103:S 06 Aug 2019 17:02:34.048 # Cluster state changed: fail
# Cluster state: fail
94103:S 06 Aug 2019 17:02:34.863 # Cluster state changed: ok
# Cluster state: ok
94103:S 06 Aug 2019 17:05:39.157 * Clear FAIL state for node 635aa426f39865951de9345e4bbfa1423c334e33: master without slots is reachable again.
# Clears the FAIL state of node 8001
2. Verifying when the cluster is declared fail
192.168.0.118:8001> cluster nodes
a6e1cb7ba768f86cb5269b055db9ad307f8299ae 192.168.0.119:8002@18002 slave 526c9a107b5df51fdd932d268bf290f47f3fb467 0 1565140375234 25 connected
635aa426f39865951de9345e4bbfa1423c334e33 192.168.0.118:8001@18001 myself,slave a830ae87669d8314223184c66a46f5395c21ce12 0 1565140375000 24 connected
4446675c36711658dcf54d64ae7b72977e06e631 192.168.0.120:8006@18006 slave 526c9a107b5df51fdd932d268bf290f47f3fb467 0 1565140374222 25 connected
255e7458f32bc45601d8797676e27b647362a74b 192.168.0.119:8003@18003 master - 1565140361899 1565140359000 21 disconnected 0-5460
ff42d2fc4c9d607e643169e8749976371a56cad0 192.168.0.118:8000@18000 slave 255e7458f32bc45601d8797676e27b647362a74b 1565140361899 1565140360085 21 disconnected
526c9a107b5df51fdd932d268bf290f47f3fb467 192.168.0.120:8005@18005 master - 0 1565140374000 25 connected 5461-10922
a830ae87669d8314223184c66a46f5395c21ce12 192.168.0.120:8004@18004 master - 0 1565140372199 29 connected 10923-16383
As shown above, 8005 now has two replicas (8002 and 8006), 8003 has one replica (8000), and 8004 has one replica (8001). Now we kill 8003 and 8000 at the same time:
[root@zj1 ~]# ps -ef | grep redis
root 101888 1 0 03:10 ? 00:01:01 ./redis-server 192.168.0.118:8000 [cluster]
root 102731 1 0 04:06 ? 00:00:55 ./redis-server 192.168.0.118:8001 [cluster]
root 107766 107714 0 08:55 pts/1 00:00:00 grep --color=auto redis
[root@zj1 ~]# kill -9 101888
[root@zj1 ~]# ps -ef | grep redis
root 102731 1 0 04:06 ? 00:00:58 ./redis-server 192.168.0.118:8001 [cluster]
root 108046 107714 0 09:20 pts/1 00:00:00 grep --color=auto redis
[root@zj2 ~]# ps -ef | grep redis
root 25810 1 0 02:54 ? 00:01:02 ./redis-server 192.168.0.119:8002 [cluster]
root 25827 1 0 02:54 ? 00:01:02 ./redis-server 192.168.0.119:8003 [cluster]
root 31495 31455 0 09:11 pts/1 00:00:00 grep --color=auto redis
[root@zj2 ~]# ps -ef | grep redis
root 25810 1 0 02:54 ? 00:01:04 ./redis-server 192.168.0.119:8002 [cluster]
root 31587 31455 0 09:20 pts/1 00:00:00 grep --color=auto redis
192.168.0.118:8001> cluster nodes
a6e1cb7ba768f86cb5269b055db9ad307f8299ae 192.168.0.119:8002@18002 slave 526c9a107b5df51fdd932d268bf290f47f3fb467 0 1565140751761 25 connected
635aa426f39865951de9345e4bbfa1423c334e33 192.168.0.118:8001@18001 myself,slave a830ae87669d8314223184c66a46f5395c21ce12 0 1565140750000 24 connected
4446675c36711658dcf54d64ae7b72977e06e631 192.168.0.120:8006@18006 slave 526c9a107b5df51fdd932d268bf290f47f3fb467 0 1565140749743 25 connected
255e7458f32bc45601d8797676e27b647362a74b 192.168.0.119:8003@18003 master,fail - 1565140361899 1565140359000 21 disconnected 0-5460
ff42d2fc4c9d607e643169e8749976371a56cad0 192.168.0.118:8000@18000 slave,fail 255e7458f32bc45601d8797676e27b647362a74b 1565140361899 1565140360085 21 disconnected
526c9a107b5df51fdd932d268bf290f47f3fb467 192.168.0.120:8005@18005 master - 0 1565140750753 25 connected 5461-10922
a830ae87669d8314223184c66a46f5395c21ce12 192.168.0.120:8004@18004 master - 0 1565140752775 29 connected 10923-16383
192.168.0.118:8001> get name
(error) CLUSTERDOWN The cluster is down
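This CLUSTERDOWN error is the default behaviour when part of the slot space has no reachable node serving it; it is governed by the following redis.conf option (default shown, for illustration):
cluster-require-full-coverage yes   # with yes, the whole cluster stops answering queries once any slot is uncovered
# setting it to no lets the reachable nodes keep serving the slots they still cover, at the cost of only partial data availability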
192.168.0.118:8001> cluster nodes
a6e1cb7ba768f86cb5269b055db9ad307f8299ae 192.168.0.119:8002@18002 slave 526c9a107b5df51fdd932d268bf290f47f3fb467 0 1565141081955 25 connected
635aa426f39865951de9345e4bbfa1423c334e33 192.168.0.118:8001@18001 myself,slave a830ae87669d8314223184c66a46f5395c21ce12 0 1565141079000 24 connected
4446675c36711658dcf54d64ae7b72977e06e631 192.168.0.120:8006@18006 slave 526c9a107b5df51fdd932d268bf290f47f3fb467 0 1565141083971 25 connected
255e7458f32bc45601d8797676e27b647362a74b 192.168.0.119:8003@18003 master - 0 1565141082963 21 connected 0-5460
ff42d2fc4c9d607e643169e8749976371a56cad0 192.168.0.118:8000@18000 slave 255e7458f32bc45601d8797676e27b647362a74b 0 1565141081000 21 connected
526c9a107b5df51fdd932d268bf290f47f3fb467 192.168.0.120:8005@18005 master - 0 1565141081000 25 connected 5461-10922
a830ae87669d8314223184c66a46f5395c21ce12 192.168.0.120:8004@18004 master - 0 1565141082000 29 connected 10923-16383
192.168.0.118:8001> get name
-> Redirected to slot [5798] located at 192.168.0.120:8005
"1"
At this point the cluster is in the fail state. After the two nodes are restarted, the cluster recovers. Next, we kill 8004 and 8003 at the same time:
[root@zj2 bin]# ps -ef | grep redis
root 25810 1 0 02:54 ? 00:01:09 ./redis-server 192.168.0.119:8002 [cluster]
root 31672 1 0 09:24 ? 00:00:03 ./redis-server 192.168.0.119:8003 [cluster]
root 31677 31238 0 09:24 pts/0 00:00:00 ./redis-cli -c -h 192.168.0.118 -p 8001
root 32070 31455 0 10:05 pts/1 00:00:00 grep --color=auto redis
[root@zj2 bin]# kill -9 31672
[root@zj2 bin]# ps -ef | grep redis
root 25810 1 0 02:54 ? 00:01:11 ./redis-server 192.168.0.119:8002 [cluster]
root 31677 31238 0 09:24 pts/0 00:00:00 ./redis-cli -c -h 192.168.0.118 -p 8001
root 32250 31455 0 10:25 pts/1 00:00:00 grep --color=auto redis
[root@zj3 ~]# ps -ef | grep redis
root 94265 1 0 03:04 ? 00:01:02 ./redis-server 192.168.0.120:8006 [cluster]
root 95036 1 0 04:00 ? 00:00:57 ./redis-server 192.168.0.120:8004 [cluster]
root 97114 1 0 07:15 ? 00:00:27 ./redis-server 192.168.0.120:8005 [cluster]
root 101590 101550 0 10:22 pts/0 00:00:00 grep --color=auto redis
[root@zj3 ~]# kill -9 95036
[root@zj3 ~]# ps -ef | grep redis
root 94265 1 0 03:04 ? 00:01:02 ./redis-server 192.168.0.120:8006 [cluster]
root 97114 1 0 07:15 ? 00:00:28 ./redis-server 192.168.0.120:8005 [cluster]
root 101616 101550 0 10:25 pts/0 00:00:00 grep --color=auto redis
192.168.0.120:8005> cluster nodes
255e7458f32bc45601d8797676e27b647362a74b 192.168.0.119:8003@18003 master,fail? - 1565144621348 1565144620000 21 disconnected 0-5460
4446675c36711658dcf54d64ae7b72977e06e631 192.168.0.120:8006@18006 slave 526c9a107b5df51fdd932d268bf290f47f3fb467 0 1565144676000 25 connected
ff42d2fc4c9d607e643169e8749976371a56cad0 192.168.0.118:8000@18000 slave 255e7458f32bc45601d8797676e27b647362a74b 0 1565144677009 21 connected
526c9a107b5df51fdd932d268bf290f47f3fb467 192.168.0.120:8005@18005 myself,master - 0 1565144673000 25 connected 5461-10922
635aa426f39865951de9345e4bbfa1423c334e33 192.168.0.118:8001@18001 slave a830ae87669d8314223184c66a46f5395c21ce12 0 1565144675998 29 connected
a6e1cb7ba768f86cb5269b055db9ad307f8299ae 192.168.0.119:8002@18002 slave 526c9a107b5df51fdd932d268bf290f47f3fb467 0 1565144675000 25 connected
a830ae87669d8314223184c66a46f5395c21ce12 192.168.0.120:8004@18004 master,fail? - 1565144621348 1565144619431 29 disconnected 10923-16383
192.168.0.120:8005> get name
(error) CLUSTERDOWN The cluster is down
Again the cluster enters the fail state, and it recovers once the nodes are restarted. Now let's look at how the cluster fail state is determined from the logs:
# Log after the master 8003 and its replica 8000 were both killed
108095:S 07 Aug 2019 09:23:58.066 * Marking node 255e7458f32bc45601d8797676e27b647362a74b as failing (quorum reached).
# Marks node 8003 as failing (quorum reached)
108095:S 07 Aug 2019 09:23:58.066 # Cluster state changed: fail
# Cluster state: fail
108095:S 07 Aug 2019 09:24:17.325 * Connecting to MASTER 192.168.0.119:8003
108095:S 07 Aug 2019 09:24:17.325 * MASTER <-> REPLICA sync started
108095:S 07 Aug 2019 09:24:17.326 # Error condition on socket for SYNC: Connection refused
# Connection to 8003 refused
108095:S 07 Aug 2019 09:24:18.133 # Currently unable to failover: Disconnected from master for longer than allowed. Please check the 'cluster-replica-validity-factor' configuration option.
# Cannot fail over: it has been disconnected from its master for longer than allowed
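The message above is the replica validity check: a replica refuses to start a failover if its last interaction with the master is older than roughly (cluster-node-timeout * cluster-replica-validity-factor) + repl-ping-replica-period. The related redis.conf options (defaults shown for illustration) are:
cluster-node-timeout 15000           # milliseconds a node may be unreachable before it is considered failing
cluster-replica-validity-factor 10   # 0 disables the data-age check; larger values allow staler replicas to fail over
repl-ping-replica-period 10          # seconds between a replica's pings to its master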
# Log of node 8002 during the same 8003/8000 outage
25810:S 07 Aug 2019 09:12:58.569 * FAIL message received from 526c9a107b5df51fdd932d268bf290f47f3fb467 about 255e7458f32bc45601d8797676e27b647362a74b
25810:S 07 Aug 2019 09:12:58.569 * FAIL message received from 526c9a107b5df51fdd932d268bf290f47f3fb467 about ff42d2fc4c9d607e643169e8749976371a56cad0
25810:S 07 Aug 2019 09:12:58.569 # Cluster state changed: fail
25810:S 07 Aug 2019 09:23:42.034 * Clear FAIL state for node ff42d2fc4c9d607e643169e8749976371a56cad0: replica is reachable again.
25810:S 07 Aug 2019 09:24:39.711 * Clear FAIL state for node 255e7458f32bc45601d8797676e27b647362a74b: is reachable again and nobody is serving its slots after some time.
25810:S 07 Aug 2019 09:24:39.711 # Cluster state changed: ok
# In short: it received the FAIL messages about nodes 8003 and 8000, marked the cluster as fail, and once the nodes were reachable again it cleared the FAIL states and the cluster returned to ok.
3. Master protection verification
192.168.0.119:8003> cluster nodes
a830ae87669d8314223184c66a46f5395c21ce12 192.168.0.120:8004@18004 master - 0 1565090553000 22 connected 10923-16383
635aa426f39865951de9345e4bbfa1423c334e33 192.168.0.118:8001@18001 slave a830ae87669d8314223184c66a46f5395c21ce12 0 1565090552000 22 connected
255e7458f32bc45601d8797676e27b647362a74b 192.168.0.119:8003@18003 myself,master - 0 1565090551000 21 connected 0-5460
526c9a107b5df51fdd932d268bf290f47f3fb467 192.168.0.120:8005@18005 slave a6e1cb7ba768f86cb5269b055db9ad307f8299ae 0 1565090553117 13 connected
a6e1cb7ba768f86cb5269b055db9ad307f8299ae 192.168.0.119:8002@18002 master - 0 1565090552109 13 connected 5461-10922
ff42d2fc4c9d607e643169e8749976371a56cad0 192.168.0.118:8000@18000 slave 255e7458f32bc45601d8797676e27b647362a74b 0 1565090554124 21 connected
4446675c36711658dcf54d64ae7b72977e06e631 192.168.0.120:8006@18006 slave a830ae87669d8314223184c66a46f5395c21ce12 0 1565090551000 22 connected
As shown above, 8004 has two replicas, while every other master has a single replica. Now we kill node 8005, which leaves 8002 as an orphaned master (a master with no replica):
[root@zj3 redis2]# ps -ef | grep redis
root 94103 1 0 15:58 ? 00:00:16 ./redis-server 192.168.0.120:8005 [cluster]
root 94265 1 0 16:07 ? 00:00:15 ./redis-server 192.168.0.120:8006 [cluster]
root 95036 1 0 17:03 ? 00:00:10 ./redis-server 192.168.0.120:8004 [cluster]
root 96591 94830 0 19:22 pts/2 00:00:00 grep --color=auto redis
[root@zj3 redis2]# kill -9 94103
[root@zj3 redis2]# ps -ef | grep redis
root 94265 1 0 16:07 ? 00:00:15 ./redis-server 192.168.0.120:8006 [cluster]
root 95036 1 0 17:03 ? 00:00:10 ./redis-server 192.168.0.120:8004 [cluster]
root 96593 94830 0 19:23 pts/2 00:00:00 grep --color=auto redis
192.168.0.119:8003> cluster nodes
a830ae87669d8314223184c66a46f5395c21ce12 192.168.0.120:8004@18004 master - 0 1565093701000 22 connected 10923-16383
635aa426f39865951de9345e4bbfa1423c334e33 192.168.0.118:8001@18001 slave a830ae87669d8314223184c66a46f5395c21ce12 0 1565093704000 22 connected
255e7458f32bc45601d8797676e27b647362a74b 192.168.0.119:8003@18003 myself,master - 0 1565093701000 21 connected 0-5460
526c9a107b5df51fdd932d268bf290f47f3fb467 192.168.0.120:8005@18005 slave,fail a6e1cb7ba768f86cb5269b055db9ad307f8299ae 1565090585158 1565090582000 13 disconnected
a6e1cb7ba768f86cb5269b055db9ad307f8299ae 192.168.0.119:8002@18002 master - 0 1565093704823 13 connected 5461-10922
ff42d2fc4c9d607e643169e8749976371a56cad0 192.168.0.118:8000@18000 slave 255e7458f32bc45601d8797676e27b647362a74b 0 1565093704000 21 connected
4446675c36711658dcf54d64ae7b72977e06e631 192.168.0.120:8006@18006 slave a6e1cb7ba768f86cb5269b055db9ad307f8299ae 0 1565093702755 22 connected
You can see that 8006 has detached from 8004 and become a replica of 8002.
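This replica migration is controlled by cluster-migration-barrier in redis.conf: a master must keep at least that many working replicas before one of its replicas may migrate to an orphaned master. With the default shown below, 8004 (two replicas) can give one up, while a master with a single replica cannot:
cluster-migration-barrier 1   # minimum number of replicas a master keeps before donating one to an orphaned master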
[root@zj3 bin]# ./redis-server redis.conf
[root@zj3 bin]# ps -ef | grep redis
root 94265 1 0 16:07 ? 00:00:35 ./redis-server 192.168.0.120:8006 [cluster]
root 95036 1 0 17:03 ? 00:00:30 ./redis-server 192.168.0.120:8004 [cluster]
root 97114 1 0 20:17 ? 00:00:00 ./redis-server 192.168.0.120:8005 [cluster]
root 97128 94830 0 20:18 pts/2 00:00:00 grep --color=auto redis
192.168.0.119:8003> cluster nodes
a830ae87669d8314223184c66a46f5395c21ce12 192.168.0.120:8004@18004 master - 0 1565093872696 22 connected 10923-16383
635aa426f39865951de9345e4bbfa1423c334e33 192.168.0.118:8001@18001 slave a830ae87669d8314223184c66a46f5395c21ce12 0 1565093870000 22 connected
255e7458f32bc45601d8797676e27b647362a74b 192.168.0.119:8003@18003 myself,master - 0 1565093872000 21 connected 0-5460
526c9a107b5df51fdd932d268bf290f47f3fb467 192.168.0.120:8005@18005 slave a6e1cb7ba768f86cb5269b055db9ad307f8299ae 0 1565093871044 13 connected
a6e1cb7ba768f86cb5269b055db9ad307f8299ae 192.168.0.119:8002@18002 master - 0 1565093871000 13 connected 5461-10922
ff42d2fc4c9d607e643169e8749976371a56cad0 192.168.0.118:8000@18000 slave 255e7458f32bc45601d8797676e27b647362a74b 0 1565093873720 21 connected
4446675c36711658dcf54d64ae7b72977e06e631 192.168.0.120:8006@18006 slave a6e1cb7ba768f86cb5269b055db9ad307f8299ae 0 1565093871666 22 connected
After node 8005 is restarted, it becomes a replica of 8002 again.
Now let's walk through the master protection (replica migration) flow from the logs:
# Log of node 8006
94265:S 06 Aug 2019 19:23:20.347 * FAIL message received from 255e7458f32bc45601d8797676e27b647362a74b about 526c9a107b5df51fdd932d268bf290f47f3fb467
# Received a FAIL message from node 8003 about node 8005
94265:S 06 Aug 2019 19:23:25.460 # Migrating to orphaned master a6e1cb7ba768f86cb5269b055db9ad307f8299ae
# Migrating to the orphaned master 8002
94265:S 06 Aug 2019 19:23:25.460 # Connection with master lost.
# Drops the connection with its current master, 8004
94265:S 06 Aug 2019 19:23:25.460 * Caching the disconnected master state.
# Caches the state of the disconnected master
94265:S 06 Aug 2019 19:23:25.661 * Connecting to MASTER 192.168.0.119:8002
# Connects to node 8002
94265:S 06 Aug 2019 19:23:25.662 * MASTER <-> REPLICA sync started
94265:S 06 Aug 2019 19:23:25.662 * Non blocking connect for SYNC fired the event.
94265:S 06 Aug 2019 19:23:25.663 * Master replied to PING, replication can continue...
94265:S 06 Aug 2019 19:23:25.664 * Trying a partial resynchronization (request 8ff9afd63a115c2539002134ede6cfaf3e250840:17025).
94265:S 06 Aug 2019 19:23:25.666 * Full resync from master: fa209a113f8e60cf96b522d26b138795b2905351:17137
# The partial resync is not accepted, so a full resync from 8002 is performed
94265:S 06 Aug 2019 19:23:25.666 * Discarding previously cached master state.
# Discards the previously cached master state
94265:S 06 Aug 2019 19:23:25.693 * MASTER <-> REPLICA sync: receiving 189 bytes from master
94265:S 06 Aug 2019 19:23:25.693 * MASTER <-> REPLICA sync: Flushing old data
94265:S 06 Aug 2019 19:23:25.693 * MASTER <-> REPLICA sync: Loading DB in memory
94265:S 06 Aug 2019 19:23:25.693 * MASTER <-> REPLICA sync: Finished with success
# Full sync finished successfully
94265:S 06 Aug 2019 19:23:25.694 * Background append only file rewriting started by pid 96594
94265:S 06 Aug 2019 19:23:25.730 * AOF rewrite child asks to stop sending diffs.
96594:C 06 Aug 2019 19:23:25.730 * Parent agreed to stop sending diffs. Finalizing AOF...
96594:C 06 Aug 2019 19:23:25.730 * Concatenating 0.00 MB of AOF diff received from parent.
96594:C 06 Aug 2019 19:23:25.730 * SYNC append only file rewrite performed
96594:C 06 Aug 2019 19:23:25.730 * AOF rewrite: 2 MB of memory used by copy-on-write
94265:S 06 Aug 2019 19:23:25.762 * Background AOF rewrite terminated with success
94265:S 06 Aug 2019 19:23:25.762 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
94265:S 06 Aug 2019 19:23:25.762 * Background AOF rewrite finished successfully
# AOF rewrite finished successfully
94265:S 06 Aug 2019 20:17:51.076 * Clear FAIL state for node 526c9a107b5df51fdd932d268bf290f47f3fb467: replica is reachable again.
# Clears the FAIL state of node 8005
# Log of node 8005 (after restart)
97114:M 06 Aug 2019 20:17:50.973 * Reading RDB preamble from AOF file...
97114:M 06 Aug 2019 20:17:50.973 * Reading the remaining AOF tail...
97114:M 06 Aug 2019 20:17:50.973 * DB loaded from append only file: 0.000 seconds
# Loads the dataset from the AOF file
97114:M 06 Aug 2019 20:17:50.973 * Ready to accept connections
# Ready to accept connections
97114:S 06 Aug 2019 20:17:50.974 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
# Before turning into a replica, it synthesizes a cached master from its own parameters, hoping that a partial transfer will be enough to sync with the new master
97114:S 06 Aug 2019 20:17:50.974 # Cluster state changed: ok
# Cluster state: ok
97114:S 06 Aug 2019 20:17:52.001 * Connecting to MASTER 192.168.0.119:8002
# Connects to master 8002
97114:S 06 Aug 2019 20:17:52.002 * MASTER <-> REPLICA sync started
97114:S 06 Aug 2019 20:17:52.004 * Non blocking connect for SYNC fired the event.
97114:S 06 Aug 2019 20:17:52.006 * Master replied to PING, replication can continue...
97114:S 06 Aug 2019 20:17:52.008 * Trying a partial resynchronization (request f8c47806fcdb3f8191986cd08f5dbbc71e8cdca8:1).
97114:S 06 Aug 2019 20:17:52.017 * Full resync from master: fa209a113f8e60cf96b522d26b138795b2905351:21603
# The partial resync is not accepted, so a full resync from 8002 is performed
97114:S 06 Aug 2019 20:17:52.017 * Discarding previously cached master state.
# Discards the previously cached master state
97114:S 06 Aug 2019 20:17:52.055 * MASTER <-> REPLICA sync: receiving 189 bytes from master
97114:S 06 Aug 2019 20:17:52.055 * MASTER <-> REPLICA sync: Flushing old data
97114:S 06 Aug 2019 20:17:52.056 * MASTER <-> REPLICA sync: Loading DB in memory
97114:S 06 Aug 2019 20:17:52.056 * MASTER <-> REPLICA sync: Finished with success
97114:S 06 Aug 2019 20:17:52.058 * Background append only file rewriting started by pid 97118
# AOF rewrite starts
97114:S 06 Aug 2019 20:17:52.157 * AOF rewrite child asks to stop sending diffs.
97118:C 06 Aug 2019 20:17:52.157 * Parent agreed to stop sending diffs. Finalizing AOF...
97118:C 06 Aug 2019 20:17:52.157 * Concatenating 0.00 MB of AOF diff received from parent.
97118:C 06 Aug 2019 20:17:52.158 * SYNC append only file rewrite performed
97118:C 06 Aug 2019 20:17:52.159 * AOF rewrite: 4 MB of memory used by copy-on-write
97114:S 06 Aug 2019 20:17:52.206 * Background AOF rewrite terminated with success
97114:S 06 Aug 2019 20:17:52.206 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
97114:S 06 Aug 2019 20:17:52.206 * Background AOF rewrite finished successfully
# AOF rewrite finished successfully
# Log of node 8004
95036:M 06 Aug 2019 19:23:20.347 * Marking node 526c9a107b5df51fdd932d268bf290f47f3fb467 as failing (quorum reached).
# Marks node 8005 as failing (quorum reached)
95036:M 06 Aug 2019 19:23:25.460 # Connection with replica 192.168.0.120:8006 lost.
# Connection with replica 8006 lost (8006 migrates away)
95036:M 06 Aug 2019 20:17:51.074 * Clear FAIL state for node 526c9a107b5df51fdd932d268bf290f47f3fb467: replica is reachable again.
# Clears the FAIL state of node 8005
# Log of node 8002
25810:M 06 Aug 2019 19:23:05.112 # Connection with replica 192.168.0.120:8005 lost.
# Connection with replica 8005 lost
25810:M 06 Aug 2019 19:23:20.343 * FAIL message received from 255e7458f32bc45601d8797676e27b647362a74b about 526c9a107b5df51fdd932d268bf290f47f3fb467
# Received a FAIL message from node 8003 about node 8005
25810:M 06 Aug 2019 19:23:25.661 * Replica 192.168.0.120:8006 asks for synchronization
# 8006 asks for synchronization
25810:M 06 Aug 2019 19:23:25.661 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '8ff9afd63a115c2539002134ede6cfaf3e250840', my replication IDs are 'fa209a113f8e60cf96b522d26b138795b2905351' and '0000000000000000000000000000000000000000')
# The partial resync is rejected because the replication IDs do not match, so a full sync follows
25810:M 06 Aug 2019 19:23:25.661 * Starting BGSAVE for SYNC with target: disk
25810:M 06 Aug 2019 19:23:25.662 * Background saving started by pid 28058
28058:C 06 Aug 2019 19:23:25.664 * DB saved on disk
28058:C 06 Aug 2019 19:23:25.665 * RDB: 6 MB of memory used by copy-on-write
25810:M 06 Aug 2019 19:23:25.689 * Background saving terminated with success
25810:M 06 Aug 2019 19:23:25.690 * Synchronization with replica 192.168.0.120:8006 succeeded
# Synchronization with 8006 succeeded
25810:M 06 Aug 2019 20:17:51.027 * Clear FAIL state for node 526c9a107b5df51fdd932d268bf290f47f3fb467: replica is reachable again.
# Clears the FAIL state of node 8005
25810:M 06 Aug 2019 20:17:52.002 * Replica 192.168.0.120:8005 asks for synchronization
25810:M 06 Aug 2019 20:17:52.003 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'f8c47806fcdb3f8191986cd08f5dbbc71e8cdca8', my replication IDs are 'fa209a113f8e60cf96b522d26b138795b2905351' and '0000000000000000000000000000000000000000')
25810:M 06 Aug 2019 20:17:52.003 * Starting BGSAVE for SYNC with target: disk
25810:M 06 Aug 2019 20:17:52.009 * Background saving started by pid 28571
28571:C 06 Aug 2019 20:17:52.014 * DB saved on disk
28571:C 06 Aug 2019 20:17:52.015 * RDB: 4 MB of memory used by copy-on-write
25810:M 06 Aug 2019 20:17:52.047 * Background saving terminated with success
25810:M 06 Aug 2019 20:17:52.048 * Synchronization with replica 192.168.0.120:8005 succeeded
# Synchronization with 8005 succeeded
# Log of node 8003
25827:M 06 Aug 2019 19:23:20.342 * Marking node 526c9a107b5df51fdd932d268bf290f47f3fb467 as failing (quorum reached).
# Marks node 8005 as failing (quorum reached)
25827:M 06 Aug 2019 20:17:51.045 * Clear FAIL state for node 526c9a107b5df51fdd932d268bf290f47f3fb467: replica is reachable again.
# Clears the FAIL state of node 8005
# Log of node 8001
102731:S 06 Aug 2019 19:23:20.345 * FAIL message received from 255e7458f32bc45601d8797676e27b647362a74b about 526c9a107b5df51fdd932d268bf290f47f3fb467
# Received a FAIL message from node 8003 about node 8005
102731:S 06 Aug 2019 20:17:51.072 * Clear FAIL state for node 526c9a107b5df51fdd932d268bf290f47f3fb467: replica is reachable again.
# Clears the FAIL state of node 8005
4. Summary
4.1 Master protection
When all of the replicas of some master go down, Redis Cluster migrates a spare replica away from a master that still has more than one replica and attaches it to the orphaned master. Keeping every master backed by at least one replica is what gives the cluster its high availability. Maintaining 2*master+1 nodes in the cluster (one more replica than the number of masters) is enough to remain highly available after any single node failure is handled by failover and replica migration.
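For example, with 3 masters, 2*3+1 = 7 nodes means 3 masters plus 4 replicas: whichever single node fails, after failover and replica migration every master can still end up with at least one replica.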
4.2 Conditions for cluster fail
Redis Cluster is designed to stay available whenever possible and only declares the cluster fail when one of the following conditions is met:
1. A master and all of its replicas go down: the slots they served can no longer be covered, so the cluster enters the fail state.
2. More than half of the masters go down: regardless of whether they have replicas, the cluster enters the fail state, because the surviving masters can no longer form a majority to confirm failures or grant failover votes.
3. Any master that has no replica goes down: the cluster enters the fail state.
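Mapping the experiments in section 2 onto these conditions: killing 8003 together with its only replica 8000 matches condition 1 (slots 0-5460 become unservable), while killing two of the three masters (8003 and 8004) at the same time matches condition 2, since the single remaining master cannot form a majority on its own.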
4.3 Cluster failover
When a master in a Redis Cluster goes down, its replicas start an election and one of them becomes the new master; when the old master rejoins, it becomes a replica of the new master.
This article verified Redis Cluster's failover, cluster fail determination and master protection behaviours, and analysed the log files to show what happens inside the cluster after a failure.