[Redis Core Knowledge 8] Redis Clustering: Cluster Mode and Cluster Setup (Part 2) - Alibaba Cloud Developer Community

Start the cluster with the redis-trib.rb command:

./redis-trib.rb create --replicas 1  192.168.5.101:6379 192.168.5.102:6379 192.168.5.103:6379 192.168.5.102:6380 192.168.5.103:6380 192.168.5.101:6380

Instead of starting the cluster, the command prints the following:

[root@192 src]# ./redis-trib.rb create --replicas 1  192.168.5.101:6379 192.168.5.102:6379 192.168.5.103:6379 192.168.5.102:6380 192.168.5.103:6380 192.168.5.101:6380
WARNING: redis-trib.rb is not longer available!
You should use redis-cli instead.
All commands and features belonging to redis-trib.rb have been moved
to redis-cli.
In order to use them you should call redis-cli with the --cluster
option followed by the subcommand name, arguments and options.
Use the following syntax:
redis-cli --cluster SUBCOMMAND [ARGUMENTS] [OPTIONS]
Example:
redis-cli --cluster create 192.168.5.101:6379 192.168.5.102:6379 192.168.5.103:6379 192.168.5.102:6380 192.168.5.103:6380 192.168.5.101:6380 --cluster-replicas 1
To get help about all subcommands, type:
redis-cli --cluster help

Start the cluster and inspect the startup logs

Practice is the best teacher: newer Redis versions no longer depend on Ruby to create a cluster, and redis-cli does the job directly:

[root@192 src]# redis-cli --cluster create 192.168.5.101:6379 192.168.5.102:6379 192.168.5.103:6379 192.168.5.102:6380 192.168.5.103:6380 192.168.5.101:6380 --cluster-replicas 1
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 192.168.5.102:6380 to 192.168.5.101:6379
Adding replica 192.168.5.103:6380 to 192.168.5.102:6379
Adding replica 192.168.5.101:6380 to 192.168.5.103:6379
M: 2eab309dd5f41f317bd1c2b0c8616aee7e4ac05b 192.168.5.101:6379
   slots:[0-5460] (5461 slots) master
M: 94d9e1637bd1701b146e367ffa7a69e8c24566e8 192.168.5.102:6379
   slots:[5461-10922] (5462 slots) master
M: 6c1005a89742e50db240775204c03ab3d7558e59 192.168.5.103:6379
   slots:[10923-16383] (5461 slots) master
S: 13ab2f0291cea595f51d6efac60c3e62278e64cb 192.168.5.102:6380
   replicates 2eab309dd5f41f317bd1c2b0c8616aee7e4ac05b
S: 46be63362f47ece342c54f4e042ef09d3ca0ec1b 192.168.5.103:6380
   replicates 94d9e1637bd1701b146e367ffa7a69e8c24566e8
S: bea352a33ee43ca0f6e5238a8170a01820af7f93 192.168.5.101:6380
   replicates 6c1005a89742e50db240775204c03ab3d7558e59
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.
>>> Performing Cluster Check (using node 192.168.5.101:6379)
M: 2eab309dd5f41f317bd1c2b0c8616aee7e4ac05b 192.168.5.101:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: bea352a33ee43ca0f6e5238a8170a01820af7f93 192.168.5.101:6380
   slots: (0 slots) slave
   replicates 6c1005a89742e50db240775204c03ab3d7558e59
S: 46be63362f47ece342c54f4e042ef09d3ca0ec1b 192.168.5.103:6380
   slots: (0 slots) slave
   replicates 94d9e1637bd1701b146e367ffa7a69e8c24566e8
M: 94d9e1637bd1701b146e367ffa7a69e8c24566e8 192.168.5.102:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 13ab2f0291cea595f51d6efac60c3e62278e64cb 192.168.5.102:6380
   slots: (0 slots) slave
   replicates 2eab309dd5f41f317bd1c2b0c8616aee7e4ac05b
M: 6c1005a89742e50db240775204c03ab3d7558e59 192.168.5.103:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@192 src]#
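The slot ranges in the output above (0-5460, 5461-10922, 10923-16383) come from splitting the 16384 hash slots as evenly as possible across the three masters. The sketch below reproduces those ranges; the function name `split_slots` and the rounding approach are assumptions made for illustration, not a copy of the redis-cli source.

```python
import math

TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def split_slots(master_count):
    """Split slots 0..16383 into contiguous, near-equal ranges, one per master.

    Illustrative sketch only: it is written to match the ranges printed by
    `redis-cli --cluster create` above, not taken from the redis-cli code.
    """
    if master_count < 1:
        raise ValueError("need at least one master")
    per_master = TOTAL_SLOTS / master_count
    ranges = []
    first = 0
    cursor = 0.0
    for i in range(master_count):
        # Round half up explicitly so the result does not depend on
        # Python's round-half-to-even behaviour.
        last = math.floor(cursor + per_master - 1 + 0.5)
        if i == master_count - 1 or last > TOTAL_SLOTS - 1:
            last = TOTAL_SLOTS - 1  # the last master absorbs any remainder
        ranges.append((first, last))
        first = last + 1
        cursor += per_master
    return ranges


if __name__ == "__main__":
    for index, (first, last) in enumerate(split_slots(3)):
        print(f"Master[{index}] -> Slots {first} - {last} ({last - first + 1} slots)")
```

With three masters this yields 5461, 5462 and 5461 slots, matching the slot counts reported for each master in the cluster check.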

The node configuration file on each instance now looks like this:

Node configuration file and server activity log for port 6379 on machine 101

bea352a33ee43ca0f6e5238a8170a01820af7f93 192.168.5.101:6380@16380 slave 6c1005a89742e50db240775204c03ab3d7558e59 0 1605332446460 3 connected
46be63362f47ece342c54f4e042ef09d3ca0ec1b 192.168.5.103:6380@16380 slave 94d9e1637bd1701b146e367ffa7a69e8c24566e8 0 1605332448477 2 connected
94d9e1637bd1701b146e367ffa7a69e8c24566e8 192.168.5.102:6379@16379 master - 0 1605332448000 2 connected 5461-10922
13ab2f0291cea595f51d6efac60c3e62278e64cb 192.168.5.102:6380@16380 slave 2eab309dd5f41f317bd1c2b0c8616aee7e4ac05b 0 1605332446000 1 connected
6c1005a89742e50db240775204c03ab3d7558e59 192.168.5.103:6379@16379 master - 0 1605332447469 3 connected 10923-16383
2eab309dd5f41f317bd1c2b0c8616aee7e4ac05b 192.168.5.101:6379@16379 myself,master - 0 1605332447000 1 connected 0-5460
vars currentEpoch 6 lastVoteEpoch 0
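Each line of the node configuration file follows the same layout as the output of `cluster nodes`: node ID, address with cluster bus port, flags, master ID, two timestamps, config epoch, link state and, for masters, the slot ranges. As a reading aid, here is a minimal parsing sketch; the `ClusterNode` class and `parse_node_line` function are hypothetical names, and the parser assumes the `ip:port@busport` address form shown in this article.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ClusterNode:
    node_id: str
    host: str
    port: int
    bus_port: int
    flags: List[str]
    master_id: str          # "-" when the node is itself a master
    config_epoch: int
    link_state: str
    slots: List[Tuple[int, int]] = field(default_factory=list)


def parse_node_line(line: str) -> ClusterNode:
    """Parse one node line (hypothetical helper, assumes ip:port@busport)."""
    parts = line.split()
    if len(parts) < 8:
        raise ValueError(f"not a node line: {line!r}")
    node_id, addr, flags, master_id = parts[0], parts[1], parts[2], parts[3]
    # parts[4] and parts[5] are the ping-sent and pong-received timestamps.
    config_epoch, link_state = int(parts[6]), parts[7]
    host_port, _, bus_port = addr.partition("@")
    host, _, port = host_port.rpartition(":")
    slots = []
    for token in parts[8:]:
        if token.startswith("["):
            continue  # skip markers for slots that are being migrated
        first, _, last = token.partition("-")
        slots.append((int(first), int(last or first)))
    return ClusterNode(node_id, host, int(port), int(bus_port),
                       flags.split(","), master_id, config_epoch,
                       link_state, slots)


if __name__ == "__main__":
    sample = ("2eab309dd5f41f317bd1c2b0c8616aee7e4ac05b 192.168.5.101:6379@16379 "
              "myself,master - 0 1605332447000 1 connected 0-5460")
    print(parse_node_line(sample))
```

The trailing `vars currentEpoch ... lastVoteEpoch ...` line is not a node entry, so a caller would skip it before parsing.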

Here is this master node's activity log during cluster startup:

70362:M 14 Nov 2020 13:40:43.171 # configEpoch set to 1 via CLUSTER SET-CONFIG-EPOCH
70362:M 14 Nov 2020 13:40:43.211 # IP address for this node updated to 192.168.5.101
70362:M 14 Nov 2020 13:40:46.150 * Replica 192.168.5.102:6380 asks for synchronization
70362:M 14 Nov 2020 13:40:46.150 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '5c437fa82b0ca0caabfa2f0a17d15d9fc8f2548f', my replication IDs are 'd162ace4d79189e94f2a1f202c6fec4230a4189d' and '0000000000000000000000000000000000000000')
70362:M 14 Nov 2020 13:40:46.150 * Replication backlog created, my new replication IDs are '64bd8964903df1d43df728eed12e7c00956674f3' and '0000000000000000000000000000000000000000'
70362:M 14 Nov 2020 13:40:46.150 * Starting BGSAVE for SYNC with target: disk
70362:M 14 Nov 2020 13:40:46.151 * Background saving started by pid 85385
85385:C 14 Nov 2020 13:40:46.165 * DB saved on disk
85385:C 14 Nov 2020 13:40:46.167 * RDB: 0 MB of memory used by copy-on-write
70362:M 14 Nov 2020 13:40:46.258 * Background saving terminated with success
70362:M 14 Nov 2020 13:40:46.258 * Synchronization with replica 192.168.5.102:6380 succeeded
70362:M 14 Nov 2020 13:40:48.174 # Cluster state changed: ok

Node configuration file and server activity log for port 6380 on machine 101

6c1005a89742e50db240775204c03ab3d7558e59 192.168.5.103:6379@16379 master - 0 1605332445000 3 connected 10923-16383
bea352a33ee43ca0f6e5238a8170a01820af7f93 192.168.5.101:6380@16380 myself,slave 6c1005a89742e50db240775204c03ab3d7558e59 0 1605332446000 3 connected
46be63362f47ece342c54f4e042ef09d3ca0ec1b 192.168.5.103:6380@16380 slave 94d9e1637bd1701b146e367ffa7a69e8c24566e8 0 1605332445452 2 connected
13ab2f0291cea595f51d6efac60c3e62278e64cb 192.168.5.102:6380@16380 slave 2eab309dd5f41f317bd1c2b0c8616aee7e4ac05b 0 1605332446000 1 connected
94d9e1637bd1701b146e367ffa7a69e8c24566e8 192.168.5.102:6379@16379 master - 0 1605332447480 2 connected 5461-10922
2eab309dd5f41f317bd1c2b0c8616aee7e4ac05b 192.168.5.101:6379@16379 master - 0 1605332446468 1 connected 0-5460
vars currentEpoch 6 lastVoteEpoch 0

Here is this slave node's activity log during cluster startup:

70795:M 14 Nov 2020 13:40:43.173 # configEpoch set to 6 via CLUSTER SET-CONFIG-EPOCH
70795:M 14 Nov 2020 13:40:43.334 # IP address for this node updated to 192.168.5.101
70795:S 14 Nov 2020 13:40:45.182 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
70795:S 14 Nov 2020 13:40:45.182 # Cluster state changed: ok
70795:S 14 Nov 2020 13:40:45.551 * Connecting to MASTER 192.168.5.103:6379
70795:S 14 Nov 2020 13:40:45.551 * MASTER <-> REPLICA sync started
70795:S 14 Nov 2020 13:40:45.551 * Non blocking connect for SYNC fired the event.
70795:S 14 Nov 2020 13:40:45.552 * Master replied to PING, replication can continue...
70795:S 14 Nov 2020 13:40:45.552 * Trying a partial resynchronization (request de83e39b5538aec381a9b7cb415f0afd9e9fb69e:1).
70795:S 14 Nov 2020 13:40:45.553 * Full resync from master: 36c2011a9e000813ddbe35ea240fd347ad351a04:0
70795:S 14 Nov 2020 13:40:45.553 * Discarding previously cached master state.
70795:S 14 Nov 2020 13:40:45.634 * MASTER <-> REPLICA sync: receiving 175 bytes from master to disk
70795:S 14 Nov 2020 13:40:45.634 * MASTER <-> REPLICA sync: Flushing old data
70795:S 14 Nov 2020 13:40:45.659 * MASTER <-> REPLICA sync: Loading DB in memory
70795:S 14 Nov 2020 13:40:45.660 * Loading RDB produced by version 6.0.8
70795:S 14 Nov 2020 13:40:45.660 * RDB age 0 seconds
70795:S 14 Nov 2020 13:40:45.660 * RDB memory usage when created 2.46 Mb
70795:S 14 Nov 2020 13:40:45.660 * MASTER <-> REPLICA sync: Finished with success
70795:S 14 Nov 2020 13:40:45.661 * Background append only file rewriting started by pid 85384
70795:S 14 Nov 2020 13:40:45.700 * AOF rewrite child asks to stop sending diffs.
85384:C 14 Nov 2020 13:40:45.700 * Parent agreed to stop sending diffs. Finalizing AOF...
85384:C 14 Nov 2020 13:40:45.700 * Concatenating 0.00 MB of AOF diff received from parent.
85384:C 14 Nov 2020 13:40:45.700 * SYNC append only file rewrite performed
85384:C 14 Nov 2020 13:40:45.701 * AOF rewrite: 0 MB of memory used by copy-on-write
70795:S 14 Nov 2020 13:40:45.762 * Background AOF rewrite terminated with success
70795:S 14 Nov 2020 13:40:45.762 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
70795:S 14 Nov 2020 13:40:45.762 * Background AOF rewrite finished successfully

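Storing and retrieving data in the cluster

The server output above lists the slot ranges assigned to each master, so next let's see how data is actually stored and retrieved: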

[root@192 redis-6.0.8]# redis-cli -h 192.168.5.101
192.168.5.101:6379> set love guochengyu
(error) MOVED 16198 192.168.5.103:6379
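The MOVED reply carries everything a client needs to retry: the slot number and the address of the node that serves it. A minimal sketch of how a client might pull those fields out is shown here; `parse_moved` is a hypothetical helper, and accepting the `(error) ` prefix is an assumption that covers text copied from redis-cli output.

```python
import re

# Matches "MOVED <slot> <host>:<port>", optionally prefixed by the
# "(error) " label that redis-cli adds when displaying an error reply.
_MOVED = re.compile(r"^(?:\(error\) )?MOVED (\d+) (\S+):(\d+)$")


def parse_moved(message):
    """Return (slot, host, port) for a MOVED reply, or None otherwise.

    Hypothetical helper for illustration; not part of any Redis client API.
    """
    match = _MOVED.match(message.strip())
    if match is None:
        return None
    slot, host, port = match.groups()
    return int(slot), host, int(port)


if __name__ == "__main__":
    # A client following redirects would reconnect to this address,
    # resend the command, and remember the slot-to-node mapping.
    slot_cache = {}
    parsed = parse_moved("(error) MOVED 16198 192.168.5.103:6379")
    if parsed is not None:
        slot, host, port = parsed
        slot_cache[slot] = (host, port)
    print(slot_cache)
```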

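The error shows that the key hashes to slot 16198, which is served by master 3 (192.168.5.103:6379). Writes and reads therefore have to follow the redirect, which only takes adding -c to the client command: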

[root@192 redis-6.0.8]#  redis-cli -h 192.168.5.101 -c
192.168.5.101:6379> set love guochenyubaobei
-> Redirected to slot [16198] located at 192.168.5.103:6379
OK
192.168.5.103:6379> get love
"guochenyubaobei"
192.168.5.103:6379>
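Slot 16198 is not arbitrary: Redis Cluster maps a key to a slot by taking CRC16 of the key modulo 16384, hashing only the part inside `{...}` when the key contains a non-empty hash tag. The sketch below implements that rule and looks up the owner in the slot ranges assigned earlier in this article; `SLOT_OWNERS` and `owner_of` are illustrative names, and the table is hard-coded from the cluster creation output rather than fetched from a live cluster.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Slot ranges copied from the cluster creation output (illustrative table).
SLOT_OWNERS = [
    ((0, 5460), "192.168.5.101:6379"),
    ((5461, 10922), "192.168.5.102:6379"),
    ((10923, 16383), "192.168.5.103:6379"),
]


def owner_of(slot: int) -> str:
    for (first, last), address in SLOT_OWNERS:
        if first <= slot <= last:
            return address
    raise ValueError(f"slot {slot} is not covered")


if __name__ == "__main__":
    slot = key_slot("love")
    print(slot, owner_of(slot))  # 16198 192.168.5.103:6379
```

Because 16198 falls in the range 10923-16383, the key belongs to 192.168.5.103:6379, which is exactly where the redirect sends the client.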

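Master-slave failover in the cluster

There are two cases to look at: a master going offline, and a slave going offline.

Losing a mere slave does not matter

When a slave goes offline, the master's log shows it: once the slave has been unreachable for about 10 seconds it is marked as failed (in the logs below, roughly 12 seconds pass between the lost connection and the FAIL message). When the slave comes back online, the master replicates data to it again. The other masters and slaves all keep track of the overall cluster state:

Shut down the 102 slave that replicates the 101 master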

20189:signal-handler (1605334497) Received SIGINT scheduling shutdown...
20189:S 14 Nov 2020 14:14:57.915 # User requested shutdown...
20189:S 14 Nov 2020 14:14:57.915 * Calling fsync() on the AOF file.
20189:S 14 Nov 2020 14:14:57.915 * Saving the final RDB snapshot before exiting.
20189:S 14 Nov 2020 14:14:57.916 * DB saved on disk
20189:S 14 Nov 2020 14:14:57.916 # Redis is now ready to exit, bye bye...
[root@192 config]#

Its master, the 101 master, has lost its slave:

70362:M 14 Nov 2020 14:14:57.919 # Connection with replica 192.168.5.102:6380 lost.
70362:M 14 Nov 2020 14:15:10.105 * FAIL message received from 6c1005a89742e50db240775204c03ab3d7558e59 about 13ab2f0291cea595f51d6efac60c3e62278e64cb

The 101 master learned of the failure from the 103 master, so the 103 master was the first to mark the node as failed:

9496:M 14 Nov 2020 14:15:10.099 * Marking node 13ab2f0291cea595f51d6efac60c3e62278e64cb as failing (quorum reached).

The remaining nodes received the same notification.
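The "quorum reached" wording in the log refers to the rule that a node is only promoted from suspected to failed once a majority of the masters report it as unreachable. A deliberately simplified sketch of that rule follows; the function names are hypothetical, and the sketch leaves out details of the real mechanism such as failure reports expiring over time.

```python
def fail_quorum(master_count: int) -> int:
    """Number of masters that must agree before a node is marked FAIL."""
    return master_count // 2 + 1


def should_mark_fail(master_count: int, failure_reports: int) -> bool:
    """Simplified check: True once a majority of masters report the node.

    Assumption: failure_reports already counts every master, including the
    local one, that currently considers the node unreachable.
    """
    return failure_reports >= fail_quorum(master_count)


if __name__ == "__main__":
    # The cluster in this article has three masters.
    print(fail_quorum(3))
    print(should_mark_fail(3, 1), should_mark_fail(3, 2))
```

With three masters the quorum is two, so one master noticing the outage is not enough; a second master has to agree before the FAIL message is broadcast.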

Restart the 102 slave that replicates the 101 master

When the 102 slave comes back online, every node is notified and clears the failure flag:

9496:M 14 Nov 2020 14:23:29.425 * Clear FAIL state for node 13ab2f0291cea595f51d6efac60c3e62278e64cb: replica is reachable again.

At the same time, the master of the 102 slave handles the master-slave synchronization:

70362:M 14 Nov 2020 14:23:29.463 * Clear FAIL state for node 13ab2f0291cea595f51d6efac60c3e62278e64cb: replica is reachable again.
70362:M 14 Nov 2020 14:23:30.381 * Replica 192.168.5.102:6380 asks for synchronization
70362:M 14 Nov 2020 14:23:30.381 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'eeea1b2c358998aacc3dcf525484245183fcb9ab', my replication IDs are '64bd8964903df1d43df728eed12e7c00956674f3' and '0000000000000000000000000000000000000000')
70362:M 14 Nov 2020 14:23:30.381 * Starting BGSAVE for SYNC with target: disk
70362:M 14 Nov 2020 14:23:30.382 * Background saving started by pid 86035
86035:C 14 Nov 2020 14:23:30.386 * DB saved on disk
86035:C 14 Nov 2020 14:23:30.388 * RDB: 0 MB of memory used by copy-on-write
70362:M 14 Nov 2020 14:23:30.470 * Background saving terminated with success
70362:M 14 Nov 2020 14:23:30.470 * Synchronization with replica 192.168.5.102:6380 succeeded
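The "Partial resynchronization not accepted: Replication ID mismatch" line explains why the master falls back to a full BGSAVE: the replica asked to continue from a replication ID that the master does not recognize as its current or previous history. The sketch below models that decision in a simplified form. The class, the field names and the backlog numbers are assumptions for illustration; only the replication IDs are taken from the log above.

```python
from dataclasses import dataclass

EMPTY_REPLID = "0" * 40


@dataclass
class MasterReplState:
    replid: str                       # current replication ID
    replid2: str = EMPTY_REPLID       # previous ID, kept after a failover
    second_replid_offset: int = -1    # replid2 is valid up to this offset
    backlog_start: int = 0            # assumed: first offset in the backlog
    backlog_len: int = 0              # assumed: bytes held in the backlog


def can_partial_resync(master: MasterReplState,
                       asked_replid: str, asked_offset: int) -> bool:
    """Simplified model of the partial-versus-full resync decision."""
    same_history = asked_replid == master.replid or (
        asked_replid == master.replid2
        and asked_offset <= master.second_replid_offset
    )
    if not same_history:
        return False  # replication ID mismatch, so a full resync is needed
    end = master.backlog_start + master.backlog_len
    return master.backlog_start <= asked_offset <= end


if __name__ == "__main__":
    master_101 = MasterReplState(
        replid="64bd8964903df1d43df728eed12e7c00956674f3",
        backlog_start=1, backlog_len=2856,  # backlog values are made up
    )
    # The restarted replica asked for an ID the master never had.
    print(can_partial_resync(
        master_101, "eeea1b2c358998aacc3dcf525484245183fcb9ab", 1))
```

In this scenario the restarted replica presents an ID the master has never used, so the check fails on the first condition regardless of the backlog contents.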

The 102 slave, in turn, has to synchronize its data from the master:

21392:M 14 Nov 2020 14:23:29.366 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
21392:M 14 Nov 2020 14:23:29.366 # Server initialized
21392:M 14 Nov 2020 14:23:29.366 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
21392:M 14 Nov 2020 14:23:29.366 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
21392:M 14 Nov 2020 14:23:29.367 * Reading RDB preamble from AOF file...
21392:M 14 Nov 2020 14:23:29.367 * Loading RDB produced by version 6.0.8
21392:M 14 Nov 2020 14:23:29.367 * RDB age 2563 seconds
21392:M 14 Nov 2020 14:23:29.367 * RDB memory usage when created 2.43 Mb
21392:M 14 Nov 2020 14:23:29.367 * RDB has an AOF tail
21392:M 14 Nov 2020 14:23:29.367 * Reading the remaining AOF tail...
21392:M 14 Nov 2020 14:23:29.367 * DB loaded from append only file: 0.000 seconds
21392:M 14 Nov 2020 14:23:29.367 * Ready to accept connections
21392:S 14 Nov 2020 14:23:29.367 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
21392:S 14 Nov 2020 14:23:29.367 # Cluster state changed: ok
21392:S 14 Nov 2020 14:23:30.374 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:23:30.374 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:23:30.375 * Non blocking connect for SYNC fired the event.
21392:S 14 Nov 2020 14:23:30.377 * Master replied to PING, replication can continue...
21392:S 14 Nov 2020 14:23:30.379 * Trying a partial resynchronization (request eeea1b2c358998aacc3dcf525484245183fcb9ab:1).
21392:S 14 Nov 2020 14:23:30.381 * Full resync from master: 64bd8964903df1d43df728eed12e7c00956674f3:2856
21392:S 14 Nov 2020 14:23:30.381 * Discarding previously cached master state.
21392:S 14 Nov 2020 14:23:30.469 * MASTER <-> REPLICA sync: receiving 176 bytes from master to disk
21392:S 14 Nov 2020 14:23:30.469 * MASTER <-> REPLICA sync: Flushing old data
21392:S 14 Nov 2020 14:23:30.469 * MASTER <-> REPLICA sync: Loading DB in memory
21392:S 14 Nov 2020 14:23:30.469 * Loading RDB produced by version 6.0.8
21392:S 14 Nov 2020 14:23:30.469 * RDB age 0 seconds
21392:S 14 Nov 2020 14:23:30.469 * RDB memory usage when created 2.45 Mb
21392:S 14 Nov 2020 14:23:30.469 * MASTER <-> REPLICA sync: Finished with success
21392:S 14 Nov 2020 14:23:30.469 * Background append only file rewriting started by pid 21396
21392:S 14 Nov 2020 14:23:30.519 * AOF rewrite child asks to stop sending diffs.
21396:C 14 Nov 2020 14:23:30.520 * Parent agreed to stop sending diffs. Finalizing AOF...
21396:C 14 Nov 2020 14:23:30.520 * Concatenating 0.00 MB of AOF diff received from parent.
21396:C 14 Nov 2020 14:23:30.520 * SYNC append only file rewrite performed
21396:C 14 Nov 2020 14:23:30.520 * AOF rewrite: 0 MB of memory used by copy-on-write
21392:S 14 Nov 2020 14:23:30.576 * Background AOF rewrite terminated with success
21392:S 14 Nov 2020 14:23:30.576 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
21392:S 14 Nov 2020 14:23:30.576 * Background AOF rewrite finished successfully

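A change of dynasty: master becomes slave

Now for the main drama, a change of dynasty: the old master drops out of contact for a while, its slave usurps the throne, and when the old master returns it has to settle for being a mere slave. When a master goes offline, its slave retries the connection about once per second. After roughly 10 seconds [about 10 attempts] the master is considered failed (in the logs below, the FAIL message arrives from the 103 master about 13 seconds after the connection was lost). The slave then stands for election to become the new master, and when the old master reconnects it becomes a slave of the new one. The other masters and slaves all keep track of the overall cluster state.

Shut down the 101 master

After the shutdown, its slave (the 102 slave) keeps trying to connect to the configured master but cannot reach it: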

21392:S 14 Nov 2020 14:30:21.427 # Connection with master lost.
21392:S 14 Nov 2020 14:30:21.427 * Caching the disconnected master state.
21392:S 14 Nov 2020 14:30:21.500 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:21.500 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:21.500 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:22.508 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:22.509 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:22.509 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:23.517 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:23.517 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:23.517 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:24.525 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:24.525 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:24.526 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:25.530 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:25.530 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:25.530 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:26.538 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:26.538 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:26.538 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:27.546 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:27.546 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:27.546 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:28.556 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:28.556 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:28.556 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:29.566 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:29.566 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:29.567 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:30.574 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:30.574 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:30.575 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:31.580 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:31.580 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:31.580 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:32.588 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:32.588 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:32.588 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:33.596 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:33.596 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:33.596 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:34.246 * FAIL message received from 6c1005a89742e50db240775204c03ab3d7558e59 about 2eab309dd5f41f317bd1c2b0c8616aee7e4ac05b
21392:S 14 Nov 2020 14:30:34.246 # Cluster state changed: fail
21392:S 14 Nov 2020 14:30:34.304 # Start of election delayed for 792 milliseconds (rank #0, offset 3430).
21392:S 14 Nov 2020 14:30:34.605 * Connecting to MASTER 192.168.5.101:6379
21392:S 14 Nov 2020 14:30:34.605 * MASTER <-> REPLICA sync started
21392:S 14 Nov 2020 14:30:34.606 # Error condition on socket for SYNC: Operation now in progress
21392:S 14 Nov 2020 14:30:35.108 # Starting a failover election for epoch 7.
21392:S 14 Nov 2020 14:30:35.110 # Failover election won: I'm the new master.
21392:S 14 Nov 2020 14:30:35.110 # configEpoch set to 7 after successful failover
21392:M 14 Nov 2020 14:30:35.110 * Discarding previously cached master state.
21392:M 14 Nov 2020 14:30:35.110 # Setting secondary replication ID to 64bd8964903df1d43df728eed12e7c00956674f3, valid up to offset: 3431. New replication ID is 3088efe0ca743a931b3863cbb0fd673811c31b7a
21392:M 14 Nov 2020 14:30:35.110 # Cluster state changed: ok
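The line "Start of election delayed for 792 milliseconds (rank #0, offset 3430)" reflects how a slave schedules its election: a fixed 500 ms, plus a random delay of up to 500 ms, plus one second for every position it ranks behind the slave with the most up-to-date replication offset. The sketch below expresses that formula together with the majority needed to win; the function names are hypothetical and the exact random range is an assumption.

```python
import random


def election_delay_ms(rank: int, rng=random) -> int:
    """Delay before a slave asks for votes, in milliseconds.

    Sketch of the documented formula: 500 ms fixed + random delay of up to
    500 ms + rank * 1000 ms. Rank 0 is the slave with the best offset.
    """
    return 500 + rng.randrange(500) + rank * 1000


def votes_needed(master_count: int) -> int:
    """A slave wins the election with votes from a majority of masters."""
    return master_count // 2 + 1


if __name__ == "__main__":
    rng = random.Random()
    print(election_delay_ms(0, rng))  # somewhere between 500 and 999
    print(votes_needed(3))
```

The 792 ms in the log is consistent with rank 0, the only possible rank here because the failed master had a single slave.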

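So the slave usurps the throne and promotes itself to master, and the other nodes are notified as well. The cluster information now looks like this: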

[root@192 redis-6.0.8]#  redis-cli -h 192.168.5.102 -c                    
192.168.5.102:6379> cluster nodes
13ab2f0291cea595f51d6efac60c3e62278e64cb 192.168.5.102:6380@16380 master - 0 1605335617797 7 connected 0-5460
46be63362f47ece342c54f4e042ef09d3ca0ec1b 192.168.5.103:6380@16380 slave 94d9e1637bd1701b146e367ffa7a69e8c24566e8 0 1605335615777 2 connected
2eab309dd5f41f317bd1c2b0c8616aee7e4ac05b 192.168.5.101:6379@16379 master,fail - 1605335422306 1605335419000 1 disconnected
6c1005a89742e50db240775204c03ab3d7558e59 192.168.5.103:6379@16379 master - 0 1605335614000 3 connected 10923-16383
bea352a33ee43ca0f6e5238a8170a01820af7f93 192.168.5.101:6380@16380 slave 6c1005a89742e50db240775204c03ab3d7558e59 0 1605335616788 3 connected
94d9e1637bd1701b146e367ffa7a69e8c24566e8 192.168.5.102:6379@16379 myself,master - 0 1605335615000 2 connected 5461-10922
192.168.5.102:6379>

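The output shows one failed master and one master that has successfully usurped the throne.

Restart the 101 master, or rather, what is now the 101 slave, serving the 102 master

Although it has been restarted, it can only exist as a slave now, so after startup it has to synchronize its data from the new master: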

86216:M 14 Nov 2020 14:35:39.278 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
86216:M 14 Nov 2020 14:35:39.278 # Server initialized
86216:M 14 Nov 2020 14:35:39.278 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
86216:M 14 Nov 2020 14:35:39.278 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
86216:M 14 Nov 2020 14:35:39.278 * Ready to accept connections
86216:M 14 Nov 2020 14:35:39.279 # Configuration change detected. Reconfiguring myself as a replica of 13ab2f0291cea595f51d6efac60c3e62278e64cb
86216:S 14 Nov 2020 14:35:39.279 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
86216:S 14 Nov 2020 14:35:39.279 # Cluster state changed: ok
86216:S 14 Nov 2020 14:35:40.285 * Connecting to MASTER 192.168.5.102:6380
86216:S 14 Nov 2020 14:35:40.285 * MASTER <-> REPLICA sync started
86216:S 14 Nov 2020 14:35:40.285 * Non blocking connect for SYNC fired the event.
86216:S 14 Nov 2020 14:35:40.286 * Master replied to PING, replication can continue...
86216:S 14 Nov 2020 14:35:40.287 * Trying a partial resynchronization (request 647cca874c4c83a50d4fe5f82690eb51df36aa6d:1).
86216:S 14 Nov 2020 14:35:40.287 * Full resync from master: 3088efe0ca743a931b3863cbb0fd673811c31b7a:3430
86216:S 14 Nov 2020 14:35:40.287 * Discarding previously cached master state.
86216:S 14 Nov 2020 14:35:40.307 * MASTER <-> REPLICA sync: receiving 176 bytes from master to disk
86216:S 14 Nov 2020 14:35:40.307 * MASTER <-> REPLICA sync: Flushing old data
86216:S 14 Nov 2020 14:35:40.309 * MASTER <-> REPLICA sync: Loading DB in memory
86216:S 14 Nov 2020 14:35:40.309 * Loading RDB produced by version 6.0.8
86216:S 14 Nov 2020 14:35:40.309 * RDB age 0 seconds
86216:S 14 Nov 2020 14:35:40.309 * RDB memory usage when created 2.45 Mb
86216:S 14 Nov 2020 14:35:40.309 * MASTER <-> REPLICA sync: Finished with success
86216:S 14 Nov 2020 14:35:40.309 * Background append only file rewriting started by pid 86220
86216:S 14 Nov 2020 14:35:40.349 * AOF rewrite child asks to stop sending diffs.
86220:C 14 Nov 2020 14:35:40.349 * Parent agreed to stop sending diffs. Finalizing AOF...
86220:C 14 Nov 2020 14:35:40.349 * Concatenating 0.00 MB of AOF diff received from parent.
86220:C 14 Nov 2020 14:35:40.350 * SYNC append only file rewrite performed
86220:C 14 Nov 2020 14:35:40.350 * AOF rewrite: 0 MB of memory used by copy-on-write
86216:S 14 Nov 2020 14:35:40.387 * Background AOF rewrite terminated with success
86216:S 14 Nov 2020 14:35:40.387 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
86216:S 14 Nov 2020 14:35:40.387 * Background AOF rewrite finished successfully

Check the cluster state again:

192.168.5.102:6379> cluster nodes
13ab2f0291cea595f51d6efac60c3e62278e64cb 192.168.5.102:6380@16380 master - 0 1605335837000 7 connected 0-5460
46be63362f47ece342c54f4e042ef09d3ca0ec1b 192.168.5.103:6380@16380 slave 94d9e1637bd1701b146e367ffa7a69e8c24566e8 0 1605335837414 2 connected
2eab309dd5f41f317bd1c2b0c8616aee7e4ac05b 192.168.5.101:6379@16379 slave 13ab2f0291cea595f51d6efac60c3e62278e64cb 0 1605335837000 7 connected
6c1005a89742e50db240775204c03ab3d7558e59 192.168.5.103:6379@16379 master - 0 1605335837000 3 connected 10923-16383
bea352a33ee43ca0f6e5238a8170a01820af7f93 192.168.5.101:6380@16380 slave 6c1005a89742e50db240775204c03ab3d7558e59 0 1605335838422 3 connected
94d9e1637bd1701b146e367ffa7a69e8c24566e8 192.168.5.102:6379@16379 myself,master - 0 1605335836000 2 connected 5461-10922
192.168.5.102:6379>
