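In the post on master-replica replication architecture I mentioned Sentinel mode. In complex multi-master architectures Sentinel is indispensable, so this post looks in detail at what Sentinel mode is and how it runs.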
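Sentinel mode overview
Before looking at Sentinel, it helps to understand one concept from master-replica replication: master-replica switchover.
Master-replica switchover
With manual switchover, when the master goes down someone has to promote one of the replicas to master by hand. This needs human intervention, takes time and effort, and leaves the service unavailable for a while. It is not a recommended approach; in most cases we prefer Sentinel mode.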
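Sentinel mode
To solve the problems of manual switchover, Redis provides Sentinel mode. Sentinel is a distributed system that monitors every server in a master-replica setup. When the master fails, the sentinels vote to select a new master and reconnect every replica (slave) to it.
Sentinel has three main functions: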
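- Monitoring: the sentinels send commands to the Redis servers, both master and replicas, and use the replies to check their running state: whether the master is alive, and whether the master and the replicas are running normally.
- Notification: when a monitored server fails, the sentinel sends out notifications to the other sentinels and to clients.
- Automatic failover: when the sentinels detect that the master is down, they detach the replicas from it and promote one replica to be the new master. They then use publish/subscribe to notify the other replicas, rewrite their configuration so that they connect to the new master, and tell all clients the new server address, so that users do not notice the switch.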
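Those are Sentinel's main functions. A sentinel is itself a Redis server, except that it does not serve data. Its role is similar to ZooKeeper for Kafka or the master-eligible nodes in Elasticsearch: it is the manager of the cluster. Redis simply gives it a fancier name, sentinel. The number of sentinels is normally odd, and a deployment usually starts with at least three.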
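Why an odd number, and why at least three? The rule can be sketched in a few lines. This is an illustration only, not Redis code: the function names are hypothetical, and it assumes the behaviour described in the Redis Sentinel documentation, where the quorum decides when a master is flagged as down, while the failover itself has to be authorized by a majority of all known sentinels.

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of sentinels that forms a strict majority."""
    return total_sentinels // 2 + 1


def can_fail_over(total_sentinels: int, reachable: int, quorum: int) -> bool:
    """Return True if the reachable sentinels can both agree that the master
    is down (quorum) and authorize a failover (majority).

    Hypothetical helper for illustration; it models the rule, not Sentinel itself.
    """
    return reachable >= quorum and reachable >= majority(total_sentinels)


if __name__ == "__main__":
    for total in (2, 3, 4, 5):
        tolerated = total - majority(total)
        print(f"{total} sentinels: majority={majority(total)}, "
              f"can lose {tolerated} sentinel(s)")
```

Under this model three sentinels and four sentinels both tolerate the loss of exactly one sentinel, which is why the extra even-numbered sentinel buys nothing, and two sentinels tolerate none.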
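Enabling Sentinel mode
The Redis directory actually contains a configuration file that we have not touched so far: sentinel.conf, Sentinel's own configuration file.
Modifying the configuration file
Let's look at the key settings in this file: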
[root@192 redis-6.0.8]# cat sentinel.conf | grep -v "#" | grep -v "^$"
This shows the following configuration (the # lines are my annotations):
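    # Port the sentinel listens on
    port 26379
    # Do not run as a daemon
    daemonize no
    pidfile /var/run/redis-sentinel.pid
    logfile ""
    # Working directory
    dir /tmp
    # Monitor the master named mymaster at 127.0.0.1:6379. The quorum of 2 means the master
    # is considered down once 2 sentinels agree; it is commonly set to n/2+1.
    sentinel monitor mymaster 127.0.0.1 6379 2
    # A server that does not respond for 30 seconds is considered down
    sentinel down-after-milliseconds mymaster 30000
    # During failover, let 1 replica at a time resync with the new master.
    # A lower value makes failover slower but keeps more replicas available.
    sentinel parallel-syncs mymaster 1
    # Failover timeout: 180 seconds (3 minutes)
    sentinel failover-timeout mymaster 180000
    sentinel deny-scripts-reconfig yes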
We adjust it to the following configuration:
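    # Port the sentinel listens on
    port 26379
    # Do not run as a daemon
    daemonize no
    # Working directory
    dir /root/redis-6.0.8/data/
    # Monitor the master named mymaster at 127.0.0.1:6379. The quorum of 2 means the master
    # is considered down once 2 sentinels agree; it is commonly set to n/2+1.
    sentinel monitor mymaster 127.0.0.1 6379 2
    # A server that does not respond for 30 seconds is considered down
    sentinel down-after-milliseconds mymaster 30000
    # During failover, let 1 replica at a time resync with the new master.
    # A lower value makes failover slower but keeps more replicas available.
    sentinel parallel-syncs mymaster 1
    # Failover timeout: 180 seconds (3 minutes)
    sentinel failover-timeout mymaster 180000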
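Create the following files in the config directory. We deploy three sentinels on three ports: 26379, 26380 and 26381.
Sentinel configuration files
The configuration in sentinel-26379.conf is: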
    port 26379
    dir /root/redis-6.0.8/data/
    sentinel monitor tmlmaster 127.0.0.1 6379 2
    sentinel down-after-milliseconds tmlmaster 30000
    sentinel parallel-syncs tmlmaster 1
    sentinel failover-timeout tmlmaster 180000
The configuration in sentinel-26380.conf is:
    port 26380
    dir /root/redis-6.0.8/data/
    sentinel monitor tmlmaster 127.0.0.1 6379 2
    sentinel down-after-milliseconds tmlmaster 30000
    sentinel parallel-syncs tmlmaster 1
    sentinel failover-timeout tmlmaster 180000
The configuration in sentinel-26381.conf is:
    port 26381
    dir /root/redis-6.0.8/data/
    sentinel monitor tmlmaster 127.0.0.1 6379 2
    sentinel down-after-milliseconds tmlmaster 30000
    sentinel parallel-syncs tmlmaster 1
    sentinel failover-timeout tmlmaster 180000
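Since the three sentinel files differ only in the port, they could also be generated rather than typed by hand. The following is a minimal sketch under that assumption; the helper names `render_sentinel_conf` and `write_sentinel_confs` are made up for this example, and the directive values are copied from the files above.

```python
import tempfile
from pathlib import Path
from typing import List

MASTER_NAME = "tmlmaster"
SENTINEL_PORTS = (26379, 26380, 26381)

# Same directives as the hand-written files; only the port varies.
TEMPLATE = """\
port {port}
dir /root/redis-6.0.8/data/
sentinel monitor {name} 127.0.0.1 6379 2
sentinel down-after-milliseconds {name} 30000
sentinel parallel-syncs {name} 1
sentinel failover-timeout {name} 180000
"""


def render_sentinel_conf(port: int, name: str = MASTER_NAME) -> str:
    """Return the text of one sentinel configuration file."""
    return TEMPLATE.format(port=port, name=name)


def write_sentinel_confs(target_dir: Path) -> List[Path]:
    """Write sentinel-<port>.conf for every port and return the paths."""
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for port in SENTINEL_PORTS:
        path = target_dir / f"sentinel-{port}.conf"
        path.write_text(render_sentinel_conf(port))
        written.append(path)
    return written


if __name__ == "__main__":
    # Write into a throwaway directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        for conf_path in write_sentinel_confs(Path(tmp)):
            print(conf_path.name)
```

To write the real files, point `write_sentinel_confs` at the config directory instead of a temporary one.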
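We also need the master-replica setup in place. The server on port 6381 is still missing, so we just add one more configuration file. After adding it, the replication configuration looks like this:
Master-replica configuration files
All of the configuration is now complete: one master and two replicas, with 6379 as the master and 6380 and 6381 as the replicas.
Master configuration file: redis-6379.conf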
    port 6379
    daemonize no
    #logfile "redis-6379.log"
    bind 127.0.0.1
    dir /root/redis-6.0.8/data/
    dbfilename dump-6379.rdb
    rdbcompression yes
    rdbchecksum yes
    save 10 2
    appendonly yes
    appendfilename appendonly-6379.aof
    appendfsync everysec
Replica 1 (slave1) configuration file: redis-6380.conf
    port 6380
    daemonize no
    #logfile "redis-6380.log"
    dir /root/redis-6.0.8/data/
    dbfilename dump-6380.rdb
    rdbcompression yes
    rdbchecksum yes
    slaveof 127.0.0.1 6379
    repl-backlog-size 100mb
    slave-serve-stale-data no
Replica 2 (slave2) configuration file: redis-6381.conf
    port 6381
    daemonize no
    #logfile "redis-6381.log"
    dir /root/redis-6.0.8/data/
    dbfilename dump-6381.rdb
    rdbcompression yes
    rdbchecksum yes
    slaveof 127.0.0.1 6379
    repl-backlog-size 100mb
    slave-serve-stale-data no
Overview of all configuration files
    [root@192 config]# ll
    total 24
    -rw-r--r-- 1 root root 234 Oct 31 17:53 redis-6379.conf
    -rw-r--r-- 1 root root 211 Nov  1 10:37 redis-6380.conf
    -rw-r--r-- 1 root root 211 Nov  1 16:12 redis-6381.conf
    -rw-rw-r-- 1 root root 212 Nov  1 16:08 sentinel-26379.conf
    -rw-rw-r-- 1 root root 212 Nov  1 16:08 sentinel-26380.conf
    -rw-rw-r-- 1 root root 212 Nov  1 16:08 sentinel-26381.conf
    [root@192 config]#
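Starting the servers and sentinels
The startup order is: the master first, then the replicas, and finally the sentinels. First kill any leftover Redis processes so that we start from scratch: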
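The startup order can be written down as a small sketch. It only builds and prints the command lines; it does not launch anything. The `redis-sentinel sentinel-<port>.conf` form is the one used later in this post, while the `redis-server redis-<port>.conf` form and the assumption that everything is run from the config directory are mine.

```python
from typing import List

MASTER_PORT = 6379
REPLICA_PORTS = (6380, 6381)
SENTINEL_PORTS = (26379, 26380, 26381)


def startup_commands() -> List[List[str]]:
    """Return the commands in the order they should be started:
    master, then replicas, then sentinels."""
    commands = [["redis-server", f"redis-{MASTER_PORT}.conf"]]
    commands += [["redis-server", f"redis-{port}.conf"] for port in REPLICA_PORTS]
    commands += [["redis-sentinel", f"sentinel-{port}.conf"] for port in SENTINEL_PORTS]
    return commands


if __name__ == "__main__":
    for step, command in enumerate(startup_commands(), start=1):
        print(f"{step}. {' '.join(command)}")
```

Because every configuration file here sets `daemonize no`, each process stays in the foreground, so each command needs its own terminal session.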
    [root@192 config]# ps -ef | grep redis-
    root      62045  61684  0 10:21 pts/2    00:00:18 redis-server 127.0.0.1:6379
    root      67659  61734  0 16:24 pts/3    00:00:00 grep --color=auto redis-
    [root@192 config]# kill -s 9 62045
    [root@192 config]# kill -s 9 61684
    [root@192 config]# ps -ef | grep redis-
    root      67667  61734  0 16:24 pts/3    00:00:00 grep --color=auto redis-
    [root@192 config]#
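Starting the master and replicas
After starting the master and the two replicas, the master prints the following log, which shows that the replicas connected successfully: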
    67863:M 01 Nov 2020 16:28:07.340 * Replica 127.0.0.1:6380 asks for synchronization
    67863:M 01 Nov 2020 16:28:07.340 * Full resync requested by replica 127.0.0.1:6380
    67863:M 01 Nov 2020 16:28:07.340 * Replication backlog created, my new replication IDs are '4944f27b383bcf43ff80f60187d34eb1f854007f' and '0000000000000000000000000000000000000000'
    67863:M 01 Nov 2020 16:28:07.340 * Starting BGSAVE for SYNC with target: disk
    67863:M 01 Nov 2020 16:28:07.340 * Background saving started by pid 67892
    67892:C 01 Nov 2020 16:28:07.341 * DB saved on disk
    67892:C 01 Nov 2020 16:28:07.341 * RDB: 0 MB of memory used by copy-on-write
    67863:M 01 Nov 2020 16:28:07.434 * Background saving terminated with success
    67863:M 01 Nov 2020 16:28:07.435 * Synchronization with replica 127.0.0.1:6380 succeeded
    67863:M 01 Nov 2020 16:28:23.060 * Replica 127.0.0.1:6381 asks for synchronization
    67863:M 01 Nov 2020 16:28:23.061 * Full resync requested by replica 127.0.0.1:6381
    67863:M 01 Nov 2020 16:28:23.061 * Starting BGSAVE for SYNC with target: disk
    67863:M 01 Nov 2020 16:28:23.061 * Background saving started by pid 67905
    67905:C 01 Nov 2020 16:28:23.061 * DB saved on disk
    67905:C 01 Nov 2020 16:28:23.062 * RDB: 0 MB of memory used by copy-on-write
    67863:M 01 Nov 2020 16:28:23.148 * Background saving terminated with success
    67863:M 01 Nov 2020 16:28:23.148 * Synchronization with replica 127.0.0.1:6381 succeeded
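The lines that matter in this log are the two "Synchronization with replica ... succeeded" entries. If you would rather check for them programmatically than by eye, a small sketch like the following would do; the function name `synced_replicas` is made up for this example, and the sample text is taken from the log above.

```python
import re
from typing import List

# Matches the master's log line that confirms a replica finished syncing.
SYNC_OK = re.compile(r"Synchronization with replica (\S+) succeeded")


def synced_replicas(log_text: str) -> List[str]:
    """Return the host:port of every replica whose sync succeeded, in log order."""
    return SYNC_OK.findall(log_text)


SAMPLE_LOG = """\
67863:M 01 Nov 2020 16:28:07.435 * Synchronization with replica 127.0.0.1:6380 succeeded
67863:M 01 Nov 2020 16:28:23.061 * Full resync requested by replica 127.0.0.1:6381
67863:M 01 Nov 2020 16:28:23.148 * Synchronization with replica 127.0.0.1:6381 succeeded
"""

if __name__ == "__main__":
    print(synced_replicas(SAMPLE_LOG))  # ['127.0.0.1:6380', '127.0.0.1:6381']
```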
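Starting the sentinels
First open a terminal session for each sentinel in advance.
The command for starting a sentinel is slightly different: redis-sentinel sentinel-26379.conf. After startup, the log shows the sentinel's ID (Sentinel ID):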
    [root@192 ~]# cd redis-6.0.8/config/
    [root@192 config]# redis-sentinel sentinel-26379.conf
    68018:X 01 Nov 2020 16:31:52.415 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    68018:X 01 Nov 2020 16:31:52.415 # Redis version=6.0.8, bits=64, commit=00000000, modified=0, pid=68018, just started
    68018:X 01 Nov 2020 16:31:52.415 # Configuration loaded
    68018:X 01 Nov 2020 16:31:52.415 * Increased maximum number of open files to 10032 (it was originally set to 1024).
                    _._
               _.-``__ ''-._
          _.-``    `.  `_.  ''-._           Redis 6.0.8 (00000000/0) 64 bit
      .-`` .-```.  ```\/    _.,_ ''-._
     (    '      ,       .-`  | `,    )     Running in sentinel mode
     |`-._`-...-` __...-.``-._|'` _.-'|     Port: 26379
     |    `-._   `._    /     _.-'    |     PID: 68018
      `-._    `-._  `-./  _.-'    _.-'
     |`-._`-._    `-.__.-'    _.-'_.-'|
     |    `-._`-._        _.-'_.-'    |           http://redis.io
      `-._    `-._`-.__.-'_.-'    _.-'
     |`-._`-._    `-.__.-'    _.-'_.-'|
     |    `-._`-._        _.-'_.-'    |
      `-._    `-._`-.__.-'_.-'    _.-'
          `-._    `-.__.-'    _.-'
              `-._        _.-'
                  `-.__.-'

    68018:X 01 Nov 2020 16:31:52.416 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
    68018:X 01 Nov 2020 16:31:52.417 # Sentinel ID is d8a28dddef8b037ba99bd551ac9f0bde8ea5dde3
    68018:X 01 Nov 2020 16:31:52.417 # +monitor master tmlmaster 127.0.0.1 6379 quorum 2
    68018:X 01 Nov 2020 16:31:52.418 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ tmlmaster 127.0.0.1 6379
    68018:X 01 Nov 2020 16:31:52.419 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ tmlmaster 127.0.0.1 6379
The log also shows our master and replica information. Once the sentinel is running, we connect to it with the client:
    [root@192 ~]# cd redis-6.0.8/
    [root@192 redis-6.0.8]# redis-cli -p 26379
    127.0.0.1:26379> set name tml
    (error) ERR unknown command `set`, with args beginning with: `name`, `tml`,
    127.0.0.1:26379> info
    # Server
    redis_version:6.0.8
    redis_git_sha1:00000000
    redis_git_dirty:0
    redis_build_id:4c35dfe260ddf15c
    redis_mode:sentinel
    os:Linux 3.10.0-1127.el7.x86_64 x86_64
    arch_bits:64
    multiplexing_api:epoll
    atomicvar_api:atomic-builtin
    gcc_version:9.3.1
    process_id:68018
    run_id:cd9b376fa25c4c68e42033ba249c16a12a0dc015
    tcp_port:26379
    uptime_in_seconds:183
    uptime_in_days:0
    hz:16
    configured_hz:10
    lru_clock:10384175
    executable:/root/redis-6.0.8/config/redis-sentinel
    config_file:/root/redis-6.0.8/config/sentinel-26379.conf
    io_threads_active:0

    # Clients
    connected_clients:1
    client_recent_max_input_buffer:2
    client_recent_max_output_buffer:0
    blocked_clients:0
    tracking_clients:0
    clients_in_timeout_table:0

    # CPU
    used_cpu_sys:0.260898
    used_cpu_user:0.055376
    used_cpu_sys_children:0.000000
    used_cpu_user_children:0.000000

    # Stats
    total_connections_received:1
    total_commands_processed:0
    instantaneous_ops_per_sec:0
    total_net_input_bytes:63
    total_net_output_bytes:131
    instantaneous_input_kbps:0.00
    instantaneous_output_kbps:0.00
    rejected_connections:0
    sync_full:0
    sync_partial_ok:0
    sync_partial_err:0
    expired_keys:0
    expired_stale_perc:0.00
    expired_time_cap_reached_count:0
    expire_cycle_cpu_milliseconds:3
    evicted_keys:0
    keyspace_hits:0
    keyspace_misses:0
    pubsub_channels:0
    pubsub_patterns:0
    latest_fork_usec:0
    migrate_cached_sockets:0
    slave_expires_tracked_keys:0
    active_defrag_hits:0
    active_defrag_misses:0
    active_defrag_key_hits:0
    active_defrag_key_misses:0
    tracking_total_keys:0
    tracking_total_items:0
    tracking_total_prefixes:0
    unexpected_error_replies:0
    total_reads_processed:3
    total_writes_processed:2
    io_threaded_reads_processed:0
    io_threaded_writes_processed:0

    # Sentinel
    sentinel_masters:1
    sentinel_tilt:0
    sentinel_running_scripts:0
    sentinel_scripts_queue_length:0
    sentinel_simulate_failure_flags:0
    master0:name=tmlmaster,status=ok,address=127.0.0.1:6379,slaves=2,sentinels=1
    127.0.0.1:26379>
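The most useful line in that output is the last one in the Sentinel section, which summarizes the monitored master. A rough sketch of pulling its fields apart follows; `parse_master_line` is a hypothetical helper, not part of any Redis client, and it assumes the line has exactly the comma-separated key=value shape shown above.

```python
from typing import Dict, Union


def parse_master_line(line: str) -> Dict[str, Union[str, int]]:
    """Split a 'masterN:key=value,...' line from the Sentinel INFO section."""
    # Only the first colon separates the label; the address contains another one.
    label, _, body = line.strip().partition(":")
    fields: Dict[str, Union[str, int]] = dict(
        item.split("=", 1) for item in body.split(",")
    )
    fields["label"] = label
    fields["slaves"] = int(fields["slaves"])
    fields["sentinels"] = int(fields["sentinels"])
    return fields


if __name__ == "__main__":
    info = parse_master_line(
        "master0:name=tmlmaster,status=ok,address=127.0.0.1:6379,slaves=2,sentinels=1"
    )
    print(info["name"], info["status"], info["address"])
    print("replicas:", info["slaves"], "sentinels:", info["sentinels"])
```

Here `sentinels=1` because only the first sentinel is running at this point, and the count includes the sentinel being queried.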
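As you can see, a sentinel cannot run data commands, and info shows some of the sentinel's configuration. Once the sentinel has started monitoring, its configuration file changes.
sentinel-26379.conf now contains: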
    port 26379
    dir "/root/redis-6.0.8/data"
    sentinel myid d8a28dddef8b037ba99bd551ac9f0bde8ea5dde3
    sentinel deny-scripts-reconfig yes
    sentinel monitor tmlmaster 127.0.0.1 6379 2
    sentinel config-epoch tmlmaster 0
    # Generated by CONFIG REWRITE
    protected-mode no
    user default on nopass ~* +@all
    sentinel leader-epoch tmlmaster 0
    sentinel known-replica tmlmaster 127.0.0.1 6380
    sentinel known-replica tmlmaster 127.0.0.1 6381
    sentinel current-epoch 0
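The file now lists the replicas that the sentinel discovered, with their IPs and ports. If we then start sentinel 2 and sentinel 3, the sentinels discover one another. Here I start sentinel 2 and then sentinel 3; the startup log of sentinel 3 shows: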
    68411:X 01 Nov 2020 16:45:58.132 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
    68411:X 01 Nov 2020 16:45:58.133 # Sentinel ID is 2adb9e95cee7937e77dc42eb018504d94c7ed433
    68411:X 01 Nov 2020 16:45:58.133 # +monitor master tmlmaster 127.0.0.1 6379 quorum 2
    68411:X 01 Nov 2020 16:45:58.133 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ tmlmaster 127.0.0.1 6379
    68411:X 01 Nov 2020 16:45:58.134 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ tmlmaster 127.0.0.1 6379
    68411:X 01 Nov 2020 16:45:58.521 * +sentinel sentinel f1b5da2c58c4c45929178484718230114bdf090a 127.0.0.1 26380 @ tmlmaster 127.0.0.1 6379
    68411:X 01 Nov 2020 16:45:59.969 * +sentinel sentinel d8a28dddef8b037ba99bd551ac9f0bde8ea5dde3 127.0.0.1 26379 @ tmlmaster 127.0.0.1 6379
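Sentinel 3 shows not only its own Sentinel ID but also the IDs of the already running sentinel 2 and sentinel 1. Looking at sentinel 1's log again, two new lines have appeared, carrying the IDs of sentinel 2 and sentinel 3: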
    68018:X 01 Nov 2020 16:31:52.416 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
    68018:X 01 Nov 2020 16:31:52.417 # Sentinel ID is d8a28dddef8b037ba99bd551ac9f0bde8ea5dde3
    68018:X 01 Nov 2020 16:31:52.417 # +monitor master tmlmaster 127.0.0.1 6379 quorum 2
    68018:X 01 Nov 2020 16:31:52.418 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ tmlmaster 127.0.0.1 6379
    68018:X 01 Nov 2020 16:31:52.419 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ tmlmaster 127.0.0.1 6379
    68018:X 01 Nov 2020 16:39:26.885 * +sentinel sentinel f1b5da2c58c4c45929178484718230114bdf090a 127.0.0.1 26380 @ tmlmaster 127.0.0.1 6379
    68018:X 01 Nov 2020 16:46:00.198 * +sentinel sentinel 2adb9e95cee7937e77dc42eb018504d94c7ed433 127.0.0.1 26381 @ tmlmaster 127.0.0.1 6379
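The +sentinel events are how you can tell which peers a sentinel has discovered. As a rough sketch, they can be collected from a log like this; the helper name `discovered_peers` is made up for the example, and the pattern assumes the event layout seen in the logs above (40-character hex ID, IP, port, then the master after the @).

```python
import re
from typing import Dict

# One +sentinel event: "<id> <ip> <port> @ <master name> ..."
PEER_EVENT = re.compile(
    r"\+sentinel sentinel (?P<id>[0-9a-f]{40}) (?P<ip>\S+) (?P<port>\d+) @ (?P<master>\S+)"
)


def discovered_peers(log_text: str) -> Dict[int, str]:
    """Map each discovered peer sentinel's port to its Sentinel ID."""
    return {int(m["port"]): m["id"] for m in PEER_EVENT.finditer(log_text)}


SENTINEL1_LOG = """\
68018:X 01 Nov 2020 16:31:52.417 # Sentinel ID is d8a28dddef8b037ba99bd551ac9f0bde8ea5dde3
68018:X 01 Nov 2020 16:39:26.885 * +sentinel sentinel f1b5da2c58c4c45929178484718230114bdf090a 127.0.0.1 26380 @ tmlmaster 127.0.0.1 6379
68018:X 01 Nov 2020 16:46:00.198 * +sentinel sentinel 2adb9e95cee7937e77dc42eb018504d94c7ed433 127.0.0.1 26381 @ tmlmaster 127.0.0.1 6379
"""

if __name__ == "__main__":
    for port, sentinel_id in sorted(discovered_peers(SENTINEL1_LOG).items()):
        print(port, sentinel_id)
```

With three sentinels, each log should end up listing the other two peers, as sentinel 1's log does here.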