Deploying a Redis Cluster on a Cloud Server: Connecting Clients over the Public IP


Contents

1. Configuration Files

2. Starting the Services and Creating the Cluster

(1) Start the six Redis instances

(2) Create the cluster with the CLI

3. Client Connection

(1) Client configuration

(2) Test case

(3) Error log analysis

4. Solving the Problem

(1) Inspect the redis.conf file

(2) Modify the configuration

(3) Restart the instances and recreate the cluster

5. Lettuce Connection Problems During Failover

(1) Test case

(2) Stop one master to simulate a crash

(3) Solutions

1) Switch the Redis client

2) Enable cluster topology refresh in Lettuce

1. Configuration Files



Six configuration files were prepared: redis-6381.conf, redis-6382.conf, redis-6383.conf, redis-6384.conf, redis-6385.conf, and redis-6386.conf. Their contents are identical except for the values derived from the port number:

# The configuration below has been trimmed; compare with the official sample conf file for the full set of options. Adjust the port number for each node.
# Run as a background daemon
daemonize yes
# Port number
port 6381
# IP binding. Exposing Redis to the public internet is not recommended, so bind only the server's private IP and the loopback address
bind 172.17.0.13 127.0.0.1
# Directory for Redis data files
dir /redis/workingDir
# Log file
logfile "/redis/logs/cluster-node-6381.log"
# Enable AOF persistence
appendonly yes
# Enable cluster mode
cluster-enabled yes
# Cluster state file: records the state of the other nodes, persisted variables, etc.; generated automatically in the dir configured above
cluster-config-file cluster-node-6381.conf
# Maximum time (in milliseconds) a cluster node may be unreachable; if a master is unreachable longer than this, a failover is triggered
cluster-node-timeout 5000


Note: the Redis version used here is 6.0.4.
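Since the six files differ only in values derived from the port number, they can be stamped out from a template rather than edited by hand. A minimal illustrative sketch (paths and file names follow the listing above; writing the files to disk is left out):

```python
TEMPLATE = """daemonize yes
port {port}
bind 172.17.0.13 127.0.0.1
dir /redis/workingDir
logfile "/redis/logs/cluster-node-{port}.log"
appendonly yes
cluster-enabled yes
cluster-config-file cluster-node-{port}.conf
cluster-node-timeout 5000
"""

def render_configs(ports):
    # Map each port to the text of its redis-<port>.conf
    return {f"redis-{p}.conf": TEMPLATE.format(port=p) for p in ports}

configs = render_configs(range(6381, 6387))
print(sorted(configs))  # redis-6381.conf through redis-6386.conf
```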

2. Starting the Services and Creating the Cluster

(1) Start the six Redis instances

redis-server redis-6381.conf
redis-server redis-6382.conf
redis-server redis-6383.conf
redis-server redis-6384.conf
redis-server redis-6385.conf
redis-server redis-6386.conf


(2) Create the cluster with the CLI

Create the cluster, assigning one replica to each of the three masters:

redis-cli --cluster create \
172.17.0.13:6381 172.17.0.13:6382 172.17.0.13:6383 \
172.17.0.13:6384 172.17.0.13:6385 172.17.0.13:6386 \
--cluster-replicas 1
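The create command spreads the 16384 hash slots across the three masters (0-5460, 5461-10922, and 10923-16383, as the node listings further below show). Which slot a key lands in is CRC16(key) mod 16384, where an optional {hash tag} lets related keys share a slot. A small illustrative sketch of that mapping (not taken from any client library; the CRC16 variant is the XMODEM one Redis Cluster uses):

```python
def crc16(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0 -- the variant Redis Cluster uses
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # If the key contains a non-empty {tag}, only the tag is hashed,
    # so keys sharing a tag are guaranteed to land in the same slot
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(key_slot("username"))                                # a slot in 0..16383
print(key_slot("{user}.name") == key_slot("{user}.age"))   # True: same hash tag
```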

3. Client Connection

(1) Client configuration

import java.util.Arrays;

import io.lettuce.core.ReadFrom;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
public class RedisClusterConfig {
  @Bean
  public RedisConnectionFactory redisConnectionFactory() {
    // Read/write splitting: prefer replicas for reads, send writes to masters
    LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
            .readFrom(ReadFrom.REPLICA_PREFERRED)
            .build();
    RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Arrays.asList(
            "122.51.151.130:6381",
            "122.51.151.130:6382",
            "122.51.151.130:6383",
            "122.51.151.130:6384",
            "122.51.151.130:6385",
            "122.51.151.130:6386"));
    return new LettuceConnectionFactory(redisClusterConfiguration, clientConfig);
  }
}

(2) Test case

@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
public class RedisClusterTest {
  @Autowired
  private StringRedisTemplate stringRedisTemplate;
  @Test
  public void readFromReplicaWriteToMasterTest() {
    System.out.println("Setting value...");
    stringRedisTemplate.opsForValue().set("username", "Nick");
    System.out.println("Got value: " + stringRedisTemplate.opsForValue().get("username"));
  }
}


(3) Error log analysis

Running the test produces the warnings below. The client reaches the seed nodes over the public address and asks them for the cluster topology, but the topology advertises the nodes' private addresses (172.17.0.13), which are unreachable from outside the cloud network, so every topology refresh and command connection times out:

2020-08-14 14:57:49.180  WARN 22012 --- [ioEventLoop-6-4] i.l.c.c.topology.ClusterTopologyRefresh  : Unable to connect to [172.17.0.13:6384]: connection timed out: /172.17.0.13:6384
2020-08-14 14:57:49.180  WARN 22012 --- [ioEventLoop-6-3] i.l.c.c.topology.ClusterTopologyRefresh  : Unable to connect to [172.17.0.13:6383]: connection timed out: /172.17.0.13:6383
2020-08-14 14:57:49.182  WARN 22012 --- [ioEventLoop-6-2] i.l.c.c.topology.ClusterTopologyRefresh  : Unable to connect to [172.17.0.13:6382]: connection timed out: /172.17.0.13:6382
2020-08-14 14:57:49.182  WARN 22012 --- [ioEventLoop-6-1] i.l.c.c.topology.ClusterTopologyRefresh  : Unable to connect to [172.17.0.13:6381]: connection timed out: /172.17.0.13:6381
2020-08-14 14:57:49.190  WARN 22012 --- [ioEventLoop-6-1] i.l.c.c.topology.ClusterTopologyRefresh  : Unable to connect to [172.17.0.13:6385]: connection timed out: /172.17.0.13:6385
2020-08-14 14:57:49.191  WARN 22012 --- [ioEventLoop-6-2] i.l.c.c.topology.ClusterTopologyRefresh  : Unable to connect to [172.17.0.13:6386]: connection timed out: /172.17.0.13:6386
2020-08-14 14:57:59.389  WARN 22012 --- [ioEventLoop-6-3] i.l.core.cluster.RedisClusterClient      : connection timed out: /172.17.0.13:6382
2020-08-14 14:58:09.391  WARN 22012 --- [ioEventLoop-6-4] i.l.core.cluster.RedisClusterClient      : connection timed out: /172.17.0.13:6381
2020-08-14 14:58:19.393  WARN 22012 --- [ioEventLoop-6-1] i.l.core.cluster.RedisClusterClient      : connection timed out: /172.17.0.13:6383
2020-08-14 14:58:29.396  WARN 22012 --- [ioEventLoop-6-2] i.l.core.cluster.RedisClusterClient      : connection timed out: /172.17.0.13:6384
2020-08-14 14:58:39.399  WARN 22012 --- [ioEventLoop-6-3] i.l.core.cluster.RedisClusterClient      : connection timed out: /172.17.0.13:6386
2020-08-14 14:58:49.402  WARN 22012 --- [ioEventLoop-6-4] i.l.core.cluster.RedisClusterClient      : connection timed out: /172.17.0.13:6385


4. Solving the Problem

(1) Inspect the redis.conf file

The way to make Redis advertise a public IP is actually documented in redis.conf itself. The section below is aimed mainly at NAT-ted deployments such as Docker, and it lets us manually specify a node's public IP, client port, and cluster bus port (by default the client port plus 10000).

########################## CLUSTER DOCKER/NAT support  ########################
# In certain deployments, Redis Cluster nodes address discovery fails, because
# addresses are NAT-ted or because ports are forwarded (the typical case is
# Docker and other containers).
#
# In order to make Redis Cluster working in such environments, a static
# configuration where each node knows its public address is needed. The
# following two options are used for this scope, and are:
#
# * cluster-announce-ip
# * cluster-announce-port
# * cluster-announce-bus-port
#
# Each instruct the node about its address, client port, and cluster message
# bus port. The information is then published in the header of the bus packets
# so that other nodes will be able to correctly map the address of the node
# publishing the information.
#
# If the above options are not used, the normal Redis Cluster auto-detection
# will be used instead.
#
# Note that when remapped, the bus port may not be at the fixed offset of
# clients port + 10000, so you can specify any port and bus-port depending
# on how they get remapped. If the bus-port is not set, a fixed offset of
# 10000 will be used as usually.
#
# Example:
#
# cluster-announce-ip 10.1.1.5
# cluster-announce-port 6379
# cluster-announce-bus-port 6380


(2) Modify the configuration

Once a public IP is announced, the cluster nodes communicate with each other over that public IP. The cluster bus ports (16381 and so on below) must therefore be opened in the cloud server's security group along with the client ports; otherwise the cluster ends up in the fail state.

# The configuration below has been trimmed; compare with the official sample conf file for the full set of options. Adjust the port number for each node.
# Run as a background daemon
daemonize yes
# Port number
port 6381
# IP binding. Exposing Redis to the public internet is not recommended, so bind only the server's private IP and the loopback address
bind 172.17.0.13 127.0.0.1
# Directory for Redis data files
dir /redis/workingDir
# Log file
logfile "/redis/logs/cluster-node-6381.log"
# Enable AOF persistence
appendonly yes
# Enable cluster mode
cluster-enabled yes
# Cluster state file: records the state of the other nodes, persisted variables, etc.; generated automatically in the dir configured above
cluster-config-file cluster-node-6381.conf
# Maximum time (in milliseconds) a cluster node may be unreachable; if a master is unreachable longer than this, a failover is triggered
cluster-node-timeout 5000
# When deploying on a cloud server, announce the node's public IP
cluster-announce-ip 122.51.151.130
# Cluster bus port announced to the other nodes for inter-node communication
cluster-announce-bus-port 16381
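With these announces in place, each node needs both its client port and its bus port reachable from the public network. A quick enumeration of the security-group openings this six-node setup implies (the +10000 offset is only the default convention; cluster-announce-bus-port can override it per node):

```python
client_ports = list(range(6381, 6387))
bus_ports = [p + 10000 for p in client_ports]  # default client-port + 10000 offset
print(client_ports)  # [6381, 6382, 6383, 6384, 6385, 6386]
print(bus_ports)     # [16381, 16382, 16383, 16384, 16385, 16386]
```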

(3) Restart the instances and recreate the cluster

After restarting, compare the contents of the node state file cluster-node-6381.conf before and after the change.

Before the public IP was announced:

[universe@VM_0_13_centos workingDir]$ cat cluster-node-6381.conf 
34287d78c1e9c4ff49880bb976707a0c17676f82 172.17.0.13:6384@16384 slave 1a206270f835a79e43e281df5f6f8215ab49d713 0 1597390563209 4 connected
e306ae5e3ead5f2a837d3bdc0b95c0bd8e3cff99 172.17.0.13:6383@16383 master - 0 1597390565212 3 connected 10923-16383
0932cc203a19f37a3f5ebca8278962f5b325c67e 172.17.0.13:6385@16385 slave 2cc1aed536ff5b48c2fdd94f16cd96cefc4fd4ef 0 1597390564711 5 connected
2cc1aed536ff5b48c2fdd94f16cd96cefc4fd4ef 172.17.0.13:6382@16382 master - 0 1597390565000 2 connected 5461-10922
1a206270f835a79e43e281df5f6f8215ab49d713 172.17.0.13:6381@16381 myself,master - 0 1597390564000 1 connected 0-5460
0f63accb455594d0625cffa8d09aacc580d7e428 172.17.0.13:6386@16386 slave e306ae5e3ead5f2a837d3bdc0b95c0bd8e3cff99 0 1597390564210 6 connected


After the public IP was announced:

[universe@VM_0_13_centos workingDir]$ cat cluster-node-6381.conf 
e2691ffd4bf7d867bc91b3b91c7b233a5f1e5dd2 122.51.151.130:6384@16384 master - 0 1597389992286 7 connected 10923-16383
511668874d39a7b1f701cc3df6f21d00510bfeae 122.51.151.130:6383@16383 slave e2691ffd4bf7d867bc91b3b91c7b233a5f1e5dd2 0 1597389991283 7 connected
e77e540ef4115abe920fb191f354b81f42e7b4ed 122.51.151.130:6381@16381 myself,master - 0 1597389991000 1 connected 0-5460
2a3ea359311b34cd59e10da7d2f1bba48403f0ee 122.51.151.130:6385@16385 slave e77e540ef4115abe920fb191f354b81f42e7b4ed 0 1597389990583 5 connected
2bf4f01a4dba802eb1a50d9510947a4af0ac92ef 122.51.151.130:6382@16382 master - 0 1597389992789 2 connected 5461-10922
2b7671e002143b329c9c6c969bfb825a86fb41b2 122.51.151.130:6386@16386 slave 2bf4f01a4dba802eb1a50d9510947a4af0ac92ef 0 1597389991784 6 connected
vars currentEpoch 7 lastVoteEpoch 7


Every node now advertises its public IP. Running the test case again, everything works.
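The `ip:port@bus-port` triplet in those listings is exactly what cluster-announce-ip and cluster-announce-bus-port control. A quick sketch of how one such line breaks down (a throwaway parser for illustration, not any client's real one):

```python
def parse_node_line(line: str):
    # <node-id> <ip>:<port>@<bus-port> <flags> ... -> pick out the address triplet
    node_id, address = line.split()[:2]
    host, ports = address.split(":")
    port, bus_port = ports.split("@")
    return node_id, host, int(port), int(bus_port)

line = ("e77e540ef4115abe920fb191f354b81f42e7b4ed "
        "122.51.151.130:6381@16381 myself,master - 0 1597389991000 1 connected 0-5460")
print(parse_node_line(line))
# ('e77e540ef4115abe920fb191f354b81f42e7b4ed', '122.51.151.130', 6381, 16381)
```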

5. Lettuce Connection Problems During Failover

(1) Test case

@RunWith(SpringRunner.class)
@SpringBootTest(classes = Application.class)
public class RedisClusterTest {
  @Autowired
  private StringRedisTemplate stringRedisTemplate;
  @Test
  public void automaticFailoverTest() throws InterruptedException {
    int count = 0;
    while (true) {
      try {
        stringRedisTemplate.opsForValue().set("count", String.valueOf(++count));
        System.out.println("Set count to: " + count);
        System.out.println("Read count: " + stringRedisTemplate.opsForValue().get("count"));
        Thread.sleep(2000);
      } catch (Exception e) {
        System.out.println("Possible master failover, retrying...");
        Thread.sleep(3000);
      }
    }
  }
}
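The loop above retries blindly with a fixed sleep. A slightly more robust client-side pattern is retry with exponential backoff, sketched here with a stand-in operation (the function names are illustrative, not from any Redis client):

```python
import time

def with_retry(operation, attempts=5, base_delay=0.1):
    # Retry a failing operation, doubling the delay after each failure
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}
def flaky_set():
    # Stand-in for a Redis write: fails twice (e.g. during a failover), then succeeds
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("master unreachable")
    return "OK"

print(with_retry(flaky_set, base_delay=0.01))  # OK, after two retried failures
```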

(2) Stop one master to simulate a crash

The log output:

2020-08-20 19:33:25.118  INFO 13696 --- [xecutorLoop-1-1] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was /122.51.151.130:6384
2020-08-20 19:33:26.213  WARN 13696 --- [ioEventLoop-6-1] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:31.015  INFO 13696 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:32.107  WARN 13696 --- [ioEventLoop-6-2] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:36.616  INFO 13696 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:37.709  WARN 13696 --- [ioEventLoop-6-2] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:42.016  INFO 13696 --- [xecutorLoop-1-4] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:43.110  WARN 13696 --- [ioEventLoop-6-4] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:47.216  INFO 13696 --- [xecutorLoop-1-1] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:48.317  WARN 13696 --- [ioEventLoop-6-1] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:33:56.515  INFO 13696 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:33:57.605  WARN 13696 --- [ioEventLoop-6-2] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:34:14.016  INFO 13696 --- [xecutorLoop-1-3] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:34:15.113  WARN 13696 --- [ioEventLoop-6-3] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
Possible master failover, retrying...
2020-08-20 19:34:45.116  INFO 13696 --- [xecutorLoop-1-4] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:34:46.212  WARN 13696 --- [ioEventLoop-6-4] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
2020-08-20 19:35:16.216  INFO 13696 --- [xecutorLoop-1-1] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 122.51.151.130:6384
2020-08-20 19:35:17.310  WARN 13696 --- [ioEventLoop-6-1] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [122.51.151.130:6384]: Connection refused: no further information: /122.51.151.130:6384
Possible master failover, retrying...


After waiting a long time, the client was still stuck in a reconnect loop against the dead node; something was clearly off with the Lettuce setup.

(3) Solutions

1) Switch the Redis client

After switching the client to Jedis and simulating the master crash again, the connection recovered on its own after a short while.

@Configuration
public class RedisClusterConfig {
  @Bean
  public RedisConnectionFactory redisConnectionFactory() {
    RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Arrays.asList(
            "122.51.151.130:6381",
            "122.51.151.130:6382",
            "122.51.151.130:6383",
            "122.51.151.130:6384",
            "122.51.151.130:6385",
            "122.51.151.130:6386"));
    return new JedisConnectionFactory(redisClusterConfiguration);
  }
}

Could it be that Lettuce doesn't support reconnecting after a master/replica switch? Of course not. The Lettuce wiki on GitHub documents the relevant cluster behavior:

https://github.com/lettuce-io/lettuce-core/wiki/Redis-Cluster

https://github.com/lettuce-io/lettuce-core/wiki/Client-options#cluster-specific-options


2) Enable cluster topology refresh in Lettuce

Following the documentation, adjust the client configuration:

@Configuration
public class RedisClusterConfig {
  @Bean
  public RedisConnectionFactory redisConnectionFactory() {
    // Enable adaptive and periodic cluster topology refresh. Without them, when the master of a
    // slot range goes down, requests to those slots fail until that node comes back
    ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
            .enableAllAdaptiveRefreshTriggers() // refresh on triggers such as redirects and persistent reconnect attempts
            .adaptiveRefreshTriggersTimeout(Duration.ofSeconds(30)) // rate limit between adaptive refreshes (default 30 seconds)
            .enablePeriodicRefresh(Duration.ofSeconds(20)) // periodic refresh, off by default; when enabled the default period is 60 seconds (ClusterTopologyRefreshOptions.DEFAULT_REFRESH_PERIOD)
            .build();
    ClientOptions clientOptions = ClusterClientOptions.builder()
            .topologyRefreshOptions(clusterTopologyRefreshOptions)
            .build();
    // Apply the cluster options to the Lettuce client configuration
    LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
            .clientOptions(clientOptions)
            .build();
    RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Arrays.asList(
            "122.51.151.130:6381",
            "122.51.151.130:6382",
            "122.51.151.130:6383",
            "122.51.151.130:6384",
            "122.51.151.130:6385",
            "122.51.151.130:6386"));
    return new LettuceConnectionFactory(redisClusterConfiguration, clientConfig);
  }
}