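Problem Description
According to the pricing tier documentation, Azure Cache for Redis supports up to 40,000 simultaneous connections, depending on the tier. However, when a large number of connection requests arrive within a short time, the load on the Redis server becomes very high and reaches 100%. This seriously undermines the requirement for fast responses. In the worst cases, clients frequently report Redis Client Operation timeout errors.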
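Problem Analysis
By design, Redis uses only a single thread for command processing. Azure Cache for Redis also uses additional cores for I/O processing. Having more cores may not produce linear scaling, but it does improve throughput. In addition, larger VMs typically have higher bandwidth limits than smaller ones, which helps avoid network saturation and the application timeouts it causes.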
- Basic and Standard caches
  - C0 (250 MB) cache: up to 256 connections
  - C1 (1 GB) cache: up to 1,000 connections
  - C2 (2.5 GB) cache: up to 2,000 connections
  - C3 (6 GB) cache: up to 5,000 connections
  - C4 (13 GB) cache: up to 10,000 connections
  - C5 (26 GB) cache: up to 15,000 connections
  - C6 (53 GB) cache: up to 20,000 connections
- Premium caches
  - P1 (6 GB - 60 GB): up to 7,500 connections
  - P2 (13 GB - 130 GB): up to 15,000 connections
  - P3 (26 GB - 260 GB): up to 30,000 connections
  - P4 (53 GB - 530 GB): up to 40,000 connections
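Although each cache size allows up to a certain number of connections, every connection to Redis carries its own overhead. One example of such overhead is the CPU and memory consumed by TLS/SSL encryption. The maximum connection limit for a given cache size assumes a lightly loaded cache. If the load from connection overhead plus the load from client operations exceeds the system's capacity, the cache can run into capacity problems even when the connection limit for the current cache size has not been exceeded.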
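One rough way to reason about these limits is to compare the worst-case number of pooled connections across all client instances with the limit of the chosen tier, leaving some headroom because the limits assume a lightly loaded cache. The sketch below does only that arithmetic. The class name `ConnectionBudget`, the method `fitsTier`, and the `usableFraction` headroom parameter are illustrative assumptions, not part of any Azure or Redis API; the limits are copied from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: checks a planned connection count against a tier limit. */
class ConnectionBudget {
    // Maximum connections per tier, copied from the list above.
    static final Map<String, Integer> MAX_CONNECTIONS = new LinkedHashMap<>();
    static {
        MAX_CONNECTIONS.put("C0", 256);
        MAX_CONNECTIONS.put("C1", 1_000);
        MAX_CONNECTIONS.put("C2", 2_000);
        MAX_CONNECTIONS.put("C3", 5_000);
        MAX_CONNECTIONS.put("C4", 10_000);
        MAX_CONNECTIONS.put("C5", 15_000);
        MAX_CONNECTIONS.put("C6", 20_000);
        MAX_CONNECTIONS.put("P1", 7_500);
        MAX_CONNECTIONS.put("P2", 15_000);
        MAX_CONNECTIONS.put("P3", 30_000);
        MAX_CONNECTIONS.put("P4", 40_000);
    }

    /**
     * Returns true when clientInstances * maxTotalPerInstance stays within
     * usableFraction of the tier limit. usableFraction is an assumed headroom
     * factor (for example 0.5), not a documented value.
     */
    static boolean fitsTier(String tier, int clientInstances,
                            int maxTotalPerInstance, double usableFraction) {
        Integer limit = MAX_CONNECTIONS.get(tier);
        if (limit == null) {
            throw new IllegalArgumentException("Unknown tier: " + tier);
        }
        long worstCase = (long) clientInstances * maxTotalPerInstance;
        return worstCase <= (long) Math.floor(limit * usableFraction);
    }

    public static void main(String[] args) {
        // 100 client instances, each with a pool of 20: 2,000 connections worst case.
        System.out.println(fitsTier("C3", 100, 20, 0.5)); // true  (2,000 <= 2,500)
        System.out.println(fitsTier("C2", 100, 20, 0.5)); // false (2,000 >  1,000)
    }
}
```

A check like this only bounds the connection count; it says nothing about CPU load, which still has to be confirmed from monitoring.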
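Solution
Enable connection pooling and reuse connections. Creating a new connection is an expensive operation that increases latency, so reuse connections as much as possible. If you choose to create new connections, make sure you close the old connections before releasing them (even in managed-memory languages such as .NET or Java).
Avoid expensive operations. Some Redis operations, such as the KEYS command, are very expensive and should be avoided. For details, see the considerations on long-running commands.
If the connection and performance requirements already exceed the limits of a single server, consider using Redis Cluster (clustering, with an increased number of shards).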
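The first recommendation, reusing connections instead of opening a new one per request, can be illustrated without any Redis client at all. The following is a minimal JDK-only sketch: `SimplePool` and `PooledConnection` are hypothetical stand-ins for a real pool and a real Redis connection, and the `set` method only pretends to send a command. It shows that 10,000 operations can be served by a handful of connections when each one is returned to the pool through try-with-resources.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: counts how many connections a pooled workload really opens. */
class ReusePoolDemo {

    /** Stand-in for a Redis connection; close() returns it to the pool. */
    static final class PooledConnection implements AutoCloseable {
        private final BlockingQueue<PooledConnection> home;

        PooledConnection(BlockingQueue<PooledConnection> home) {
            this.home = home;
        }

        String set(String key, String value) {
            return "OK"; // pretend command, no network involved
        }

        @Override
        public void close() {
            home.offer(this); // hand the connection back instead of destroying it
        }
    }

    /** Fixed-size pool that opens all of its connections up front. */
    static final class SimplePool {
        private final BlockingQueue<PooledConnection> idle;
        final AtomicInteger created = new AtomicInteger();

        SimplePool(int size) {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                idle.add(new PooledConnection(idle));
                created.incrementAndGet(); // stands in for the costly TCP/TLS handshake
            }
        }

        PooledConnection borrow() {
            try {
                return idle.take(); // waits until a connection is idle
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for a connection", e);
            }
        }
    }

    /** Runs the given number of operations and returns how many connections were opened. */
    static int run(int poolSize, int operations) {
        SimplePool pool = new SimplePool(poolSize);
        for (int i = 0; i < operations; i++) {
            try (PooledConnection c = pool.borrow()) {
                c.set("Message-Java-" + i, "hello");
            }
        }
        return pool.created.get();
    }

    public static void main(String[] args) {
        System.out.println("connections created: " + run(5, 10000)); // connections created: 5
    }
}
```

Without the try-with-resources block, a forgotten `close()` would leave the queue empty after `poolSize` borrows and the next `borrow()` would wait forever, which is the same failure pattern a real pool shows when connections leak.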
Connection Pool Example
1: Jedis connection pool settings
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPool;
    import redis.clients.jedis.JedisPoolConfig;

    .................
    // boolean useSsl = true;
    String cacheHostname = "xxxx.redis.cache.chinacloudapi.cn";
    String cachekey = " Key";
    // 6379 is the non-TLS port of Azure Cache for Redis; TLS connections use port 6380.
    JedisPool jdspool = getPool(cacheHostname, 6379, cachekey);

    // try-with-resources returns the connection to the pool even if set() throws.
    try (Jedis jedis1 = jdspool.getResource()) {
        System.out.println("Cache Response : " + jedis1.set("Message-1", "hello pool"));
    }

    String msg = "Hello! The cache is working from Java!";
    for (int i = 0; i < 10000; i++) {
        // Borrow a pooled connection for each operation and always give it back.
        try (Jedis jedis = jdspool.getResource()) {
            System.out.println("Cache Response : " + jedis.set("Message-Java-" + i, msg + i));
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
    ....................

    /**
     * Gets the connection pool.
     *
     * @return the connection pool instance
     */
    public static JedisPool getPool(String ip, int port, String cachekey) {
        JedisPoolConfig config = new JedisPoolConfig();
        config.setMaxIdle(20);
        config.setMaxTotal(20);
        config.setMinIdle(10);
        config.setMaxWaitMillis(2000);   // wait at most 2 seconds for an idle connection
        config.setTestOnBorrow(true);    // PING before borrowing
        config.setTestOnReturn(true);    // PING before returning
        JedisPool pool = null;
        try {
            /**
             * If you run into java.net.SocketTimeoutException: Read timed out,
             * try setting your own timeout when constructing the JedisPool.
             * The default JedisPool timeout is 2 seconds (the value is given in milliseconds).
             */
            pool = new JedisPool(config, ip, port, 2000, cachekey);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return pool;
    }
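Note: the main parameters are setMaxTotal and setMaxIdle. MaxTotal is the maximum number of connections in the pool, while MaxIdle is the maximum number of connections allowed to stay idle so that Jedis connections are available at any time. Ideally, MaxTotal and MaxIdle are set to the same value.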
| Parameter | Description | Default value | Recommended settings |
| --- | --- | --- | --- |
| maxTotal | The maximum number of connections that are supported by the pool. | 8 | For more information, see Recommended settings. |
| maxIdle | The maximum number of idle connections in the pool. | 8 | For more information, see Recommended settings. |
| minIdle | The minimum number of idle connections in the pool. | 0 | For more information, see Recommended settings. |
| blockWhenExhausted | Specifies whether the client must wait when the resource pool is exhausted. The maxWaitMillis parameter takes effect only when this parameter is set to true. | true | We recommend that you use the default value. |
| maxWaitMillis | The maximum number of milliseconds that the client must wait when no connection is available. | -1 (the wait never times out) | We recommend that you do not use the default value. |
| testOnBorrow | Specifies whether to validate connections by using the PING command before the connections are borrowed from the pool. Invalid connections are removed from the pool. | false | We recommend that you set this parameter to false when the workload is heavy. This allows you to reduce the overhead of a ping test. |
| testOnReturn | Specifies whether to validate connections by using the PING command before the connections are returned to the pool. Invalid connections are removed from the pool. | false | We recommend that you set this parameter to false when the workload is heavy. This allows you to reduce the overhead of a ping test. |
| jmxEnabled | Specifies whether to enable Java Management Extensions (JMX) monitoring. | true | We recommend that you enable JMX monitoring. Take note that you must also enable the fe |
Recommended settings (https://partners-intl.aliyun.com/help/doc-detail/98726.htm)
maxTotal: The maximum number of connections.
To set a proper value of maxTotal, take note of the following factors:
- The expected concurrent connections based on your business requirements.
- The amount of time that is consumed by the client to run the command.
- The limit of Redis resources. For example, if you multiply maxTotal by the number of nodes (ECS instances), the product must be smaller than the supported maximum number of connections in Redis. You can view the maximum connections on the Instance Information page in the ApsaraDB for Redis console.
- The resource that is consumed to create and release connections. If the number of connections that are created and released for a request is large, the processes that are performed to create and release connections are adversely affected.
For example, the average time that is consumed to run a command, or the average time that is required to borrow or return resources and to run Jedis commands with network overhead, is approximately 1 ms. The queries per second (QPS) of a connection is about 1 second/1 millisecond = 1000. The expected QPS of an individual Redis instance is 50,000 (the total number of QPS divided by the number of Redis shards). The theoretically required size of a resource pool (maxTotal) is 50,000/1,000 = 50.
However, this is only a theoretical value. To reserve some resources, the value of the maxTotal parameter can be larger than the theoretical value. However, if the value of the maxTotal parameter is too large, the connections consume a large amount of client and server resources. For Redis servers that have a high QPS, if a large number of commands are blocked, the issue cannot be solved even by a large resource pool.
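The arithmetic in the example above can be written down as a small calculation. This is only a sketch of that estimate: the class `PoolSizing`, its method names, and the `reservePercent` parameter are hypothetical, and the 20% reserve used in `main` is an arbitrary assumption for illustration, not a recommended figure.

```java
/** Hypothetical helper that reproduces the maxTotal estimate described above. */
class PoolSizing {

    /** QPS one connection can sustain when each command takes avgCommandMillis end to end. */
    static double qpsPerConnection(double avgCommandMillis) {
        return 1000.0 / avgCommandMillis;
    }

    /** Theoretical maxTotal: expected QPS divided by per-connection QPS, rounded up. */
    static int theoreticalMaxTotal(double expectedQps, double avgCommandMillis) {
        return (int) Math.ceil(expectedQps / qpsPerConnection(avgCommandMillis));
    }

    /** Adds a reserve on top of the theoretical value, rounded up (integer math only). */
    static int withReserve(int theoretical, int reservePercent) {
        return (theoretical * (100 + reservePercent) + 99) / 100;
    }

    public static void main(String[] args) {
        // 1 ms per command and 50,000 expected QPS, as in the example above.
        int theoretical = theoreticalMaxTotal(50_000, 1.0);
        System.out.println(theoretical);                  // 50
        System.out.println(withReserve(theoretical, 20)); // 60
    }
}
```

Because the estimate is linear in the command time, doubling the average latency to 2 ms doubles the theoretical pool size to 100, which is one reason slow commands cannot be fixed by pool size alone.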
maxIdle and minIdle
maxIdle is the actual maximum number of connections required by workloads. maxTotal includes the number of idle connections as a surplus. If the value of maxIdle is too small on heavily loaded systems, new Jedis connections are created to serve the requests. minIdle specifies the minimum number of established connections that must be kept in the pool. The connection pool achieves its best performance when maxTotal = maxIdle. This way, the performance is not affected by the scaling of the connection pool. We recommend that you set the maxIdle and minIdle parameters to the same value if the user traffic fluctuates. If the number of concurrent connections is small or the value of the maxIdle parameter is too large, the connection resources are wasted.
You can evaluate the size of the connection pool used by each node based on the actual total QPS and the number of clients that Redis serves.
Retrieve proper values based on monitoring data
In actual scenarios, a more reliable method is to try to retrieve optimal values based on monitoring data. You can use JMX monitoring or other monitoring tools to find proper values.
You cannot obtain resources from the resource pool in the following cases:
- Timeout:
  redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
  …
  Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
      at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:449)
- Pool exhausted: when you set the blockWhenExhausted parameter to false, the time specified by borrowMaxWaitMillis is not used. The borrowObject call does not wait for an idle connection; it fails immediately if none is available.
  redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
  …
  Caused by: java.util.NoSuchElementException: Pool exhausted
      at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:464)
This exception may not be caused by a limited pool size. For more information, see Recommended settings. To fix this issue, we recommend that you check the network, the parameters of the resource pool, the resource pool monitoring (JMX monitoring), the code (for example, jedis.close() is not executed), slow queries, and the domain name system (DNS).
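To make the two failure modes and the missing `close()` cause more concrete, here is a small JDK-only simulation. It is not the Apache Commons Pool implementation: `PermitPool` is a hypothetical class that mimics only the borrow semantics described above, with the exception messages copied from the stack traces. Borrowing more times than `maxTotal` without ever returning a connection, as leaked connections would, triggers one message or the other depending on `blockWhenExhausted`.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical simulation of pool exhaustion caused by connections that are never returned. */
class ExhaustionDemo {

    /** Permit-based pool that models only borrow and return, not real connections. */
    static final class PermitPool {
        private final Semaphore permits;
        private final boolean blockWhenExhausted;
        private final long maxWaitMillis;

        PermitPool(int maxTotal, boolean blockWhenExhausted, long maxWaitMillis) {
            this.permits = new Semaphore(maxTotal);
            this.blockWhenExhausted = blockWhenExhausted;
            this.maxWaitMillis = maxWaitMillis;
        }

        void borrow() {
            if (blockWhenExhausted) {
                boolean acquired;
                try {
                    // Wait up to maxWaitMillis for another caller to return a connection.
                    acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while borrowing", e);
                }
                if (!acquired) {
                    throw new NoSuchElementException("Timeout waiting for idle object");
                }
            } else if (!permits.tryAcquire()) {
                // Non-blocking mode: fail at once when nothing is idle.
                throw new NoSuchElementException("Pool exhausted");
            }
        }

        void giveBack() {
            permits.release();
        }
    }

    /** Borrows maxTotal + 1 times without returning anything and reports the failure message. */
    static String leakUntilFailure(boolean blockWhenExhausted) {
        PermitPool pool = new PermitPool(2, blockWhenExhausted, 50);
        try {
            for (int i = 0; i < 3; i++) {
                pool.borrow(); // never returned: simulates a missing close()
            }
            return "no failure";
        } catch (NoSuchElementException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(leakUntilFailure(true));  // Timeout waiting for idle object
        System.out.println(leakUntilFailure(false)); // Pool exhausted
    }
}
```

In both runs the pool size is irrelevant to the outcome: any `maxTotal` is eventually used up if borrowed connections are not returned, which is why the code path is worth checking before the pool is enlarged.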