前面可以看到,虽然图形化监控Redis比较美观、直接,但是安装起来比较麻烦。如果只是想简单看一下Redis的负载情况的话,完全可以用它提供的一些命令来完成。
2.1 吞吐量
Redis提供的INFO命令不仅能够查看实时的吞吐量(ops/sec),还能看到一些有用的运行时信息。下面用grep过滤出一些比较重要的实时信息,比如已连接的和在阻塞的客户端、已用内存、拒绝连接、实时的tps和数据流量等:
- [root@vm redis-3.0.3]# src/redis-cli -h 127.0.0.1 info | grep -e "connected_clients" -e "blocked_clients" -e "used_memory_human" -e "used_memory_peak_human" -e "rejected_connections" -e "evicted_keys" -e "instantaneous"
- connected_clients:1
- blocked_clients:0
- used_memory_human:799.66K
- used_memory_peak_human:852.35K
- instantaneous_ops_per_sec:0
- instantaneous_input_kbps:0.00
- instantaneous_output_kbps:0.00
- rejected_connections:0
- evicted_keys:0
复制代码
2.2 延迟
2.2.1 客户端PING
从客户端可以监控Redis的延迟,利用Redis提供的PING命令,不断PING服务端,记录服务端响应PONG的时间。下面开两个终端,一个监控延迟,一个监视服务端收到的命令:
- [root@vm redis-3.0.3]# src/redis-cli --latency -h 127.0.0.1
- min: 0, max: 1, avg: 0.08
- [root@vm redis-3.0.3]# src/redis-cli -h 127.0.0.1
- 127.0.0.1:6379> monitor
- OK
- 1439361594.867170 [0 127.0.0.1:59737] "PING"
- 1439361594.877413 [0 127.0.0.1:59737] "PING"
- 1439361594.887643 [0 127.0.0.1:59737] "PING"
- 1439361594.897858 [0 127.0.0.1:59737] "PING"
- 1439361594.908063 [0 127.0.0.1:59737] "PING"
- 1439361594.918277 [0 127.0.0.1:59737] "PING"
- 1439361594.928469 [0 127.0.0.1:59737] "PING"
- 1439361594.938693 [0 127.0.0.1:59737] "PING"
- 1439361594.948899 [0 127.0.0.1:59737] "PING"
- 1439361594.959110 [0 127.0.0.1:59737] "PING"
复制代码
如果我们故意用DEBUG命令制造延迟,就能看到一些输出上的变化:
- [root@vm redis-3.0.3]# src/redis-cli --latency -h 127.0.0.1
- min: 0, max: 1995, avg: 1.60 (2361 samples)
- [root@vm redis-3.0.3]# src/redis-cli -h 127.0.0.1
- 127.0.0.1:6379> debug sleep 1
- OK
- (1.00s)
- 127.0.0.1:6379> debug sleep .15
- OK
- 127.0.0.1:6379> debug sleep .5
- OK
- (0.50s)
- 127.0.0.1:6379> debug sleep 2
- OK
- (2.00s)
复制代码
2.2.2 服务端内部机制
服务端内部的延迟监控稍微麻烦一些,因为延迟记录的默认阈值是0。尽管空间和时间耗费很小,Redis为了高性能还是默认关闭了它。所以首先我们要开启它,设置一个合理的阈值,例如下面命令中设置的100ms:
- 127.0.0.1:6379> CONFIG SET latency-monitor-threshold 100
- OK
复制代码
因为Redis执行命令非常快,所以我们用DEBUG命令人为制造一些慢执行命令:
- 127.0.0.1:6379> debug sleep 2
- OK
- (2.00s)
- 127.0.0.1:6379> debug sleep .15
- OK
- 127.0.0.1:6379> debug sleep .5
- OK
复制代码
下面就用LATENCY的各种子命令来查看延迟记录:
LATEST:四列分别表示事件名、最近延迟的Unix时间戳、最近的延迟、最大延迟。
HISTORY:延迟的时间序列。可用来产生图形化显示或报表。
GRAPH:以图形化的方式显示。最下面以竖行显示的是指延迟在多久以前发生。
RESET:清除延迟记录。
- 127.0.0.1:6379> latency latest
- 1) 1) "command"
- 2) (integer) 1439358778
- 3) (integer) 500
- 4) (integer) 2000
- 127.0.0.1:6379> latency history command
- 1) 1) (integer) 1439358773
- 2) (integer) 2000
- 2) 1) (integer) 1439358776
- 2) (integer) 150
- 3) 1) (integer) 1439358778
- 2) (integer) 500
- 127.0.0.1:6379> latency graph command
- command - high 2000 ms, low 150 ms (all time high 2000 ms)
- --------------------------------------------------------------------------------
- #
- |
- |
- |_#
- 666
- mmm
复制代码
在执行一条DEBUG命令会发现GRAPH图的变化,多出一条新的柱状线,下面的时间2s就是指延迟刚发生两秒钟:
- 127.0.0.1:6379> debug sleep 1.5
- OK
- (1.50s)
- 127.0.0.1:6379> latency graph command
- command - high 2000 ms, low 150 ms (all time high 2000 ms)
- --------------------------------------------------------------------------------
- #
- | #
- | |
- |_#|
- 2222
- 333s
- mmm
复制代码
还有一个有趣的子命令DOCTOR,它能列出一些指导建议,例如开启慢日志进一步追查问题原因,查看是否有大对象被踢出或过期,以及操作系统的配置建议等。
- 127.0.0.1:6379> latency doctor
- Dave, I have observed latency spikes in this Redis instance. You don't mind talking about it, do you Dave?
- 1. command: 3 latency spikes (average 883ms, mean deviation 744ms, period 210.00 sec). Worst all time event 2000ms.
- I have a few advices for you:
- - Check your Slow Log to understand what are the commands you are running which are too slow to execute. Please check http://redis.io/commands/slowlog for more information.
- - Deleting, expiring or evicting (because of maxmemory policy) large objects is a blocking operation. If you have very large objects that are often deleted, expired, or evicted, try to fragment those objects into multiple smaller objects.
- - I detected a non zero amount of anonymous huge pages used by your process. This creates very serious latency events in different conditions, especially when Redis is persisting on disk. To disable THP support use the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled', make sure to also add it into /etc/rc.local so that the command will be executed again after a reboot. Note that even if you have already disabled THP, you still need to restart the Redis process to get rid of the huge pages already created.
复制代码
2.2.3 度量延迟Baseline
延迟中的一部分是来自环境的,比如操作系统内核、虚拟化环境等等。Redis提供了让我们度量这一部分延迟基线(Baseline)的方法:
- [root@vm redis-3.0.3]# src/redis-cli --intrinsic-latency 100 -h 127.0.0.1
- Max latency so far: 2 microseconds.
- Max latency so far: 3 microseconds.
- Max latency so far: 26 microseconds.
- Max latency so far: 37 microseconds.
- Max latency so far: 1179 microseconds.
- Max latency so far: 1623 microseconds.
- Max latency so far: 1795 microseconds.
- Max latency so far: 2142 microseconds.
- 35818026 total runs (avg latency: 2.7919 microseconds / 27918.90 nanoseconds per run).
- Worst run took 767x longer than the average latency.
复制代码
–intrinsic-latency后面是测试的时长(秒),一般100秒足够了。
本文作者:geelou
本文来自云栖社区合作伙伴rediscn,了解相关信息可以关注redis.cn网站。