[Original] Notes on a few recently encountered unsolved problems (resolved)


Problem 1: ejabberd keeps producing crash dumps

[root@upucore_105 logs]# ls *.dump
erl_crash_20160420-023053.dump  erl_crash_20160420-142431.dump  erl_crash_20160421-012212.dump  erl_crash_20160421-051153.dump  erl_crash_20160421-070845.dump  erl_crash_20160421-122015.dump
erl_crash_20160420-024120.dump  erl_crash_20160420-142731.dump  erl_crash_20160421-012310.dump  erl_crash_20160421-052453.dump  erl_crash_20160421-071828.dump  erl_crash_20160421-123153.dump
erl_crash_20160420-024331.dump  erl_crash_20160420-143215.dump  erl_crash_20160421-013453.dump  erl_crash_20160421-052753.dump  erl_crash_20160421-074435.dump  erl_crash_20160421-124029.dump
erl_crash_20160420-024823.dump  erl_crash_20160420-143324.dump  erl_crash_20160421-013552.dump  erl_crash_20160421-053627.dump  erl_crash_20160421-081136.dump  erl_crash_20160421-131853.dump
erl_crash_20160420-032253.dump  erl_crash_20160420-150153.dump  erl_crash_20160421-014439.dump  erl_crash_20160421-054022.dump  erl_crash_20160421-084953.dump  erl_crash_20160421-132339.dump
erl_crash_20160420-034503.dump  erl_crash_20160420-160153.dump  erl_crash_20160421-014537.dump  erl_crash_20160421-054153.dump  erl_crash_20160421-085503.dump  erl_crash_20160421-134852.dump
erl_crash_20160420-040853.dump  erl_crash_20160420-161253.dump  erl_crash_20160421-014636.dump  erl_crash_20160421-055355.dump  erl_crash_20160421-085701.dump  erl_crash_20160421-140313.dump
erl_crash_20160420-041253.dump  erl_crash_20160420-163453.dump  erl_crash_20160421-014953.dump  erl_crash_20160421-060534.dump  erl_crash_20160421-085953.dump  erl_crash_20160421-141056.dump
erl_crash_20160420-050153.dump  erl_crash_20160420-174102.dump  erl_crash_20160421-020359.dump  erl_crash_20160421-060753.dump  erl_crash_20160421-090453.dump  erl_crash_20160421-141453.dump
erl_crash_20160420-074907.dump  erl_crash_20160420-180053.dump  erl_crash_20160421-021247.dump  erl_crash_20160421-061418.dump  erl_crash_20160421-090838.dump  erl_crash_20160421-151653.dump
erl_crash_20160420-080959.dump  erl_crash_20160420-184253.dump  erl_crash_20160421-022715.dump  erl_crash_20160421-061714.dump  erl_crash_20160421-092406.dump  erl_crash_20160421-152315.dump
erl_crash_20160420-085042.dump  erl_crash_20160420-191953.dump  erl_crash_20160421-023503.dump  erl_crash_20160421-062011.dump  erl_crash_20160421-092756.dump  erl_crash_20160421-153753.dump
erl_crash_20160420-091353.dump  erl_crash_20160420-194153.dump  erl_crash_20160421-024154.dump  erl_crash_20160421-062307.dump  erl_crash_20160421-092953.dump  erl_crash_20160421-154453.dump
erl_crash_20160420-093153.dump  erl_crash_20160420-223553.dump  erl_crash_20160421-025234.dump  erl_crash_20160421-062953.dump  erl_crash_20160421-094516.dump  erl_crash_20160421-160301.dump
erl_crash_20160420-102753.dump  erl_crash_20160420-231953.dump  erl_crash_20160421-025332.dump  erl_crash_20160421-063853.dump  erl_crash_20160421-102153.dump  erl_crash_20160421-160653.dump
erl_crash_20160420-103453.dump  erl_crash_20160421-003149.dump  erl_crash_20160421-031945.dump  erl_crash_20160421-064523.dump  erl_crash_20160421-103326.dump  erl_crash_20160421-171823.dump
erl_crash_20160420-104653.dump  erl_crash_20160421-003253.dump  erl_crash_20160421-033617.dump  erl_crash_20160421-064820.dump  erl_crash_20160421-111837.dump  erl_crash_20160421-173053.dump
erl_crash_20160420-112753.dump  erl_crash_20160421-004228.dump  erl_crash_20160421-041015.dump  erl_crash_20160421-065116.dump  erl_crash_20160421-112953.dump  erl_crash_20160421-173953.dump
erl_crash_20160420-115008.dump  erl_crash_20160421-005857.dump  erl_crash_20160421-042247.dump  erl_crash_20160421-065803.dump  erl_crash_20160421-115902.dump  erl_crash_20160421-180145.dump
erl_crash_20160420-134303.dump  erl_crash_20160421-005956.dump  erl_crash_20160421-042642.dump  erl_crash_20160421-070059.dump  erl_crash_20160421-120853.dump  erl_crash_20160421-181715.dump
erl_crash_20160420-140853.dump  erl_crash_20160421-010153.dump  erl_crash_20160421-044117.dump  erl_crash_20160421-070356.dump  erl_crash_20160421-121233.dump
[root@upucore_105 logs]# 
[root@upucore_105 logs]# ls *.dump|wc -l
125
[root@upucore_105 logs]# 
[root@upucore_105 logs]# date
Thu Apr 21 18:21:54 CST 2016
[root@upucore_105 logs]# 
[root@upucore_105 logs]# for I in *.dump; do grep "Slogan" $I; echo "----"; done     
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
----
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
----
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
----
...
----
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
----
[root@upucore_105 logs]# 
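With 125 dumps, printing every Slogan line is noisy. Counting the distinct slogans shows at a glance whether all the dumps share one cause. A minimal sketch (summarize_slogans is an illustrative name, not an existing tool):

```bash
#!/usr/bin/env bash
# Count how many crash dumps share each distinct Slogan line, most frequent first.
summarize_slogans() {
  # -h: do not prefix matches with file names; the Slogan line sits near the top of each dump
  grep -h '^Slogan:' "$@" | sort | uniq -c | sort -rn
}

# Usage: summarize_slogans erl_crash_*.dump
```

If every dump failed the same way, this prints a single line with a count of 125.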

[root@upucore_105 logs]# ps -A -o args,stime,etime |grep ejabberd
/usr/local/mo_ejabberd/bin/ Apr20  1-15:42:20
...
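As you can see, ejabberd was started on April 20 and has been running for more than a day. During that time 125 crash dump files were generated, yet the ejabberd process is still alive.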
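ps prints the elapsed time (etime) as [[DD-]hh:]mm:ss, so 1-15:42:20 means 1 day, 15 hours, 42 minutes and 20 seconds. Converting it to seconds makes the crash rate concrete. A small sketch (etime_to_seconds is an illustrative name):

```bash
#!/usr/bin/env bash
# Convert a ps "etime" value ([[DD-]hh:]mm:ss) to seconds.
etime_to_seconds() {
  awk -v t="$1" 'BEGIN {
    days = 0
    if (index(t, "-") > 0) {          # optional "DD-" prefix
      split(t, parts, "-")
      days = parts[1]
      t = parts[2]
    }
    n = split(t, f, ":")              # remaining [hh:]mm:ss fields
    secs = 0
    for (i = 1; i <= n; i++)
      secs = secs * 60 + f[i]
    print days * 86400 + secs
  }'
}

# Usage: etime_to_seconds "$(ps -o etime= -p <pid>)"
```

etime_to_seconds 1-15:42:20 prints 142940, about 39.7 hours; 125 dumps in a window of that length is on the order of one dump every 19 minutes (142940 / 125 ≈ 1144 s).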
Besides the error above, I had previously also seen this one:
Slogan: init terminating in do_boot ()
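Conclusion: see the explanation of erl_crash.dump slogans in the crash dump section of the Erlang documentation. It attributes "init terminating in do_boot" to a boot script that has errors or cannot be read (for example one from a different OTP version), and notes that "Kernel pid terminated" most often means the distributed node name is already in use.
I suspect a version mismatch somewhere in the runtime environment.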

Problem 2: the redis service keeps getting shut down

                _._
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 2.8.18 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 204410
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

[204410] 19 Apr 18:27:35.131 # Server started, Redis version 2.8.18
[204410] 19 Apr 18:27:35.132 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[204410] 19 Apr 18:27:35.132 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[204410] 19 Apr 18:27:35.132 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
[204410] 19 Apr 18:27:35.161 - Accepted 127.0.0.1:16364
[204410] 19 Apr 18:27:35.166 * DB loaded from disk: 0.034 seconds
[204410] 19 Apr 18:27:35.166 * The server is now ready to accept connections on port 6379
[204410] 19 Apr 18:27:35.166 - Client closed connection
[204410] 19 Apr 18:27:35.166 - DB 0: 13573 keys (0 volatile) in 16384 slots HT.
[204410] 19 Apr 18:27:35.167 - 0 clients connected (0 slaves), 4797704 bytes in use
[204410] 19 Apr 18:27:35.223 - Accepted 172.16.186.205:17311
[204410] 19 Apr 18:27:36.078 - Accepted 172.16.186.203:16992
[204410] 19 Apr 18:27:36.078 * Slave 172.16.186.203:6379 asks for synchronization
[204410] 19 Apr 18:27:36.079 * Partial resynchronization not accepted: Runid mismatch (Client asked for 'f88ffa4c476425b22a0c1b56932937669b795c0f', I'm '0c4731011b0b911b000c1d70fdc3f907f76ce180')
[204410] 19 Apr 18:27:36.079 * Starting BGSAVE for SYNC with target: disk
[204410] 19 Apr 18:27:36.080 * Background saving started by pid 204415
[204415] 19 Apr 18:27:36.137 * DB saved on disk
[204415] 19 Apr 18:27:36.137 * RDB: 10 MB of memory used by copy-on-write
[204410] 19 Apr 18:27:36.168 * Background saving terminated with success
[204410] 19 Apr 18:27:36.190 * Synchronization with slave 172.16.186.203:6379 succeeded
[204410] 19 Apr 18:27:38.167 - Accepted 127.0.0.1:16418
[204410] 19 Apr 18:27:38.168 - Client closed connection
[204410] 19 Apr 18:27:38.172 - Accepted 172.16.186.205:10391
[204410] 19 Apr 18:27:38.172 - Client closed connection
[204410] 19 Apr 18:27:40.173 - DB 0: 13573 keys (0 volatile) in 16384 slots HT.
[204410] 19 Apr 18:27:40.173 - 1 clients connected (1 slaves), 5924936 bytes in use
[204410] 19 Apr 18:27:45.180 - DB 0: 13573 keys (0 volatile) in 16384 slots HT.
[204410] 19 Apr 18:27:45.180 - 1 clients connected (1 slaves), 5924648 bytes in use
[204410] 19 Apr 18:27:50.190 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[204410] 19 Apr 18:27:50.191 - 1 clients connected (1 slaves), 5925024 bytes in use
[204410] 19 Apr 18:27:55.199 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[204410] 19 Apr 18:27:55.199 - 1 clients connected (1 slaves), 5926848 bytes in use
[204410] 19 Apr 18:27:58.194 - Client closed connection
[204410] 19 Apr 18:27:58.194 # Connection with slave 172.16.186.203:6379 lost.
[204410] 19 Apr 18:28:00.208 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[204410] 19 Apr 18:28:00.208 - 1 clients connected (0 slaves), 5870736 bytes in use
[204410] 19 Apr 18:28:02.248 - Accepted 172.16.186.203:17514
[204410] 19 Apr 18:28:02.248 * Slave 172.16.186.203:6379 asks for synchronization
[204410] 19 Apr 18:28:02.248 * Full resync requested by slave 172.16.186.203:6379
[204410] 19 Apr 18:28:02.248 * Starting BGSAVE for SYNC with target: disk
[204410] 19 Apr 18:28:02.250 * Background saving started by pid 205002
[205002] 19 Apr 18:28:02.307 * DB saved on disk
[205002] 19 Apr 18:28:02.308 * RDB: 12 MB of memory used by copy-on-write
[204410] 19 Apr 18:28:02.311 * Background saving terminated with success
[204410] 19 Apr 18:28:02.332 * Synchronization with slave 172.16.186.203:6379 succeeded
[204410] 19 Apr 18:28:05.216 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[204410] 19 Apr 18:28:05.216 - 1 clients connected (1 slaves), 5891664 bytes in use
[204410] 19 Apr 18:28:10.225 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[204410] 19 Apr 18:28:10.225 - 1 clients connected (1 slaves), 5891664 bytes in use
[204410] 19 Apr 18:28:15.233 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[204410] 19 Apr 18:28:15.233 - 1 clients connected (1 slaves), 5891664 bytes in use
[204410] 19 Apr 18:28:20.239 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[204410] 19 Apr 18:28:20.239 - 1 clients connected (1 slaves), 5891664 bytes in use
[204410] 19 Apr 18:28:25.246 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[204410] 19 Apr 18:28:25.246 - 1 clients connected (1 slaves), 5891664 bytes in use
[204410] 19 Apr 18:28:30.254 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[204410] 19 Apr 18:28:30.254 - 1 clients connected (1 slaves), 5891664 bytes in use
[204410] 19 Apr 18:28:33.507 - Accepted 127.0.0.1:17448
[204410] 19 Apr 18:28:33.507 # User requested shutdown...
[204410] 19 Apr 18:28:33.508 * Saving the final RDB snapshot before exiting.
[204410] 19 Apr 18:28:33.568 * DB saved on disk
[204410] 19 Apr 18:28:33.568 * Removing the pid file.
[204410] 19 Apr 18:28:33.568 # Redis is now ready to exit, bye bye...
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 2.8.18 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 206040
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

[206040] 19 Apr 18:28:33.580 # Server started, Redis version 2.8.18
[206040] 19 Apr 18:28:33.580 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[206040] 19 Apr 18:28:33.580 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[206040] 19 Apr 18:28:33.580 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
[206040] 19 Apr 18:28:33.605 - Accepted 127.0.0.1:17454
[206040] 19 Apr 18:28:33.610 * DB loaded from disk: 0.030 seconds
[206040] 19 Apr 18:28:33.610 * The server is now ready to accept connections on port 6379
[206040] 19 Apr 18:28:33.610 - Client closed connection
[206040] 19 Apr 18:28:33.610 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:28:33.611 - 0 clients connected (0 slaves), 4798104 bytes in use
[206040] 19 Apr 18:28:33.671 - Accepted 172.16.186.205:16672
[206040] 19 Apr 18:28:34.341 - Accepted 172.16.186.203:18122
[206040] 19 Apr 18:28:34.342 * Slave 172.16.186.203:6379 asks for synchronization
[206040] 19 Apr 18:28:34.342 * Partial resynchronization not accepted: Runid mismatch (Client asked for '0c4731011b0b911b000c1d70fdc3f907f76ce180', I'm '2ae45a0020a36a175b290e23f04672c54fc7fdef')
[206040] 19 Apr 18:28:34.342 * Starting BGSAVE for SYNC with target: disk
[206040] 19 Apr 18:28:34.344 * Background saving started by pid 206049
[206049] 19 Apr 18:28:34.397 * DB saved on disk
[206049] 19 Apr 18:28:34.398 * RDB: 10 MB of memory used by copy-on-write
[206040] 19 Apr 18:28:34.413 * Background saving terminated with success
[206040] 19 Apr 18:28:34.436 * Synchronization with slave 172.16.186.203:6379 succeeded
[206040] 19 Apr 18:28:36.610 - Accepted 127.0.0.1:17513
[206040] 19 Apr 18:28:36.611 - Client closed connection
[206040] 19 Apr 18:28:36.615 - Accepted 172.16.186.205:11486
[206040] 19 Apr 18:28:36.615 - Client closed connection
[206040] 19 Apr 18:28:38.621 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:28:38.621 - 1 clients connected (1 slaves), 5889264 bytes in use
[206040] 19 Apr 18:28:43.627 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:28:43.627 - 1 clients connected (1 slaves), 5888208 bytes in use
[206040] 19 Apr 18:28:48.635 - DB 0: 13575 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:28:48.636 - 1 clients connected (1 slaves), 5888208 bytes in use
[206040] 19 Apr 18:28:53.577 - Client closed connection
[206040] 19 Apr 18:28:53.577 # Connection with slave 172.16.186.203:6379 lost.
[206040] 19 Apr 18:28:53.645 - DB 0: 13572 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:28:53.645 - 1 clients connected (0 slaves), 5870128 bytes in use
[206040] 19 Apr 18:28:57.632 - Accepted 172.16.186.203:18604
[206040] 19 Apr 18:28:57.632 * Slave 172.16.186.203:6379 asks for synchronization
[206040] 19 Apr 18:28:57.632 * Full resync requested by slave 172.16.186.203:6379
[206040] 19 Apr 18:28:57.633 * Starting BGSAVE for SYNC with target: disk
[206040] 19 Apr 18:28:57.634 * Background saving started by pid 206569
[206569] 19 Apr 18:28:57.690 * DB saved on disk
[206569] 19 Apr 18:28:57.691 * RDB: 12 MB of memory used by copy-on-write
[206040] 19 Apr 18:28:57.752 * Background saving terminated with success
[206040] 19 Apr 18:28:57.773 * Synchronization with slave 172.16.186.203:6379 succeeded
[206040] 19 Apr 18:28:58.653 - DB 0: 13572 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:28:58.653 - 1 clients connected (1 slaves), 5929296 bytes in use
[206040] 19 Apr 18:29:03.661 - DB 0: 13572 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:29:03.661 - 1 clients connected (1 slaves), 5930992 bytes in use
[206040] 19 Apr 18:29:08.670 - DB 0: 13572 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:29:08.670 - 1 clients connected (1 slaves), 5930992 bytes in use
[206040] 19 Apr 18:29:13.679 - DB 0: 13572 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:29:13.679 - 1 clients connected (1 slaves), 5930992 bytes in use
[206040] 19 Apr 18:29:18.689 - DB 0: 13572 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:29:18.689 - 1 clients connected (1 slaves), 5930992 bytes in use
[206040] 19 Apr 18:29:23.698 - DB 0: 13572 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:29:23.698 - 1 clients connected (1 slaves), 5930992 bytes in use
[206040] 19 Apr 18:29:28.706 - DB 0: 13572 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:29:28.706 - 1 clients connected (1 slaves), 5930992 bytes in use
[206040] 19 Apr 18:29:33.715 - DB 0: 13572 keys (0 volatile) in 16384 slots HT.
[206040] 19 Apr 18:29:33.715 - 1 clients connected (1 slaves), 5930992 bytes in use
[206040] 19 Apr 18:29:38.580 - Accepted 127.0.0.1:18538
[206040] 19 Apr 18:29:38.581 # User requested shutdown...
[206040] 19 Apr 18:29:38.581 * Saving the final RDB snapshot before exiting.
[206040] 19 Apr 18:29:38.636 * DB saved on disk
[206040] 19 Apr 18:29:38.636 * Removing the pid file.
[206040] 19 Apr 18:29:38.636 # Redis is now ready to exit, bye bye...
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 2.8.18 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 207901
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

[207901] 19 Apr 18:29:38.649 # Server started, Redis version 2.8.18
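After analyzing packet captures, logs and network connections, I have reached the following conclusions so far:
  • the shutdown command arrives over a TCP connection from 127.0.0.1 to 127.0.0.1;
  • after receiving shutdown, redis closes the socket itself, so the TIME_WAIT state ends up on the redis side;
  • a shutdown happens roughly every 55 seconds;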
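The interval can be checked directly against the Redis log. A minimal sketch (shutdown_intervals is an illustrative name) that prints the seconds between consecutive "User requested shutdown" lines, assuming the Redis 2.8 log format shown above ("[pid] DD Mon HH:MM:SS.mmm ...") and no rollover past midnight:

```bash
#!/usr/bin/env bash
# Print the seconds elapsed between consecutive "User requested shutdown" lines.
# Reads the given log file(s), or stdin if none; -h keeps file names out of the fields.
# Field 4 is the HH:MM:SS.mmm timestamp; midnight rollover is not handled.
shutdown_intervals() {
  grep -h 'User requested shutdown' "$@" | awk '{
    split($4, t, ":")
    s = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
    if (NR > 1) printf "%.3f\n", s - prev
    prev = s
  }'
}

# Usage: shutdown_intervals /usr/log/redis/redis.log
```

For the two shutdowns in the log excerpt above (18:28:33.507 and 18:29:38.581) it prints 65.074.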
Conclusion: I had already suspected that the ops team's monitoring scripts were behind this, and unfortunately I was right~~

This script checks the configuration and changes it if needed:
...
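if [ "$backup_status"x = "0"x ];then
	# If no line starts with the expected bind addresses, rewrite the bind line and restart redis
	grep -q "^bind 127.0.0.1 $RedisLocalInnerIp" "$path2" || {
		sed -i "/^bind /c\bind 127.0.0.1 $RedisLocalInnerIp" "$path2"
		./start.sh
	}
else
	# Same check, but the expected bind line also includes $VIP
	grep -q "^bind $VIP 127.0.0.1 $RedisLocalInnerIp" "$path2" || {
		sed -i "/^bind /c\bind $VIP 127.0.0.1 $RedisLocalInnerIp" "$path2"
		./start.sh
	}
fi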
...
This script checks the state of the redis processes (and force-kills them in some states):
...
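# Count running redis-server processes ("grep -v grep" drops the grep itself)
proc=$(ps aux | grep "/usr/local/redis/bin/redis-server" | grep -v grep | wc -l)
if [ "$proc" = "1" ]; then
	# Exactly one instance: stop it gracefully; redis-cli sends SHUTDOWN to 127.0.0.1:6379 by default
	/usr/local/redis/bin/redis-cli shutdown
elif [ "$proc" = "0" ]; then
	continue
else
	# More than one instance: force-kill all of them ($redis_pids is left unquoted on purpose)
	redis_pids=$(pidof /usr/local/redis/bin/redis-server)
	[ -z "$redis_pids" ] && echo "redis is not running" || (kill -9 $redis_pids && echo "$date redis is killed by stop.sh" >> "$logpath")
fi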
...
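Cause: the configuration check command in the first script was wrong (it is corrected above; the mistake was too silly to show), so the script always concluded that the configuration was bad and restarted redis after every check cycle.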
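Whatever the exact mistake was, a check-then-fix step like this has to be idempotent: once sed has rewritten the bind line, the grep must match it, otherwise every cycle rewrites the file and runs ./start.sh again. A minimal sketch of a self-consistent version, using a hypothetical helper ensure_bind_line (not part of the original scripts) that checks for exactly the line it would write and reports whether a restart is needed; it assumes GNU sed for -i:

```bash
#!/usr/bin/env bash
# Hypothetical helper: make the "bind" line of a redis.conf-style file equal to $2.
# Returns 0 if the file was changed (caller should restart redis),
# 1 if it already matched (caller must not restart).
ensure_bind_line() {
  local conf=$1 wanted=$2
  # -x: match the whole line, -F: fixed string, so the dots in IPs are literal
  if grep -qxF -- "$wanted" "$conf"; then
    return 1
  fi
  if grep -q '^bind ' "$conf"; then
    # Replace the existing bind line with exactly the text checked above
    sed -i "s/^bind .*/$wanted/" "$conf"
  else
    printf '%s\n' "$wanted" >> "$conf"
  fi
  return 0
}

# Usage (same variables as the original script):
# ensure_bind_line "$path2" "bind 127.0.0.1 $RedisLocalInnerIp" && ./start.sh
```

The exit status carries the decision, so ./start.sh runs only when the file actually changed, and a second pass over a fixed file is a no-op.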
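The following strace of redis-server (pid 12936) captures one such shutdown: a new connection from 127.0.0.1 sends "shutdown", redis writes the final RDB snapshot, removes the pid file and exits.
[root@xnu_205 redis]#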
[root@xnu_205 redis]# strace -tt -s 1024 -p 12936
Process 12936 attached
22:50:14.894121 epoll_wait(3, {}, 10128, 3) = 0
22:50:14.897436 open("/proc/12936/stat", O_RDONLY) = 7
22:50:14.897566 read(7, "12936 (redis-server) R 1 12936 12936 0 -1 4202816 768 0 0 0 6 1 0 0 20 0 3 0 508818644 41508864 2406 18446744073709551615 4194304 5108532 140726224740688 140726224735808 241015252957 0 0 4097 17610 18446744073709551615 0 0 17 12 0 0 0 0 0\n", 4096) = 239
22:50:14.897691 close(7)                = 0
...
22:50:56.525098 open("/usr/log/redis/redis.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 7
22:50:56.525176 fstat(7, {st_mode=S_IFREG|0644, st_size=1320763367, ...}) = 0
22:50:56.525245 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdd373fd000
22:50:56.525345 fstat(7, {st_mode=S_IFREG|0644, st_size=1320763367, ...}) = 0
22:50:56.525429 lseek(7, 1320763367, SEEK_SET) = 1320763367
22:50:56.525494 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=388, ...}) = 0
22:50:56.525594 write(7, "[12936] 15 Apr 22:50:56.525 - 1 clients connected (0 slaves), 3429072 bytes in use\n", 83) = 83
22:50:56.525732 close(7)                = 0
22:50:56.525803 munmap(0x7fdd373fd000, 4096) = 0
22:50:56.525892 epoll_ctl(3, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLOUT, {u32=6, u64=6}}) = 0
22:50:56.525986 epoll_wait(3, {{EPOLLOUT, {u32=6, u64=6}}}, 10128, 100) = 1
22:50:56.526056 write(6, "*3\r\n$8\r\nREPLCONF\r\n$3\r\nACK\r\n$5\r\n51047\r\n", 38) = 38
22:50:56.526148 epoll_ctl(3, EPOLL_CTL_MOD, 6, {EPOLLIN, {u32=6, u64=6}}) = 0
22:50:56.526228 epoll_wait(3, {{EPOLLIN, {u32=6, u64=6}}}, 10128, 99) = 1
22:50:56.563906 read(6, "*1\r\n$5\r\nMULTI\r\n*1\r\n$4\r\nEXEC\r\n", 16384) = 29
22:50:56.564249 epoll_wait(3, {{EPOLLIN, {u32=6, u64=6}}}, 10128, 61) = 1
22:50:56.565874 read(6, "*1\r\n$5\r\nMULTI\r\n*2\r\n$3\r\nDEL\r\n$33\r\ncollector:00:0C:29:DA:3B:48:timer\r\n*1\r\n$4\r\nEXEC\r\n", 16384) = 82
22:50:56.566160 epoll_wait(3, {{EPOLLIN, {u32=6, u64=6}}}, 10128, 59) = 1
22:50:56.567651 read(6, "*1\r\n$5\r\nMULTI\r\n*2\r\n$3\r\nDEL\r\n$32\r\ncollector:00:0C:29:DA:3B:48:info\r\n*1\r\n$4\r\nEXEC\r\n", 16384) = 81
22:50:56.567953 epoll_wait(3, {{EPOLLIN, {u32=6, u64=6}}}, 10128, 58) = 1
22:50:56.569163 read(6, "*1\r\n$5\r\nMULTI\r\n*3\r\n$4\r\nSREM\r\n$9\r\ncollector\r\n$17\r\n00:0C:29:DA:3B:48\r\n*1\r\n$4\r\nEXEC\r\n", 16384) = 82
22:50:56.569281 epoll_wait(3, {}, 10128, 56) = 0
22:50:56.625464 open("/proc/12936/stat", O_RDONLY) = 7
22:50:56.625566 read(7, "12936 (redis-server) R 1 12936 12936 0 -1 4202816 790 0 0 0 9 7 0 0 20 0 3 0 508818644 41508864 2410 18446744073709551615 4194304 5108532 140726224740688 140726224735808 241015252957 0 0 4097 17610 18446744073709551615 0 0 17 0 0 0 0 0 0\n", 4096) = 238
22:50:56.625656 close(7)                = 0
22:50:56.625765 epoll_wait(3, {}, 10128, 100) = 0
22:50:56.725989 open("/proc/12936/stat", O_RDONLY) = 7
22:50:56.726090 read(7, "12936 (redis-server) R 1 12936 12936 0 -1 4202816 790 0 0 0 9 7 0 0 20 0 3 0 508818644 41508864 2410 18446744073709551615 4194304 5108532 140726224740688 140726224735808 241015252957 0 0 4097 17610 18446744073709551615 0 0 17 0 0 0 0 0 0\n", 4096) = 238
22:50:56.726185 close(7)                = 0

22:50:56.726261 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 10128, 100) = 1
22:50:56.731228 accept(5, {sa_family=AF_INET, sin_port=htons(13397), sin_addr=inet_addr("127.0.0.1")}, [16]) = 7
22:50:56.731359 open("/usr/log/redis/redis.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 8
22:50:56.731438 fstat(8, {st_mode=S_IFREG|0644, st_size=1320763450, ...}) = 0
22:50:56.731502 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdd373fd000
22:50:56.731565 fstat(8, {st_mode=S_IFREG|0644, st_size=1320763450, ...}) = 0
22:50:56.731620 lseek(8, 1320763450, SEEK_SET) = 1320763450
22:50:56.731693 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=388, ...}) = 0
22:50:56.731800 write(8, "[12936] 15 Apr 22:50:56.731 - Accepted 127.0.0.1:13397\n", 55) = 55
22:50:56.731896 close(8)                = 0
22:50:56.731951 munmap(0x7fdd373fd000, 4096) = 0
22:50:56.732024 fcntl(7, F_GETFL)       = 0x2 (flags O_RDWR)
22:50:56.732076 fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK) = 0
22:50:56.732126 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
22:50:56.732183 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0
22:50:56.732246 accept(5, 0x7ffd60a2de70, [128]) = -1 EAGAIN (Resource temporarily unavailable)
22:50:56.732343 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 10128, 94) = 1

Received the shutdown command:
22:50:56.732405 read(7, "*1\r\n$8\r\nshutdown\r\n", 16384) = 18
22:50:56.732477 open("/usr/log/redis/redis.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 8
22:50:56.732538 fstat(8, {st_mode=S_IFREG|0644, st_size=1320763505, ...}) = 0
22:50:56.732592 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdd373fd000
22:50:56.732648 fstat(8, {st_mode=S_IFREG|0644, st_size=1320763505, ...}) = 0
22:50:56.732735 lseek(8, 1320763505, SEEK_SET) = 1320763505
22:50:56.732827 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=388, ...}) = 0

22:50:56.732925 write(8, "[12936] 15 Apr 22:50:56.732 # User requested shutdown...\n", 57) = 57
22:50:56.733010 close(8)                = 0
22:50:56.733064 munmap(0x7fdd373fd000, 4096) = 0
22:50:56.733134 open("/usr/log/redis/redis.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 8
22:50:56.733200 fstat(8, {st_mode=S_IFREG|0644, st_size=1320763562, ...}) = 0
22:50:56.733261 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdd373fd000
22:50:56.733338 fstat(8, {st_mode=S_IFREG|0644, st_size=1320763562, ...}) = 0
22:50:56.733403 lseek(8, 1320763562, SEEK_SET) = 1320763562
22:50:56.733490 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=388, ...}) = 0
22:50:56.733583 write(8, "[12936] 15 Apr 22:50:56.733 * Saving the final RDB snapshot before exiting.\n", 76) = 76
22:50:56.733664 close(8)                = 0
22:50:56.733734 munmap(0x7fdd373fd000, 4096) = 0
22:50:56.733805 open("temp-12936.rdb", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 8
22:50:56.733928 fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
22:50:56.733982 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdd373fd000

Start writing the RDB snapshot:
22:50:56.734648 write(8, "REDIS0006\376\0\r\37terminal:1211110000225:baseinfo@~~\0\0\0s\0\0\0\10\0\0\4moid\6$5c97551e-9315-4b4b-b1b8-abd99bad5ca1&\vdomain_moid\r\30fya1iwz59u7s7xpzapyrq9an\32\4name\6\340a\6\310\373\31\1\0\0\n\4e164\6\340a\6\310\373\31\1\0\0\377\r\37terminal:3333330000803:baseinfo@~~\0\0\0s\0\0\0\10\0\0\4moid\6$7f1531e2-03a7-4ec9-a9c1-ad643c960c0c&\vdomain_moid\r\0302ttyq3nfet50d7m2iubntqmd\32\4name\6\340\243[\363\31\10\3\0\0\n\4e164\6\340\243[\363\31\10\3\0\0\377\r\37terminal:3333330000534:baseinfo@~~\0\0\0s\0\0\0\10\0\0\4moid\6$f9f27425-4321-45d0-b152-e97231326e16&\vdomain_moid\r\0302ttyq3nfet50d7m2iubntqmd\32\4name\6\340\226Z\363\31\10\3\0\0\n\4e164\6\340\226Z\363\31\10\3\0\0\377\r6terminal:caa5e471-ec55-41da-8d60-c39edfb52ba8:baseinfo@~~\0\0\0s\0\0\0\10\0\0\4moid\6$caa5e471-ec55-41da-8d60-c39edfb52ba8&\vdomain_moid\r\30fya1iwz59u7s7xpzapyrq9an\32\4name\6\340\217\5\310\373\31\1\0\0\n\4e164\6\340\217\5\310\373\31\1\0\0\377\r6terminal:14a22ddd-1cee-445c-b2b3-1fe78e434d9e:baseinfo@~~\0\0\0s\0\0\0\10\0\0\4moid\6$14a22ddd-1cee-445c-b2b3-1fe78e434d9e&\vdomain_moid\r\30fya1iwz59u7s7xpzapyrq9an\32\4name\6\340)\6\310\373\31\1\0\0\n\4e164\6\340)\6\310\373\31\1\0\0\377\r\37terminal:1211110004968:baseinfo@~~\0\0\0s\0\0\0\10\0\0\4moid\6$33c39589-7a57-4b4f-8400-16fc464cd286&\vdomain_moid\r\30fya1iwz59u7s7xpzapyrq9an\32\4name\6\340\350\30\310\373\31\1\0\0\n\4e164\6\340\350\30\310\373\31\1\0\0\377\r"..., 4096) = 4096
...
22:51:09.970594 write(8, "10001101:baseinfo@~~\0\0\0s\0\0\0\10\0\0\4moid\6$c53442cc-43ce-48e5-a48e-9b27143245aa&\vdomain_moid\r\30fya1iwz59u7s7xpzapyrq9an\32\4name\6\340\315\t\310\373\31\1\0\0\n\4e164\6\340\315\t\310\373\31\1\0\0\377\r6terminal:5c31d1df-5d74-443d-aa4a-6c39851be016:baseinfo@~~\0\0\0s\0\0\0\10\0\0\4moid\6$5c31d1df-5d74-443d-aa4a-6c39851be016&\vdomain_moid\r\0302ttyq3nfet50d7m2iubntqmd\32\4name\6\340<\\\363\31\10\3\0\0\n\4e164\6\340<\\\363\31\10\3\0\0\377\r6terminal:398b8203-8220-47cf-8ac5-ee4f84c3eea7:baseinfo@~~\0\0\0s\0\0\0\10\0\0\4moid\6$398b8203-8220-47cf-8ac5-ee4f84c3eea7&\vdomain_moid\r\30fya1iwz59u7s7xpzapyrq9an\32\4name\6\340 \n\310\373\31\1\0\0\n\4e164\6\340 \n\310\373\31\1\0\0\377\r\37terminal:1211110000206:baseinfo@~~\0\0\0s\0\0\0\10\0\0\4moid\6$fedd4c38-64ae-4123-87f4-c843fb0dafab&\vdomain_moid\r\30fya1iwz59u7s7xpzapyrq9an\32\4name\6\340N\6\310\373\31\1\0\0\n\4e164\6\340N\6\310\373\31\1\0\0\377\r\37terminal:1211110001987:baseinfo@~~\0\0\0s\0\0\0\10\0\0\4moid\6$b44341f7-e1f0-4748-9de0-df938d9ce39f&\vdomain_moid\r\30fya1iwz59u7s7xpzapyrq9an\32\4name\6\340C\r\310\373\31\1\0\0\n\4e164\6\340C\r\310\373\31\1\0\0\377\r6terminal:a8728c20-8af2-462d-9728-3a168a30e574:baseinfo@~~\0\0\0s\0\0\0\10\0\0\4moid\6$a8728c20-8af2-462d-9728-3a168a30e574&\vdomain_moid\r\0302ttyq3nfet50d7m2iubntqmd\32\4name\6\340\241\\\363\31\10\3\0\0\n\4e164\6\340\241\\\363\31\10\3\0\0\377\377|\345\264*"..., 1028) = 1028
22:51:09.970741 fsync(8)                = 0
22:51:09.979056 close(8)                = 0
22:51:09.979157 munmap(0x7fdd373fd000, 4096) = 0
22:51:09.979251 rename("temp-12936.rdb", "dump.rdb") = 0
22:51:09.980247 open("/usr/log/redis/redis.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 8
22:51:09.980338 fstat(8, {st_mode=S_IFREG|0644, st_size=1320763638, ...}) = 0
22:51:09.980410 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdd373fd000
22:51:09.980481 fstat(8, {st_mode=S_IFREG|0644, st_size=1320763638, ...}) = 0
22:51:09.980545 lseek(8, 1320763638, SEEK_SET) = 1320763638
22:51:09.980611 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=388, ...}) = 0
22:51:09.980717 write(8, "[12936] 15 Apr 22:51:09.980 * DB saved on disk\n", 47) = 47
22:51:09.980822 close(8)                = 0
22:51:09.980885 munmap(0x7fdd373fd000, 4096) = 0
22:51:09.980964 open("/usr/log/redis/redis.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 8
22:51:09.981037 fstat(8, {st_mode=S_IFREG|0644, st_size=1320763685, ...}) = 0
22:51:09.981101 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdd373fd000
22:51:09.981167 fstat(8, {st_mode=S_IFREG|0644, st_size=1320763685, ...}) = 0
22:51:09.981229 lseek(8, 1320763685, SEEK_SET) = 1320763685
22:51:09.981292 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=388, ...}) = 0
22:51:09.981380 write(8, "[12936] 15 Apr 22:51:09.981 * Removing the pid file.\n", 53) = 53
22:51:09.981462 close(8)                = 0
22:51:09.981522 munmap(0x7fdd373fd000, 4096) = 0
22:51:09.981599 unlink("/var/run/redis.pid") = 0
22:51:09.981756 close(4)                = 0
22:51:09.981847 close(5)                = 0
22:51:09.981939 open("/usr/log/redis/redis.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 4
22:51:09.982038 fstat(4, {st_mode=S_IFREG|0644, st_size=1320763738, ...}) = 0
22:51:09.982115 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdd373fd000
22:51:09.982198 fstat(4, {st_mode=S_IFREG|0644, st_size=1320763738, ...}) = 0
22:51:09.982270 lseek(4, 1320763738, SEEK_SET) = 1320763738
22:51:09.982345 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=388, ...}) = 0
22:51:09.982477 write(4, "[12936] 15 Apr 22:51:09.982 # Redis is now ready to exit, bye bye...\n", 69) = 69
22:51:09.982575 close(4)                = 0
22:51:09.982636 munmap(0x7fdd373fd000, 4096) = 0
22:51:09.982764 exit_group(0)           = ?
22:51:09.984301 +++ exited with 0 +++
[root@xnu_205 redis]# 
[root@xnu_205 redis]#
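The 18 bytes that read() returns on fd 7, "*1\r\n$8\r\nshutdown\r\n", are the Redis protocol (RESP) encoding of the command: "*1" says the request is an array of one element, "$8" gives the byte length of the bulk string "shutdown", and every part ends in \r\n. The REPLCONF ACK and MULTI/DEL/EXEC traffic on fd 6 uses the same framing. A minimal bash sketch of that encoding, assuming ASCII arguments (${#arg} counts characters, which equals bytes only for ASCII):

```bash
#!/usr/bin/env bash
# Encode a command as a RESP array of bulk strings (the framing redis-cli uses for requests).
# Assumes ASCII arguments: ${#arg} counts characters, not bytes.
resp_encode() {
  printf '*%d\r\n' "$#"                        # array header: number of elements
  local arg
  for arg in "$@"; do
    printf '$%d\r\n%s\r\n' "${#arg}" "$arg"    # bulk string: $<len>\r\n<data>\r\n
  done
}

# Usage: resp_encode shutdown | od -c
```

resp_encode REPLCONF ACK 51047 likewise reproduces the 38 bytes written on fd 6 above.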

Problem 3: odd TCP connection behavior when terminal devices access the backend API server over HTTP through nginx

...
Apr 20 18:21:48 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:24:37 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:25:50 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:27:02 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:29:01 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:30:14 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:31:28 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:32:44 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:35:33 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:37:06 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:37:52 localhost ntpd_intres[1732]: host name not found: 0.centos.pool.ntp.org
Apr 20 18:38:12 localhost ntpd_intres[1732]: host name not found: 1.centos.pool.ntp.org
Apr 20 18:38:20 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:38:32 localhost ntpd_intres[1732]: host name not found: 2.centos.pool.ntp.org
Apr 20 18:38:52 localhost ntpd_intres[1732]: host name not found: 3.centos.pool.ntp.org
Apr 20 18:39:29 localhost kernel: possible SYN flooding on port 80. Sending cookies.
Apr 20 18:40:43 localhost kernel: possible SYN flooding on port 80. Sending cookies.
...
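Each "possible SYN flooding on port 80. Sending cookies." line means the kernel found the queue of half-open connections for the port-80 listener full and started answering with SYN cookies. To see how often that happens, a minimal sketch (syn_flood_per_hour is an illustrative name) that counts these messages per hour, assuming the standard syslog prefix "Mon DD HH:MM:SS host":

```bash
#!/usr/bin/env bash
# Count kernel "possible SYN flooding" messages per hour from syslog lines on stdin.
syn_flood_per_hour() {
  grep 'possible SYN flooding on port' |
    awk '{ split($3, t, ":"); print $1, $2, t[1] ":00" }' |   # month, day, hour bucket
    sort | uniq -c
}

# Usage: syn_flood_per_hour < /var/log/messages
```

For the excerpt above it reports 13 messages for Apr 20 18:00.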
