
Tuning the following parameters can greatly improve the stability of a Redis cluster (an illustrative configuration sketch is given at the end of this section).

Why does heavy load call for this tuning? One of the most important reasons is Redis master-slave replication: replication shares the one and only worker thread with normal command processing, so even though replication is asynchronous, a single thread can only push so much. If the master-slave network latency is not around 0.05 (presumably milliseconds) but instead reaches 0.6 or even 1.2, the situation becomes very bad; all nodes of one Redis cluster must therefore be deployed in the same machine room.

The concrete parameter values depend on the actual load and on message size. A 200~500KB record is fairly large and puts correspondingly more pressure on replication, while a message of about 10 bytes puts far less. Enabling appendfsync in a high-pressure environment is very inadvisable: it can easily make the whole cluster unavailable, and the typical symptom just before that happens is pronounced QPS jitter.

The goal of the tuning is to keep the cluster from triggering a master-slave failover while the master is actually healthy; when a node holds a large amount of data and such a spurious failover coincides with heavy load, the cluster can avalanche.

When the Redis log contains large numbers of entries like the following, the relevant parameters probably need to be adjusted:

22135:M 06 Sep 14:17:05.388 * FAIL message received from 1d07e208db56cfd7395950ca66e03589278b8e12 about e438a338e9d9834a6745c12931950da87e360ca2
22135:M 06 Sep 14:17:07.551 * FAIL message received from ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467 about d6eb06e9d118c120d3961a659972a1d0191a8652
22135:M 06 Sep 14:17:08.438 # Failover auth granted to f7d6b2c72fa3b801e7dcfe0219e73383d143dd0f for epoch 285 (We can vote for this slave)

A node is qualified to grant a failover vote only when:
1) it is a master;
2) it serves at least one slot;
3) the requesting epoch is not smaller than the node's own current epoch (otherwise the vote is denied with "reqEpoch < curEpoch");
4) it has not already voted in this epoch (otherwise "already voted for epoch");
5) the node asking for the vote is not a master (otherwise "it is a master node");
6) the node asking for the vote has a known master (otherwise "I don't know its master");
7) the master of the node asking for the vote is in FAIL state (otherwise "its master is up").

22135:M 06 Sep 14:17:19.844 # Failover auth denied to 534b93af6ba45a7033dbf38c8f47cd688514125a: already voted for epoch 285

When a node becomes reachable again, the FAIL flag is cleared immediately if that node is a slave or a master serving no slots. If it is a master with slots, the FAIL flag is cleared only once "(now - node->fail_time) > (server.cluster_node_timeout * CLUSTER_FAIL_UNDO_TIME_MULT)", where the multiplier is defined in cluster.h:

#define CLUSTER_FAIL_UNDO_TIME_MULT 2 /* Undo fail if master is back. */

22135:M 06 Sep 14:17:29.243 * Clear FAIL state for node d6eb06e9d118c120d3961a659972a1d0191a8652: master without slots is reachable again.

The entries below are produced when a message of type FAIL is received:

22135:M 06 Sep 14:17:31.995 * FAIL message received from f7d6b2c72fa3b801e7dcfe0219e73383d143dd0f about 1ba437fa1683a8caafd38ff977e5fbabdaf84fd6
22135:M 06 Sep 14:17:32.496 * FAIL message received from 1d07e208db56cfd7395950ca66e03589278b8e12 about d7942cfe636b25219c6d56aa72828fcfde2ee261
22135:M 06 Sep 14:17:32.968 # Failover auth granted to 938d9ae2de278938beda1d39185608b02d3b31ec for epoch 286
22135:M 06 Sep 14:17:33.177 # Failover auth granted to d9dadf3342006e2c92def3071ca0a76390be62b0 for epoch 287
22135:M 06 Sep 14:17:36.336 * Clear FAIL state for node 1ba437fa1683a8caafd38ff977e5fbabdaf84fd6: master without slots is reachable again.
22135:M 06 Sep 14:17:36.855 * Clear FAIL state for node d7942cfe636b25219c6d56aa72828fcfde2ee261: master without slots is reachable again.
22135:M 06 Sep 14:17:38.419 * Clear FAIL state for node e438a338e9d9834a6745c12931950da87e360ca2: is reachable again and nobody is serving its slots after some time.
22135:M 06 Sep 14:17:54.954 * FAIL message received from ae8f6e7e0ab16b04414c8f3d08b58c0aa268b467 about 7990d146cece7dc83eaf08b3e12cbebb2223f5f8
22135:M 06 Sep 14:17:56.697 * FAIL message received from 1d07e208db56cfd7395950ca66e03589278b8e12 about fbe774cdbd2acd24f9f5ea90d61c607bdf800eb5
22135:M 06 Sep 14:17:57.705 # Failover auth granted to e1c202d89ffe1c61b682e28071627635974c84a7 for epoch 288
22135:M 06 Sep 14:17:57.890 * Clear FAIL state for node 7990d146cece7dc83eaf08b3e12cbebb2223f5f8: slave is reachable again.
22135:M 06 Sep 14:17:57.892 * Clear FAIL state for node fbe774cdbd2acd24f9f5ea90d61c607bdf800eb5: master without slots is reachable again.
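The original note does not list the exact parameters, so the fragment below is only an illustrative redis.conf sketch covering the knobs discussed above (cluster node timeout, replication headroom, AOF fsync policy). The setting names are real Redis options, but the values are assumptions that must be sized against your own load:

# Time a node may be unreachable before it is flagged FAIL and a failover
# can be triggered; too small a value causes spurious failovers under load
# (assumed value, tune per workload).
cluster-node-timeout 30000

# Extra replication headroom so brief stalls do not force full resyncs
# (assumed values).
repl-backlog-size 256mb
repl-timeout 60

# Avoid fsync pressure in high-throughput clusters; "no" leaves flushing
# to the OS (use "everysec" if you need stronger durability guarantees).
appendfsync no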
// What does the following code print?

#include <iostream>
#include <string>

// typedef basic_ostream<char> ostream;
class A {
private:
    int m1, m2;

public:
    A(int a, int b) { m1 = a; m2 = b; }
    operator std::string() const { return "str"; }
    operator int() const { return 2018; }
};

int main() {
    A a(1, 2);
    std::cout << a;
    return 0;
}

The answer is 2018. std::basic_ostream has a member operator<<(int), which is viable here through A's conversion operator int(). The operator<< that prints a std::string, on the other hand, is a non-member function template, and template argument deduction never considers user-defined conversions, so A cannot match it. The int overload is therefore the only viable candidate, and 2018 is printed. The relevant declarations:

// Non-member function template in namespace std
// /usr/include/c++/4.8.2/bits/basic_string.h:
template<typename _CharT, typename _Traits, typename _Alloc>
inline basic_ostream<_CharT, _Traits>&
operator<<(basic_ostream<_CharT, _Traits>& __os,
           const basic_string<_CharT, _Traits, _Alloc>& __str)
{
    return __ostream_insert(__os, __str.data(), __str.size());
}

// Member function of class basic_ostream
// (std::cout is an instance of basic_ostream in namespace std)
__ostream_type& basic_ostream::operator<<(int __n);

// What is wrong with the following code, and how can it be fixed?

#include <iostream>
#include <string>

class A {
public:
    int m1, m2;

public:
    A(int a, int b) { m1 = a; m2 = b; }

    std::ostream& operator<<(std::ostream& os) {
        os << m1 << m2;
        return os;
    }
};

int main() {
    A a(1, 2);
    std::cout << a;
    return 0;
}

basic_ostream has no member "operator<<(const A&)", and there is no non-member "operator<<(basic_ostream&, const A&)" either. A member overloaded operator is only considered when the left operand is an object of that class, and here the left operand is std::cout, not an A. Nothing matches, so the code fails to compile. There are two ways to fix it:

1) Change "std::cout << a" to "a.operator<<(std::cout)", or
2) Define a non-member operator:

std::ostream& operator<<(std::ostream& os, const A& a) {
    os << a.m1 << a.m2;
    return os;
}
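For completeness, here is the second fix as a full compilable sketch (illustrative only); it prints 12:

#include <iostream>

class A {
public:
    int m1, m2;
    A(int a, int b) { m1 = a; m2 = b; }
};

// Non-member operator<<: the left operand is the stream, so the
// expression "std::cout << a" now resolves to this function.
std::ostream& operator<<(std::ostream& os, const A& a) {
    os << a.m1 << a.m2;
    return os;
}

int main() {
    A a(1, 2);
    std::cout << a << std::endl; // prints 12
    return 0;
}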
When a Linux server accumulates too many TIME_WAIT connections, the first instinct is usually to shorten the TIME_WAIT duration so the count drops, but Linux does not expose such an interface; the only way is to rebuild the kernel. The default TIME_WAIT length on Linux is 60 seconds, defined in the kernel's include/net/tcp.h:

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT state,
                                  * about 60 seconds */
#define TCP_FIN_TIMEOUT  TCP_TIMEWAIT_LEN
                                 /* BSD style FIN_WAIT2 deadlock breaker.
                                  * It used to be 3min, new value is 60sec,
                                  * to combine FIN-WAIT-2 timeout with
                                  * TIME-WAIT timer. */

Note that tcp_fin_timeout is not the TIME_WAIT duration:

# cat /proc/sys/net/ipv4/tcp_fin_timeout
60

tcp_fin_timeout is actually the FIN_WAIT_2 timeout. Linux provides no interface for changing the TIME_WAIT duration short of editing the macro and recompiling the kernel, whereas Windows lets you control it through the TcpTimedWaitDelay value in the registry.

(RTO: Retransmission Timeout)

TIME_WAIT is a very common problem. The related kernel parameters (set via /etc/sysctl.conf or /proc/sys/net/ipv4) are:

1) net.ipv4.tcp_timestamps — set to 1 to enable TCP timestamps, used to compute the round-trip time (RTT) and to protect against sequence-number wraparound;
2) net.ipv4.tcp_tw_reuse — set to 1 to allow sockets in TIME-WAIT to be reused for new TCP connections;
3) net.ipv4.tcp_tw_recycle — set to 1 to enable fast recycling of TIME-WAIT sockets; behind NAT this can cause SYN packets to be dropped (answered with RST);
4) net.ipv4.tcp_fin_timeout — timeout of the FIN_WAIT_2 state;
5) net.ipv4.tcp_syncookies — set to 1 to enable SYN cookies when the SYN queue overflows, which protects against small-scale SYN floods;
6) net.ipv4.tcp_max_tw_buckets — maximum number of TIME_WAIT sockets to keep; beyond this number TIME_WAIT sockets are destroyed immediately and a warning is logged;
7) net.ipv4.ip_local_port_range — range of local ports available for outgoing connections;
8) net.ipv4.tcp_max_syn_backlog — kernel limit on the per-port SYN backlog, protecting kernel memory from excessive use;
9) net.ipv4.tcp_syn_retries — how many SYNs the kernel sends for a new connection before giving up; should not exceed 255;
10) net.ipv4.tcp_retries1 — how many retries are made before giving up on answering a TCP connection request; the RFC minimum is 3, which is also the default;
11) net.ipv4.tcp_retries2 — how many retries are made before an established (active) TCP connection is dropped; the default is 15;
12) net.ipv4.tcp_synack_retries — number of SYN/ACK retries in the three-way handshake; the default is 5;
13) net.ipv4.tcp_max_orphans — maximum number of sockets not attached to any process (already removed from the process context); beyond this value the socket is reset immediately and a warning is printed;
14) net.ipv4.tcp_orphan_retries — number of retries before an orphaned socket is discarded; the default is 7;
15) net.ipv4.tcp_mem — memory the kernel grants to TCP, in pages: below the first number the kernel does not intervene; above the second it enters "memory pressure" mode; above the third it reports "Out of socket memory" and TCP connections are refused;
16) net.ipv4.tcp_rmem — read buffer size allocated per TCP connection, in bytes;
17) net.ipv4.tcp_wmem — write buffer size allocated per TCP connection, in bytes: the first number is the minimum, the second the default, and the third the maximum (which net.core.wmem_max can override);
18) net.ipv4.tcp_keepalive_time — how often TCP sends keepalive probes when keepalive is enabled, in seconds; the default is 7200 (2 hours);
19) net.ipv4.tcp_keepalive_intvl — interval between keepalive probes;
20) net.ipv4.tcp_keepalive_probes — number of probes sent when the peer does not respond.

In application code, the behaviour can be controlled with SO_LINGER.
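As an illustration (a minimal sketch, not from the original text): setting SO_LINGER with l_onoff=1 and l_linger=0 makes close() abort the connection with an RST instead of the normal FIN handshake, so the closing side never enters TIME_WAIT; use it with care, because any unsent data is discarded.

#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* l_onoff=1, l_linger=0: close() sends RST and skips TIME_WAIT
     * on this end; unsent data in the send buffer is lost. */
    struct linger lg;
    memset(&lg, 0, sizeof(lg));
    lg.l_onoff = 1;
    lg.l_linger = 0;
    setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));

    /* ... connect/send/recv as usual ... */
    close(fd);
    return 0;
}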
Conclusion: still to be confirmed whether this is a Redis bug. The process actually used far less memory than the configured maximum, so it cannot be a case of running out of memory and needing to evict keys.

Symptom: the redis-server process sits at 100% CPU. Cluster role of the node: slave.

Temporary workaround: use gdb to set d.ht[0].used to 0.

Root cause: inside dictGetRandomKey() the branch "if (dictSize(d) == 0) return NULL;" is never taken, so dbRandomKey() spins in an endless loop.

Version: Redis server v=3.2.0 sha=00000000:0 malloc=jemalloc-4.0.3 bits=64 build=9894db3ef433c070

Symptom 1: 100% CPU

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25636 redis     20   0   38492   4096   1360 R 100.0  0.0   2578:10 redis-server

Symptom 2: a large number of connections stuck in CLOSE_WAIT:

tcp     2417      0 1.49.26.98:11382     1.49.26.98:37268     CLOSE_WAIT  -
tcp     2521      0 1.49.26.98:11382     1.49.26.98:35141     CLOSE_WAIT  -
tcp     2521      0 1.49.26.98:11382     1.49.26.98:57181     CLOSE_WAIT  -

Process state:

redis    25636 30.0  0.0  38492  4096 ?        Rsl  3月23 2579:55 /data/redis/bin/redis-server *:1382 [cluster]

Configured maximum memory (1G):

maxmemory 1073741824

Log output:

25636:S 28 Mar 00:21:24.526 - 1 clients connected (0 slaves), 1312384 bytes in use
25636:S 28 Mar 00:21:29.531 - DB 0: 1 keys (1 volatile) in 8 slots HT.
25636:S 28 Mar 00:21:29.531 - 1 clients connected (0 slaves), 1312384 bytes in use
25636:S 28 Mar 00:21:32.585 - Accepted 1.118.14.7:58132

Call stacks (sampled several times with gdb):

#0  dictGenHashFunction (key=<optimized out>, len=5) at dict.c:123
#1  0x00000000004232e6 in dictFind (d=0x7f71c2a17240, key=key@entry=0x7f71c2a15001) at dict.c:499
#2  0x000000000043a00a in dbRandomKey (db=0x7f71c2a24800) at db.c:176
#3  0x000000000043a0a2 in randomkeyCommand (c=0x7f71c2aae1c0) at db.c:355
#4  0x0000000000426b95 in call (c=c@entry=0x7f71c2aae1c0, flags=flags@entry=15) at server.c:2221
#5  0x0000000000429ba7 in processCommand (c=0x7f71c2aae1c0) at server.c:2500
#6  0x0000000000436515 in processInputBuffer (c=0x7f71c2aae1c0) at networking.c:1296
#7  0x0000000000421338 in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71c2a2e050, flags=flags@entry=3) at ae.c:412
#8  0x00000000004215eb in aeMain (eventLoop=0x7f71c2a2e050) at ae.c:455
#9  0x000000000041e5df in main (argc=2, argv=0x7ffef34b2418) at server.c:4079

#0  0x00007f71c2fbc3a2 in random () from /lib64/libc.so.6
#1  0x0000000000423745 in dictGetRandomKey (d=0x7f71c2a171e0) at dict.c:646
#2  0x0000000000439fc0 in dbRandomKey (db=0x7f71c2a24800) at db.c:171
#3  0x000000000043a0a2 in randomkeyCommand (c=0x7f71c2aae1c0) at db.c:355
#4  0x0000000000426b95 in call (c=c@entry=0x7f71c2aae1c0, flags=flags@entry=15) at server.c:2221
#5  0x0000000000429ba7 in processCommand (c=0x7f71c2aae1c0) at server.c:2500
#6  0x0000000000436515 in processInputBuffer (c=0x7f71c2aae1c0) at networking.c:1296
#7  0x0000000000421338 in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71c2a2e050, flags=flags@entry=3) at ae.c:412
#8  0x00000000004215eb in aeMain (eventLoop=0x7f71c2a2e050) at ae.c:455
#9  0x000000000041e5df in main (argc=2, argv=0x7ffef34b2418) at server.c:4079

#0  0x00007f71c30e17e4 in __memcmp_sse4_1 () from /lib64/libc.so.6
#1  0x0000000000424219 in dictSdsKeyCompare (privdata=<optimized out>, key1=<optimized out>, key2=<optimized out>) at server.c:445
#2  0x000000000042331d in dictFind (d=0x7f71c2a17240, key=0x7f71c2a27e73) at dict.c:504
#3  0x0000000000439494 in getExpire (db=0x7f71c2a24800, key=0x7f71c2a27e60) at db.c:824
#4  0x0000000000439c4f in expireIfNeeded (db=0x7f71c2a24800, key=0x7f71c2a27e60) at db.c:858
#5  0x000000000043a01a in dbRandomKey (db=0x7f71c2a24800) at db.c:177
#6  0x000000000043a0a2 in randomkeyCommand (c=0x7f71c2aae1c0) at db.c:355
#7  0x0000000000426b95 in call (c=c@entry=0x7f71c2aae1c0, flags=flags@entry=15) at server.c:2221
#8  0x0000000000429ba7 in processCommand (c=0x7f71c2aae1c0) at server.c:2500
#9  0x0000000000436515 in processInputBuffer (c=0x7f71c2aae1c0) at networking.c:1296
#10 0x0000000000421338 in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71c2a2e050, flags=flags@entry=3) at ae.c:412
#11 0x00000000004215eb in aeMain (eventLoop=0x7f71c2a2e050) at ae.c:455
#12 0x000000000041e5df in main (argc=2, argv=0x7ffef34b2418) at server.c:4079

#0  dictGetRandomKey (d=<optimized out>) at dict.c:663
#1  0x0000000000439fc0 in dbRandomKey (db=0x7f71c2a24800) at db.c:171
#2  0x000000000043a0a2 in randomkeyCommand (c=0x7f71c2aae1c0) at db.c:355
#3  0x0000000000426b95 in call (c=c@entry=0x7f71c2aae1c0, flags=flags@entry=15) at server.c:2221
#4  0x0000000000429ba7 in processCommand (c=0x7f71c2aae1c0) at server.c:2500
#5  0x0000000000436515 in processInputBuffer (c=0x7f71c2aae1c0) at networking.c:1296
#6  0x0000000000421338 in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71c2a2e050, flags=flags@entry=3) at ae.c:412
#7  0x00000000004215eb in aeMain (eventLoop=0x7f71c2a2e050) at ae.c:455
#8  0x000000000041e5df in main (argc=2, argv=0x7ffef34b2418) at server.c:4079

Initial guess: the maximum memory had been reached and the key-eviction logic kicked in, but no key qualified for eviction, hence the endless loop. (As the conclusion above notes, actual memory usage was far below maxmemory, so this guess did not hold.)

Relevant code:

/* Return a random key from the currently selected database. */
void randomkeyCommand(client *c) {
    robj *key;

    if ((key = dbRandomKey(c->db)) == NULL) {
        addReply(c,shared.nullbulk);
        return;
    }

    addReplyBulk(c,key);
    decrRefCount(key);
}

/* Return a random key, in form of a Redis object.
 * If there are no keys, NULL is returned.
 *
 * The function makes sure to return keys not already expired. */
robj *dbRandomKey(redisDb *db) {
    dictEntry *de;

    while(1) { /* This is the loop that spins forever and drives the CPU to 100%. */
        sds key;
        robj *keyobj;

        de = dictGetRandomKey(db->dict);
        if (de == NULL) return NULL;

        key = dictGetKey(de);
        keyobj = createStringObject(key,sdslen(key));
        if (dictFind(db->expires,key)) {
            if (expireIfNeeded(db,keyobj)) {
                decrRefCount(keyobj);
                continue; /* search for another key. This expired. */
            }
        }
        return keyobj;
    }
}

void call(client *c, int flags) {
    long long dirty, start, duration;
    int client_old_flags = c->flags;

    /* Sent the command to clients in MONITOR mode, only if the commands are
     * not generated from reading an AOF. */
    if (listLength(server.monitors) &&
        !server.loading &&
        !(c->cmd->flags & (CMD_SKIP_MONITOR|CMD_ADMIN)))
    {
        replicationFeedMonitors(c,server.monitors,c->db->id,c->argv,c->argc);
    }

    /* Initialization: clear the flags that must be set by the command on
     * demand, and initialize the array for additional commands propagation. */
    c->flags &= ~(CLIENT_FORCE_AOF|CLIENT_FORCE_REPL|CLIENT_PREVENT_PROP);
    redisOpArrayInit(&server.also_propagate);

    /* Call the command. */
    dirty = server.dirty;
    start = ustime();
    c->cmd->proc(c);
    duration = ustime()-start;
    dirty = server.dirty-dirty;
    if (dirty < 0) dirty = 0;

    ......
}

/* With multiplexing we need to take per-client state.
 * Clients are taken in a linked list. */
typedef struct client {
    ......
    struct redisCommand *cmd, *lastcmd;  /* Last command executed. */
    ......
} client;

typedef void redisCommandProc(client *c);
typedef int *redisGetKeysProc(struct redisCommand *cmd, robj **argv, int argc, int *numkeys);

struct redisCommand {
    char *name;
    redisCommandProc *proc;
    int arity;
    char *sflags; /* Flags as string representation, one char per flag. */
    int flags;    /* The actual flags, obtained from the 'sflags' field. */
    /* Use a function to determine keys arguments in a command line.
     * Used for Redis Cluster redirect. */
    redisGetKeysProc *getkeys_proc;
    /* What keys should be loaded in background when calling this command? */
    int firstkey;  /* The first argument that's a key (0 = no keys) */
    int lastkey;   /* The last argument that's a key */
    int keystep;   /* The step between first and last key */
    long long microseconds, calls;
};

/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int iterators;  /* number of iterators currently running */
} dict;

/* Return a random entry from the hash table. Useful to
 * implement randomized algorithms */
dictEntry *dictGetRandomKey(dict *d)
{
    dictEntry *he, *orighe;
    unsigned int h;
    int listlen, listele;

    // State of the dict as seen in gdb while the loop was spinning:
    // (gdb) p *d
    // $1 = {type = 0x71d940 <dbDictType>, privdata = 0x0, ht = {{table = 0x7f71c2a1e480, size = 8, sizemask = 7, used = 1}, {table = 0x0, size = 0, sizemask = 0, used = 0}}, rehashidx = -1, iterators = 0}
    //
    // (gdb) p d.ht[0]
    // $3 = {table = 0x7f71c2a1e480, size = 8, sizemask = 7, used = 1}
    // (gdb) p d.ht[1]
    // $4 = {table = 0x0, size = 0, sizemask = 0, used = 0}
    //
    // Temporary workaround applied with gdb:
    // (gdb) set variable d.ht[0].used=0
    // (gdb) p d.ht[0].used
    // $7 = 0
    //
    // #define dictSize(d) ((d)->ht[0].used+(d)->ht[1].used)
    if (dictSize(d) == 0) return NULL;
    if (dictIsRehashing(d)) _dictRehashStep(d);
    if (dictIsRehashing(d)) {
        do {
            /* We are sure there are no elements in indexes from 0
             * to rehashidx-1 */
            h = d->rehashidx + (random() % (d->ht[0].size +
                                            d->ht[1].size -
                                            d->rehashidx));
            he = (h >= d->ht[0].size) ? d->ht[1].table[h - d->ht[0].size] :
                                        d->ht[0].table[h];
        } while(he == NULL);
    } else {
        do {
            h = random() & d->ht[0].sizemask;
            he = d->ht[0].table[h];
        } while(he == NULL);
    }

    /* Now we found a non empty bucket, but it is a linked
     * list and we need to get a random element from the list.
     * The only sane way to do so is counting the elements and
     * select a random index. */
    listlen = 0;
    orighe = he;
    while(he) {
        he = he->next;
        listlen++;
    }
    listele = random() % listlen;
    he = orighe;
    while(listele--) he = he->next;
    return he;
}

/* This function performs just a step of rehashing, and only if there are
 * no safe iterators bound to our hash table. When we have iterators in the
 * middle of a rehashing we can't mess with the two hash tables otherwise
 * some element can be missed or duplicated.
 *
 * This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used. */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

Process memory (readable only after the problem was resolved and the loop exited; consistent with what ps showed):

# Memory
used_memory:1375320
used_memory_human:1.31M
used_memory_rss:4321280
used_memory_rss_human:4.12M
used_memory_peak:2468448
used_memory_peak_human:2.35M
total_system_memory:33453797376
total_system_memory_human:31.16G
used_memory_lua:34816
used_memory_lua_human:34.00K
maxmemory:1073741824
maxmemory_human:1.00G
maxmemory_policy:allkeys-lru
mem_fragmentation_ratio:3.14
mem_allocator:jemalloc-4.0.3
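To make the failure mode concrete, here is a minimal standalone simulation in plain C (all names are made up for illustration; this is not Redis code). It mirrors the shape of dbRandomKey(): the hash table still claims to hold one key, that key is always judged expired, and — as is likely on a replica, where a logically expired key is reported as expired but only deleted when the master propagates the deletion — it is never removed, so the only way out is the size==0 check that the gdb workaround forced to become true:

#include <stdio.h>

/* Toy stand-ins for the real structures (hypothetical, illustration only). */
struct toy_db {
    unsigned long used;   /* number of keys the hash table claims to hold */
    int key_is_expired;   /* the single key is volatile and already expired */
    int is_replica;       /* replicas report the key as expired but do not delete it */
};

/* Mimics the dbRandomKey() loop: keeps retrying while the only key it can
 * draw is expired. On a replica the key is never deleted, so the loop can
 * only exit once 'used' drops to 0 (what the gdb workaround forced). */
static const char *toy_random_key(struct toy_db *db) {
    unsigned long iterations = 0;
    while (1) {
        if (db->used == 0)   /* the dictSize(d) == 0 escape hatch */
            return NULL;
        iterations++;
        if (db->key_is_expired) {
            if (!db->is_replica)
                db->used = 0;     /* master: the expired key really gets deleted */
            if (iterations > 10)  /* guard so this demo terminates; Redis has no such guard */
                return "stuck";
            continue;             /* "search for another key" -> same key again */
        }
        return "the-key";
    }
}

int main(void) {
    struct toy_db db = { 1, 1, 1 };  /* replica, one expired volatile key */
    printf("replica: %s\n", toy_random_key(&db));  /* would spin forever without the guard */

    db.used = 0;  /* the gdb fix: set variable d.ht[0].used=0 */
    printf("after forcing used=0: %s\n", toy_random_key(&db) ? "key" : "NULL");
    return 0;
}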
Steps to reproduce the problem:

1) Take the input string: { "V":0.12345678 }
2) Parse the string into a cJSON object.
3) Call cJSON_Print to turn the cJSON object back into a string.
4) Parse that string into a cJSON object again.
5) Print the value with printf keeping 8 digits of precision: the output has become 0.123456.

The cause lies in cJSON's print_number function:

static char *print_number(cJSON *item)
{
    char *str;
    double d = item->valuedouble;

    if (fabs(((double)item->valueint) - d) <= DBL_EPSILON && d <= INT_MAX && d >= INT_MIN)
    {
        str = (char*)cJSON_malloc(21); /* 2^64+1 can be represented in 21 chars. */
        if (str) sprintf(str, "%d", item->valueint);
    }
    else
    {
        str = (char*)cJSON_malloc(64); /* This is a nice tradeoff. */
        if (str)
        {
            if (fabs(floor(d) - d) <= DBL_EPSILON && fabs(d) < 1.0e60)
                sprintf(str, "%.0f", d);
            else if (fabs(d) < 1.0e-6 || fabs(d) > 1.0e9)
                sprintf(str, "%e", d);
            else
                sprintf(str, "%f", d);
        }
    }
    return str;
}

The last sprintf call does not specify a precision; "%f" defaults to 6 digits after the decimal point, and that is exactly where the precision is lost.

Note: a float carries about 6~7 significant decimal digits, a double about 15~16.
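One way to fix it, sketched below under the assumption that we may rewrite the final branch of print_number (newer cJSON releases take a similar approach), is to print with %1.15g and fall back to %1.17g whenever the shorter form does not parse back to the same value; the hypothetical print_double below is not the official cJSON patch:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical replacement for the final branch of print_number():
 * 15 significant digits first, then 17 digits (enough to round-trip
 * any IEEE-754 double) if the short form is not exact. */
static char *print_double(double d)
{
    char *str = (char*)malloc(64);
    double test = 0.0;

    if (str == NULL) return NULL;
    sprintf(str, "%1.15g", d);
    if (sscanf(str, "%lg", &test) != 1 || test != d)
        sprintf(str, "%1.17g", d);
    return str;
}

int main(void)
{
    char *s = print_double(0.12345678);
    printf("%s\n", s); /* prints 0.12345678 instead of 0.123457 */
    free(s);
    return 0;
}

With this change, the reproduction steps above keep all 8 digits: 0.12345678 survives the print/parse round trip.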