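Scenario :
Database version at the time of the crash: MySQL-5.7.12; the bug is officially marked as fixed in 5.7.17;
In a master-slave setup with semi-synchronous replication enabled on the slave, starting/restarting the slave threads crashes the master instance;
Conclusion :
This is a MySQL bug; bug report: https://bugs.mysql.com/bug.php?id=79865
Problem description (excerpt):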
Description: From 5.7, semi-sync add Ack_receiver thread for listening slave ack, which use select(). But select() can only listen socket fd between 1 and __FD_SET_SIZE (my os is 1024); when socket fd is bigger than __FD_SET_SIZE, select() has no effect, and can never get ack from slave, then semi-sync can't run normally. Even more, select() use array store fds; when use FD_SET store fd which is bigger than __FD_SET_SIZE, array will overflow, so mysqld may crash.
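The overflow described in the excerpt can be pictured with a small model. The sketch below is only an illustration: the class `FdSet` and the helper `byte_offset` are hypothetical names, the value 1024 is the typical Linux/glibc setting, and the byte-wise bit layout is a simplification of the real `fd_set` (which glibc stores as an array of long integers).

```python
FD_SETSIZE = 1024  # typical value on Linux/glibc; an assumption of this model


class FdSet:
    """Toy model of fd_set: a fixed bitmap with one bit per descriptor."""

    def __init__(self):
        # 1024 bits = 128 bytes; the buffer never grows.
        self.bits = bytearray(FD_SETSIZE // 8)

    def set(self, fd):
        # Checked counterpart of the C macro FD_SET(fd, &set).
        if not 0 <= fd < FD_SETSIZE:
            raise ValueError(f"fd {fd} does not fit in a {FD_SETSIZE}-bit fd_set")
        self.bits[fd // 8] |= 1 << (fd % 8)

    def is_set(self, fd):
        # Counterpart of FD_ISSET; out-of-range descriptors are never set.
        return 0 <= fd < FD_SETSIZE and bool(self.bits[fd // 8] & (1 << (fd % 8)))


def byte_offset(fd):
    """Byte that an unchecked FD_SET would touch for this descriptor."""
    return fd // 8


if __name__ == "__main__":
    fds = FdSet()
    fds.set(1023)                 # last descriptor that fits
    print(byte_offset(1023))      # 127, the last byte of the 128-byte bitmap
    print(byte_offset(4000))      # 500, far beyond the end of the bitmap
```

In C there is no such bounds check in the plain macro, so a descriptor of 1024 or above makes `FD_SET` write past the end of the structure. A fortified glibc build detects this and aborts instead, which is consistent with the `__fortify_fail` frame and the "buffer overflow detected" message in the crash log.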
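The core problem lies in the select() call used on the TCP connections. The operating system defines the macro FD_SETSIZE (written as __FD_SET_SIZE in the excerpt above) as the upper bound on the file descriptors that select() can handle in a process, and this value is usually only 1024;
In practice, epoll or poll is the more common choice, and select seems to be rarely used;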
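As a rough illustration of the poll alternative mentioned above, here is a minimal sketch of waiting for data on a socket with poll() instead of select(). The function name `wait_readable` is hypothetical, the code assumes a Unix-like system where `select.poll` is available, and it is not meant to represent how MySQL itself implements or fixed the ack receiver.

```python
import select
import socket


def wait_readable(sock, timeout_ms):
    """Return True if sock becomes readable within timeout_ms, using poll()."""
    poller = select.poll()
    # poll() is given a list of descriptors rather than a fixed-size bitmap,
    # so the numeric value of the fd is not bounded by FD_SETSIZE.
    poller.register(sock.fileno(), select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(fd == sock.fileno() and (mask & select.POLLIN) for fd, mask in events)


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(wait_readable(a, 50))    # nothing has been sent yet
    b.sendall(b"ack")
    print(wait_readable(a, 1000))  # data is now pending on a
    a.close()
    b.close()
```

The design point is that poll() and epoll scale with the number of descriptors being watched, not with their numeric values, so a socket that happens to be assigned descriptor 1500 is handled the same way as descriptor 5.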
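Reproducing the problem :
Prepare a MySQL-5.7.12 master-slave setup and enable semi-synchronous replication:
To consume a large number of file descriptors as simply as possible, this test takes advantage of a "feature" of MyISAM partitioned tables;
Then run the select statement on the master several times in a row (>5);
Now check the number of file descriptors held by the master;
Next, restart the slave threads on the semi-sync-enabled slave while tailing the master's log;
A few seconds after the threads are restarted, the master crashes;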
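For the file-descriptor check in the steps above, one possible approach on Linux is to list the `fd` directory of the mysqld process under procfs. The sketch below is an assumption about how that check could be scripted, not the command used in the original test; the function name `fd_usage` and the `proc_root` parameter are hypothetical, and reading another process's `fd` directory normally requires running as the same user or as root.

```python
import os


def fd_usage(pid, proc_root="/proc"):
    """Return (open descriptor count, highest descriptor number) for a process.

    Relies on the Linux procfs layout <proc_root>/<pid>/fd, where every
    entry is named after one open descriptor number.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    fds = [int(name) for name in os.listdir(fd_dir)]
    return len(fds), max(fds, default=-1)


if __name__ == "__main__":
    # Inspect the current process; for the test above one would pass
    # the pid of mysqld on the master instead.
    count, highest = fd_usage(os.getpid())
    print(count, highest)
```

The highest descriptor number is the more relevant figure here: once the process already holds descriptors up to 1023, the next socket it accepts (for example the slave's dump connection) receives a number that no longer fits in an fd_set.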
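PS: During testing, running the select statement several times, confirming that the master's semi-sync status was ON, and then quickly restarting the slave threads on the slave reproduced the crash almost every time;
PPS: A MyISAM table opens all of its partition files when it is opened, which makes it easy to simulate heavy file-descriptor usage;
(MyISAM partitioned tables: http://blog.itpub.net/29510932/viewspace-2134679/)
PPPPPPPS: _(:з」∠)_
Attached are the test script and the crash information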
Test table :
-- 2000 hash partitions, so opening the table opens a large number of files
CREATE TABLE `myisam_t` (
  `id` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4
/*!50100 PARTITION BY HASH (id)
PARTITIONS 2000 */;
Master error log at the moment of the crash :
2017-04-28T22:10:00.731611+08:00 5092 [Note] Start binlog_dump to master_thread_id(5092) slave_server(13043), pos(, 4)
2017-04-28T22:10:01.648365+08:00 5092 [Note] Start semi-sync binlog_dump to slave (server_id: 13043), pos(, 4)
*** buffer overflow detected ***: /usr/sbin/mysqld terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x731af)[0x7fcdfc7981af]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7fcdfc81dcf7]
/lib/x86_64-linux-gnu/libc.so.6(+0xf6f10)[0x7fcdfc81bf10]
/lib/x86_64-linux-gnu/libc.so.6(+0xf8c67)[0x7fcdfc81dc67]
/usr/lib/mysql/plugin/semisync_master.so(_ZN12Ack_receiver17get_slave_socketsEP6fd_set+0x83)[0x7fcc73d4a493]
/usr/lib/mysql/plugin/semisync_master.so(_ZN12Ack_receiver3runEv+0x603)[0x7fcc73d4aaf3]
/usr/lib/mysql/plugin/semisync_master.so(ack_receive_handler+0x19)[0x7fcc73d4aba9]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xe90784]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4)[0x7fcdfdf650a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fcdfc80d87d]
mysqld crash report that follows in the same log :
14:10:01 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=5
max_threads=9999
thread_count=8
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 21899362 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xe77fec]
/usr/sbin/mysqld(handle_fatal_signal+0x459)[0x7a7019]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7fcdfdf6c8d0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fcdfc75a067]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fcdfc75b448]
/lib/x86_64-linux-gnu/libc.so.6(+0x731b4)[0x7fcdfc7981b4]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7fcdfc81dcf7]
/lib/x86_64-linux-gnu/libc.so.6(+0xf6f10)[0x7fcdfc81bf10]
/lib/x86_64-linux-gnu/libc.so.6(+0xf8c67)[0x7fcdfc81dc67]
/usr/lib/mysql/plugin/semisync_master.so(_ZN12Ack_receiver17get_slave_socketsEP6fd_set+0x83)[0x7fcc73d4a493]
/usr/lib/mysql/plugin/semisync_master.so(_ZN12Ack_receiver3runEv+0x603)[0x7fcc73d4aaf3]
/usr/lib/mysql/plugin/semisync_master.so(ack_receive_handler+0x19)[0x7fcc73d4aba9]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xe90784]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4)[0x7fcdfdf650a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fcdfc80d87d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.