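This is the first time I have seen MySQL write error-log output into slave-relay-log.index, so that the slave IO thread starts successfully while the SQL thread fails. I am recording the case here.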
【Symptoms】
A production alert fired because the slave SQL thread had stopped. After logging in to the server, I found the following in master-error.log:
#tail -f /home/mysql/data3008/mysql/master-error.log
140507 20:59:29 [ERROR] log 10:44:23 UTC - mysqld got signal 11 ; listed in the index, but failed to stat
140507 20:59:29 [ERROR] Error counting relay log space
140507 21:04:29 [ERROR] log 10:44:23 UTC - mysqld got signal 11 ; listed in the index, but failed to stat
140507 21:04:29 [ERROR] Error counting relay log space
140507 21:09:29 [ERROR] log 10:44:23 UTC - mysqld got signal 11 ; listed in the index, but failed to stat
140507 21:09:29 [ERROR] Error counting relay log space
140507 21:14:29 [ERROR] log 10:44:23 UTC - mysqld got signal 11 ; listed in the index, but failed to stat
140507 21:14:29 [ERROR] Error counting relay log space
140507 21:15:29 [ERROR] log 10:44:23 UTC - mysqld got signal 11 ; listed in the index, but failed to stat
140507 21:15:29 [ERROR] Error counting relay log space
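For the mysqld got signal 11 error itself, see 《mysqld got signal 11 案例一则》 (a separate case write-up on mysqld got signal 11). The error of interest here is Error counting relay log space, so I checked the slave-relay-log.index file: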
#more slave-relay-log.index
/home/mysql/data3008/mysql/slave-relay.023481
/home/mysql/data3008/mysql/slave-relay.023482
10:44:23 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also beay fail.
key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=10
max_threads=230
thread_count=8
connection_count=7
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 136546 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x5580000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 2b74017d3e58 thread_stack 0x40000
/u01/mysql/bin/mysqld(my_print_stacktrace+0x29) [0x903c24]
/u01/mysql/bin/mysqld(handle_fatal_signal+0x3f6) [0x703916]
/lib64/libpthread.so.0() [0x313f80f4a0]
/u01/mysql/bin/mysqld(Query_cache::free_memory_block(Query_cache_block*)+0x58) [0x73c5e0]
/u01/mysql/bin/mysqld(Query_cache::free_query_internal(Query_cache_block*)+0x164) [0x73ca8a]
/u01/mysql/bin/mysqld() [0x73db00]
/u01/mysql/bin/mysqld(query_cache_insert(st_net*, char const*, unsigned long)+0x1e7) [0x740b2b]
/u01/mysql/bin/mysqld(net_real_write+0x39) [0x5e2531]
/u01/mysql/bin/mysqld() [0x5e29a3]
/u01/mysql/bin/mysqld(my_net_write+0xda) [0x5e2f91]
/u01/mysql/bin/mysqld(Protocol::write()+0x1e) [0x5e4192]
/u01/mysql/bin/mysqld(select_send::send_data(List&)+0x17c) [0x5dd204]
/u01/mysql/bin/mysqld() [0x652f13]
/u01/mysql/bin/mysqld() [0x654a73]
/u01/mysql/bin/mysqld(sub_select(JOIN*, st_join_table*, bool)+0x81) [0x65a2c3]
/u01/mysql/bin/mysqld() [0x65f5ed]
/u01/mysql/bin/mysqld(JOIN::exec()+0x466) [0x675e58]
/u01/mysql/bin/mysqld(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x700) [0x671ff7]
/u01/mysql/bin/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x18b) [0x677f22]
/u01/mysql/bin/mysqld() [0x601685]
/u01/mysql/bin/mysqld(mysql_execute_command(THD*)+0x18ee) [0x6066a4]
/u01/mysql/bin/mysqld(mysql_parse(THD*, char*, unsigned int, char const**)+0x419) [0x60a81c]
/u01/mysql/bin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xe94) [0x60b6c5]
/u01/mysql/bin/mysqld(do_command(THD*)+0x107) [0x60c16c]
/u01/mysql/bin/mysqld(handle_one_connection+0x237) [0x5fe1de]
/lib64/libpthread.so.0() [0x313f8077f1]
/lib64/libc.so.6(clone+0x6d) [0x313f4e570d]
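When a MySQL slave starts, it reads the relay log entries listed in the relay log index file and applies them locally. Because the index file contained error-log text mixed in with the relay log file names, MySQL treated those lines as relay log names, could not find the corresponding files, and the SQL thread failed to start.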
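The "listed in the index, but failed to stat" message suggests that every line of the index is treated as a file name and checked on disk. A minimal sketch of that kind of check, for spotting the offending lines before touching anything, might look like the following. This is not MySQL's own code; the function name `find_unreadable_entries` and the injectable `exists` callback are hypothetical, and it assumes the index holds absolute paths, as in the listing above.

```python
import os
from typing import Callable, Iterable, List, Tuple


def find_unreadable_entries(
    index_lines: Iterable[str],
    exists: Callable[[str], bool] = os.path.isfile,
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each index line that is not an existing file.

    Illustrative only: it mimics the idea of checking every index entry
    on disk and is not taken from the MySQL source.
    """
    bad = []
    for lineno, raw in enumerate(index_lines, start=1):
        entry = raw.rstrip("\n")
        if not entry:
            continue  # assumption: blank lines are ignored
        if not exists(entry):
            bad.append((lineno, entry))
    return bad
```

Run against the index shown above, the two relay log paths would pass (assuming the files are still on disk), while every line of the crash report would be reported with its line number.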
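【Solution】
Inspection showed that error output had been written into the relay log index file. After the abnormal lines were removed from slave-relay-log.index, MySQL brought the slave SQL thread back up on its own. If it does not come back, the initial diagnosis was probably wrong.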
(none)@3008 21:24:19>
(none)@3008 21:24:51>show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
.........
Master_Log_File: mysql-bin.001221
Read_Master_Log_Pos: 319331681
Relay_Log_File: slave-relay.023504
Relay_Log_Pos: 319331826
Relay_Master_Log_File: mysql-bin.001221
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
.....
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
1 row in set (0.00 sec)
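The cleanup itself was done by hand, but the filtering step can be sketched in a few lines. The sketch below assumes that valid entries end in the relay log base name `slave-relay` followed by a six-digit sequence number, as in the listing above; the helper name `split_index_lines` and the `basename` parameter are hypothetical. It only separates good lines from bad ones and leaves writing the file back to the operator.

```python
import re
from typing import List, Tuple


def split_index_lines(
    text: str, basename: str = "slave-relay"
) -> Tuple[List[str], List[str]]:
    """Split index content into (kept, dropped) lines.

    A line is kept only if it is an optional directory prefix followed by
    '<basename>.<six digits>'. The six-digit suffix is an assumption based
    on the entries shown in this case.
    """
    pattern = re.compile(r"^(?:.*/)?" + re.escape(basename) + r"\.\d{6}$")
    kept: List[str] = []
    dropped: List[str] = []
    for line in text.splitlines():
        (kept if pattern.match(line) else dropped).append(line)
    return kept, dropped
```

It seems prudent to copy the original index file aside first and to review the `dropped` list before writing `"\n".join(kept) + "\n"` back, since a wrong pattern would silently discard valid relay log entries.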
【Open question】
What causes MySQL to write error-log output into relay_log.index?