生产环境总会遇到由于各种原因导致的主从复制不一致的情况,导致slave出现 1032报错。为了使主从关系能够稳定的运行,大多时候可以选择修复1032 报错 ,先跳过去 ,然后使用 percona 的工具 pt-table-checksum 和 pt-table-sync 进行校验和修复 。
修复1032 error的脚本如下:
使用该脚本需要注意的是
sql_slave_skip_counter = 1
该参数是跳过一个事务 ,是跳过一个事务,是 跳过一个事务,重要的事情说三遍,如果你的一个事务里面包含了多个dml操作 比如4个update,第二个update出现问题,使用该参数跳过的结果是 整个事务都会被跳过去,会导致数据不一致。
binlog为row模式:
binlog为 statement格式
所以修复1032之后务必使用上面提供的工具(当然如果你们的工作环境有更好的工具也可以) 修复数据不一致。
推荐一下关于 sql_slave_skip_counter 的参考资料
[1] MySQL小误区:关于set global sql_slave_skip_counter=N 命令的一些点
[2] Another reason why SQL_SLAVE_SKIP_COUNTER is bad in MySQL
如果您觉得从这篇文章受益,可以赞助 北在南方 一瓶饮料 ^_^
修复1032 error的脚本如下:
- #!/bin/sh
- # fetch port 1032 error recored to /tmp/record.bashc.1032.$port
- # parament port
- if [ -z "$1" ] ; then
- PORT=3001
- else
- PORT=$1
- fi
- tmpfile="/tmp/record.bashc.1032.$PORT"
- touch $tmpfile
- if [ -f $tmpfile ] ; then
- rm -f $tmpfile
- fi
- while true ; do
- mysql -uroot -h127.0.0.1 -P$PORT -Ae 'SHOW SLAVE STATUS\G' | grep -i Slave_SQL_Running | grep -i no > /dev/null
- if [ $? -eq 0 ] ; then
- # whether 1032 ?
- mysql -uroot -h127.0.0.1 -P$PORT -Ae 'SHOW SLAVE STATUS\G' | grep -i Last_SQL_Errno | grep -i 1032 > /dev/null
- if [ $? -eq 0 ] ; then
- table=$(mysql -uroot -h127.0.0.1 -P$PORT -Ae 'SHOW SLAVE STATUS\G' | grep Last_SQL_Error | awk -F 'on table ' '{print $2}' | awk -F ';' '{print $1}')
- grep "$table" $tmpfile > /dev/null
- if [ $? -eq 0 ] ; then
- echo "Error $table is already exists , can't record $tmpfile , Errorno 1032"
- mysql -h127.0.0.1 -P$PORT -Ae 'STOP SLAVE ; SET GLOBAL sql_slave_skip_counter = 1 ; SELECT SLEEP(0.1) ; START SLAVE'
- else
- echo "Error $table is not exists record it to $tmpfile Errorno 1032"
- echo $table >> $tmpfile
- mysql -uroot -h127.0.0.1 -P$PORT -Ae 'STOP SLAVE ; SET GLOBAL sql_slave_skip_counter = 1 ; SELECT SLEEP(0.1) ; START SLAVE'
- fi
- else
- echo "Error , is not 1032 error , please maunal fix , infor $(mysql -h127.0.0.1 -P$PORT -Ae 'SHOW SLAVE STATUS\G' | grep Last_SQL_Error)"
- fi
- else
- echo "IS OK , behind master : $(mysql -h127.0.0.1 -P$PORT -Ae 'SHOW SLAVE STATUS\G' | grep Seconds_Behind_Master | awk '{print $2}')"
- fi
- sleep 0.2
- done
该参数是跳过一个事务 ,是跳过一个事务,是 跳过一个事务,重要的事情说三遍,如果你的一个事务里面包含了多个dml操作 比如4个update,第二个update出现问题,使用该参数跳过的结果是 整个事务都会被跳过去,会导致数据不一致。
binlog为row模式:
- master> select * from t;
- +----+-----+
- | id | pid |
- +----+-----+
- | 1 | 1 |
- | 2 | 2 |
- | 3 | 3 |
- +----+-----+
- 3 rows in set (0.00 sec)
- slave> select * from t;
- +----+-----+
- | id | pid |
- +----+-----+
- | 1 | 1 |
- | 3 | 3 |
- +----+-----+
- 2 rows in set (0.00 sec)
- master> BEGIN;
- Query OK, 0 rows affected (0.00 sec)
- master> DELETE FROM t WHERE id = 1;
- Query OK, 1 row affected (0.00 sec)
- master> DELETE FROM t WHERE id = 2;
- Query OK, 1 row affected (0.00 sec)
- master> DELETE FROM t WHERE id = 3;
- Query OK, 1 row affected (0.00 sec)
- master> COMMIT;
- Query OK, 0 rows affected (0.01 sec)
-
-
- slave> show slave status G
- *************************** 1. row ***************************
- ...
- Last_SQL_Errno: 1032
- Last_SQL_Error: Could not execute Delete_rows event on table test.t; Can't find record in 't', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000002, end_log_pos 333
- ...
- 1 row in set (0.00 sec)
- master> select * from t;
- +----+-----+
- | id | pid |
- +----+-----+
- | 4 | 1 |
- | 6 | 3 |
- +----+-----+
- 2 rows in set (0.00 sec)
- slave> select * from t;
- +----+-----+
- | id | pid |
- +----+-----+
- | 4 | 1 |
- | 5 | 2 |
- | 6 | 3 |
- +----+-----+
- 3 rows in set (0.00 sec)
- master> BEGIN;
- Query OK, 0 rows affected (0.00 sec)
- master> delete from t where id = 4;
- Query OK, 1 row affected (0.00 sec)
- master> insert into t values (5,2);
- Query OK, 1 row affected (0.00 sec)
- master> delete from t where id = 6;
- Query OK, 1 row affected (0.00 sec)
- master> COMMIT;
- Query OK, 0 rows affected (0.15 sec)
- slave> show slave status G
- *************************** 1. row ***************************
- ...
- Last_SQL_Errno: 1062
- Last_SQL_Error: Error 'Duplicate entry '5' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into t values (5,2)'
- ...
- 1 row in set (0.00 sec)
- slave> stop slave; set global sql_slave_skip_counter = 1; start slave;
- Query OK, 0 rows affected (0.05 sec)
- slave> select * from t;
- +----+-----+
- | id | pid |
- +----+-----+
- | 4 | 1 |
- | 5 | 2 |
- | 6 | 3 |
- +----+-----+
- 3 rows in set (0.00 sec)
推荐一下关于 sql_slave_skip_counter 的参考资料
[1] MySQL小误区:关于set global sql_slave_skip_counter=N 命令的一些点
[2] Another reason why SQL_SLAVE_SKIP_COUNTER is bad in MySQL
如果您觉得从这篇文章受益,可以赞助 北在南方 一瓶饮料 ^_^