某天,看到了一篇数据库大牛微信号推送的文章,内容是关于sql_slave_skip_counter参数的,文章指出阿里P6、P7也不一定都了解这个参数,所以尝试学习一下该参数。
set global sql_slave_skip_counter=N中N是什么意思呢?仅仅是事务的个数吗?还是说SQL的个数呢?
下面我们一起探寻结果。
该参数最常见场景莫过于在主从复制中由于种种原因导致的主从数据不一致,从必须跳过该事务后才能正常恢复正常状态
但是在5.7后的MyS
QL中不太常用该参数,因为
该参数与gtid_mode=on不可同时使用。所以如果想用该参数
的话可以修改
gtid_mode为on_
permissive、
off_permissive、off
官方文档:
This option is incompatible with GTID-based replication, and must not be set to a nonzero value when
--gtid-mode=ON
. In MySQL 5.7.1 and later, trying to do so is specifically disallowed. (Bug
#15833516) If you need to skip transactions when employing GTIDs, use
gtid_executed
from the master instead
下面为一场景(
binlog_format=ROW
):
newdba@ceshi 03:22:18>show variables like 'gtid_mode';
+---------------+---------------+
| Varciable_name | Value |
+---------------+---------------+
| gtid_mode | ON_PERMISSIVE
|
+---------------+---------------+
newdba@ceshi 04:10:03>show slave status\G
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_SQL_Errno: 1032
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 8 failed executing transaction 'c54c8c7b-cf59-11e7-a7ff-00163e06ff5b:267' at master log mysql-bin.000003, end_log_pos 65931. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
1 row in set (0.00 sec)
遇到这种问题正常思路当然是解决问题,那么就需要跳过该问题事务,使得复制恢复正常。操作过程为:
newdba@ceshi 04:10:54>set global sql_slave_skip_counter=1;
Query OK, 0 rows affected (0.00 sec)
newdba@ceshi 04:12:47>stop slave;
Query OK, 0 rows affected (0.00 sec)
newdba@ceshi 04:12:53>start slave;
Query OK, 0 rows affected (0.04 sec)
newdba@ceshi 04:12:58>show slave status\G
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
1 row in set (0.00 sec)
如上的操作可以完美解决遇到的问题,但是有没有想过参数sql_slave_skip_counter为其他值会出现什么情况呢?或者说一个事务中有多条SQL,那么sql_slave_skip_counter=2是不是跳过两个事务呢?下面是一个简单的测试(newdba和percona是不同机器的账号):
newdba@ceshi 04:09:09>select * from c1;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
+------+------+------+
3 rows in set (0.00 sec)
newdba@ceshi 04:09:14>delete from c1 where a=2;
Query OK, 1 row affected (0.01 sec)
percona@ceshi 04:09:09>select * from c1;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
+------+------+------+
3 rows in set (0.00 sec)
percona
@ceshi 04:13:31>update c1 set b=3 where a=2;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
percona@ceshi 04:48:54>begin;
Query OK, 0 rows affected (0.00 sec)
percona@ceshi 04:49:19>insert into c1 select 2,2,2;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
percona@ceshi 04:49:35>delete from c1 where a=3;
Query OK, 1 row affected (0.00 sec)
percona@ceshi 04:49:48>insert into c1 select 4,4,4;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
percona@ceshi 04:49:55>commit;
Query OK, 0 rows affected (0.01 sec)
percona@ceshi 04:50:00>begin;
Query OK, 0 rows affected (0.00 sec)
percona@ceshi 04:50:08>insert into c1 select 5,5,5;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
percona@ceshi 04:50:16>commit;
Query OK, 0 rows affected (0.00 sec)
percona@ceshi 04:51:00>begin;
Query OK, 0 rows affected (0.00 sec)
percona@ceshi 04:51:08>insert into c1 select 6,6,6;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
percona@ceshi 04:51:16>commit;
Query OK, 0 rows affected (0.00 sec)
percona@ceshi 04:50:20>select * from c1;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
| 6 | 6 | 6 |
+------+------+------+
5 rows in set (0.00 sec)
newdba@ceshi 04:13:06>show slave status\G
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 8 failed executing transaction 'c54c8c7b-cf59-11e7-a7ff-00163e06ff5b:268' at master log mysql-bin.000003, end_log_pos 66292. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
1 row in set (0.00 sec)
newdba@ceshi 04:18:52>set global sql_slave_skip_counter=2;
Query OK, 0 rows affected (0.00 sec)
newdba@ceshi 04:20:32>stop slave;
Query OK, 0 rows affected (0.01 sec)
newdba@ceshi 04:20:39>start slave;
Query OK, 0 rows affected (0.04 sec)
newdba@ceshi 04:48:03>select * from c1;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 1 | 1 |
| 5 | 5 | 5 |
| 6 | 6 | 6 |
+------+------+------+
3 rows in set (0.01 sec)
newdba@ceshi 04:20:46>show slave status\G
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
1 row in set (0.00 sec)
从上面的测试来看呢,结果是即使
sql_slave_skip_counter=2也不代表跳过两个事务。所以
sql_slave_skip_counter跳过的不是事务,
那么N代表什么意思?官网上给出的是EVENT。
EVENT是什么呢?
event为
show binlog events in 'mysql-bin.000003';显示结果中的
每一行为一个event
percona@ceshi 05:54:20>show binlog events in 'mysql-bin.000003';
| mysql-bin.000003 | 83878 | Gtid | 1 | 83943 | SET @@SESSION.GTID_NEXT= 'c54c8c7b-cf59-11e7-a7ff-00163e06ff5b:325' |
| mysql-bin.000003 | 83943 | Query | 1 | 84016 | BEGIN |
| mysql-bin.000003 | 84016 | Table_map | 1 | 84064 | table_id: 358 (ceshi.c1) |
| mysql-bin.000003 | 84064 | Write_rows | 1 | 84112 | table_id: 358 flags: STMT_END_F |
| mysql-bin.000003 | 84112 | Xid | 1 | 84143 | COMMIT /* xid=191480 */ |
| mysql-bin.000003 | 84143 | Gtid | 1 | 84208 | SET @@SESSION.GTID_NEXT= 'c54c8c7b-cf59-11e7-a7ff-00163e06ff5b:326' |
| mysql-bin.000003 | 84208 | Query | 1 | 84281 | BEGIN |
| mysql-bin.000003 | 84281 | Table_map | 1 | 84329 | table_id: 358 (ceshi.c1) |
| mysql-bin.000003 | 84329 | Write_rows | 1 | 84377 | table_id: 358 flags: STMT_END_F |
| mysql-bin.000003 | 84377 | Xid | 1 | 84408 | COMMIT /* xid=191483 */ |
| mysql-bin.000003 | 84408 | Gtid | 1 | 84473 | SET @@SESSION.GTID_NEXT= 'c54c8c7b-cf59-11e7-a7ff-00163e06ff5b:327' |
| mysql-bin.000003 | 84473 | Query | 1 | 84546 | BEGIN |
| mysql-bin.000003 | 84546 | Table_map | 1 | 84594 | table_id: 358 (ceshi.c1) |
| mysql-bin.000003 | 84594 | Write_rows | 1 | 84642 | table_id: 358 flags: STMT_END_F |
| mysql-bin.000003 | 84642 | Xid | 1 | 84673 | COMMIT /* xid=191486 */ |
+------------------+-------+----------------+-----------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
1174 rows in set (0.00
sec)
总结:
参数sql_slave_skip_counter跳过的是event,不是单个事务(
一个事务多个event:begin; insert...;update...;insert...;commit(不是5个event)
)。当只有1条SQL时,可能会有很多个event,N会跳过N条event。当在一个事务中有许多event时,跳过的点在事务中时,会继续跳过该事务剩下event,执行下一个事务;所以
sql_slave_skip_counter=1(常用且不易错)时,每次跳过的是一个事务,无论该事务有多少个event
注:slave_exec_mode参数修改为'IDEMPOTENT'时,可以避免上面情况的发生
newdba@ceshi 04:21:16>show variables like 'slave_exec_mode';
+-----------------+--------+
| Variable_name | Value |
+-----------------+--------+
| slave_exec_mode | STRICT |
+-----------------+--------+
1 row in set (0.01 sec)
newdba@ceshi 04:30:18>set global slave_exec_mode='IDEMPOTENT';
Query OK, 0 rows affected (0.01 sec)
newdba@ceshi 04:30:26>show variables like 'slave_exec_mode';
+-----------------+------------+
| Variable_name | Value |
+-----------------+------------+
| slave_exec_mode | IDEMPOTENT |
+-----------------+------------+
1 row in set (0.01 sec)
重复以上操作
newdba@ceshi 04:32:54>show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
1 row in set (0.00 sec)
newdba@ceshi 04:33:41>select * from c1;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 1 | 1 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
| 6 | 6 | 6 |
+------+------+------+
5 rows in set (0.00 sec)