MySQL下xtrabackup与MTS造成的死锁

本文涉及的产品
云数据库 RDS MySQL,集群系列 2核4GB
推荐场景:
搭建个人博客
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
RDS MySQL Serverless 高可用系列,价值2615元额度,1个月
简介: 测试背景 MySQL的物理备份在企业版中是有的,但是community版本的话是不提供这个版本的,但是percona作为MySQL的技术服务公司,开源了自己的物理备份工具--xtrabackup。

测试背景

MySQL的物理备份在企业版中是有的,但是community版本的话是不提供这个版本的,但是percona作为MySQL的技术服务公司,开源了自己的物理备份工具--xtrabackup。本文主要是针对生产环境使用xtrabackup遇到的坑。

测试环境

  • Centos 7.4
  • MySQL 5.7.25
  • Xtrabackup 2.4.12

准备工作

  • 在slave上使用物理备份
  • MySQL开启MTS

生产环境

现象

image

分析

  • 会话4为Coordinator线程分发的事务还不能执行,进入waiting for dependent transaction to commit的状态,会话3290897、4、3线程之间形成死锁,其中会话3290897占有全局读锁,获取全局commit锁的时候进入阻塞,等待会话4释放事务涉及到表的commit锁;会话3、4的事务可以并行复制,会话4占有表级commit锁,但是事务对应的binlog在后面(在MTS下,slave_preserve_commit_order=1时,保证事务是按照relay log中记录的顺序来回放 ),阻塞等待会话3先提交进入waiting for preceding transaction to commit的状态;会话3事务执行时提交要获得表级commit锁,但已经被会话3290897占有,所以阻塞等待。这样形成了3290897->4->3->3290897的死锁。

解决方案1:设置ftwrl-wait-timeout

参数解析

* --ftwrl-wait-query-type=all|update
 * 该选项指定flush table with read lock;语句执行之前必须等待什么类型的查询执行完成,默认值是all,有效值为:all和update,其中update包含UPDATE、ALTER、REPLACE、INSERT等类型语句。
 * 该选项执行等待的时间由选项--ftwrl-wait-threshold设置,默认值为60秒,要注意:如果--ftwrl-wait-timeout选项设置为非零值,则--ftwrl-wait-threshold选项不起作用,以--ftwrl-wait-timeout选项设置的值为准,如果--ftwrl-wait-timeout选项为0秒(默认值),则以--ftwrl-wait-threshold选项指定的值为准
 * 执行该选项需要有process和super权限

* --ftwrl-wait-timeout=SECONDS
 * 该选项指定innobackupex阻塞执行flush table with read lock;语句的时长以等待查询执行完成,如果该选项指定一个非零值,则如果超过指定时间之后仍然有查询在执行,则报错终止备份过程,如果使用零值,则不等待查询执行完成,立即执行flush table with read lock;语句
 * 该选项默认值为0

* --ftwrl-wait-threshold=SECONDS
 * 该选项指定innobackupex检测超过该选项指定的阀值的时间运行的查询,如果选项--ftwrl-wait-timeout指定为一个非零值,则--ftwrl-wait-threshold选项不会执行FTWRL,直到长时间执行的查询执行完成并退出之后才会执行FTWRL,如果--ftwrl-wait-timeout指定为一个零值,则--ftwrl-wait-threshold选项不起作用。个人觉得从官方文档上的解释来看, 以--ftwrl-wait-timeout选项指定的值为准即可,--ftwrl-wait-threshold选项在--ftwrl-wait-timeout选项为零值与非零值时都不起作用。
 * --ftwrl-wait-threshold选项默认值为60秒

ftwrl-wait-timeout=20

root@localhost : (none) 07:35:44> show processlist;
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| 4 | system user | | NULL | Connect | 34367 | executing | NULL |
| 7 | system user | | NULL | Connect | 34257 | Slave has read all relay log; waiting for more updates | NULL |
| 8 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 11 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 12 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 13 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 14 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 15 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 16 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 17 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 18 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 19 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 20 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 21 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 22 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 23 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 45 | root | localhost | test | Query | 23 | User sleep | select sleep(200),id from sbtest1 |
| 55 | root | localhost | NULL | Query | 0 | starting | show processlist |
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
20 rows in set (0.00 sec)

# /usr/local/xtrabackup-2.4/bin/innobackupex --defaults-file=/etc/mysql/my.cnf --user=powdba --password=abc123 --host=127.0.0.1 --ftwrl-wait-timeout=20 /data/backup/
...
190319 19:57:40 >> log scanned up to (5875859480)
190319 19:57:40 Waiting 20 seconds for queries running longer than 60 seconds to finish
190319 19:57:40 Waiting for query 45 (duration 138 sec): select sleep(200),id from sbtest1190319 19:57:41 >> log scanned up to (5875859480)
190319 19:57:41 Waiting for query 45 (duration 139 sec): select sleep(200),id from sbtest1190319 19:57:42 >> log scanned up to (5875859480)
190319 19:57:42 Waiting for query 45 (duration 140 sec): select sleep(200),id from sbtest1190319 19:57:43 >> log scanned up to (5875859480)
190319 19:57:43 Waiting for query 45 (duration 141 sec): select sleep(200),id from sbtest1190319 19:57:44 >> log scanned up to (5875859480)
190319 19:57:44 Waiting for query 45 (duration 142 sec): select sleep(200),id from sbtest1190319 19:57:45 >> log scanned up to (5875859480)
190319 19:57:45 Waiting for query 45 (duration 143 sec): select sleep(200),id from sbtest1190319 19:57:46 >> log scanned up to (5875859480)
190319 19:57:46 Waiting for query 45 (duration 144 sec): select sleep(200),id from sbtest1190319 19:57:47 >> log scanned up to (5875859480)
190319 19:57:47 Waiting for query 45 (duration 145 sec): select sleep(200),id from sbtest1190319 19:57:48 >> log scanned up to (5875859480)
190319 19:57:48 Waiting for query 45 (duration 146 sec): select sleep(200),id from sbtest1190319 19:57:49 >> log scanned up to (5875859480)
190319 19:57:49 Waiting for query 45 (duration 147 sec): select sleep(200),id from sbtest1190319 19:57:50 >> log scanned up to (5875859480)
190319 19:57:50 Waiting for query 45 (duration 148 sec): select sleep(200),id from sbtest1190319 19:57:51 >> log scanned up to (5875859480)
190319 19:57:51 Waiting for query 45 (duration 149 sec): select sleep(200),id from sbtest1190319 19:57:52 >> log scanned up to (5875859480)
190319 19:57:52 Waiting for query 45 (duration 150 sec): select sleep(200),id from sbtest1190319 19:57:53 >> log scanned up to (5875859480)
190319 19:57:53 Waiting for query 45 (duration 151 sec): select sleep(200),id from sbtest1190319 19:57:54 >> log scanned up to (5875859480)
190319 19:57:54 Waiting for query 45 (duration 152 sec): select sleep(200),id from sbtest1190319 19:57:55 >> log scanned up to (5875859480)
190319 19:57:55 Waiting for query 45 (duration 153 sec): select sleep(200),id from sbtest1190319 19:57:56 >> log scanned up to (5875859480)
190319 19:57:56 Waiting for query 45 (duration 154 sec): select sleep(200),id from sbtest1190319 19:57:57 >> log scanned up to (5875859480)
190319 19:57:57 Waiting for query 45 (duration 155 sec): select sleep(200),id from sbtest1190319 19:57:58 >> log scanned up to (5875859480)
190319 19:57:58 Waiting for query 45 (duration 156 sec): select sleep(200),id from sbtest1190319 19:57:59 >> log scanned up to (5875859480)
190319 19:57:59 Waiting for query 45 (duration 157 sec): select sleep(200),id from sbtest1190319 19:58:00 >> log scanned up to (5875859480)
190319 19:58:00 Waiting for query 45 (duration 158 sec): select sleep(200),id from sbtest1190319 19:58:01 >> log scanned up to (5875859480)
190319 19:58:01 Unable to obtain lock. Please try again later.
备份期间其他session
root@localhost : test 07:57:45> show processlist;
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| 4 | system user | | NULL | Connect | 34494 | executing | NULL |
| 7 | system user | | NULL | Connect | 34384 | Slave has read all relay log; waiting for more updates | NULL |
| 8 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 11 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 12 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 13 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 14 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 15 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 16 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 17 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 18 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 19 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 20 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 21 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 22 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 23 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 45 | root | localhost | test | Query | 150 | User sleep | select sleep(200),id from sbtest1 |
| 55 | root | localhost | test | Query | 0 | starting | show processlist |
| 57 | powdba | 127.0.0.1:35810 | NULL | Sleep | 1 | | NULL |
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
21 rows in set (0.00 sec)

ftwrl-wait-timeout=20

root@localhost : (none) 07:55:44> show processlist;
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| 4 | system user | | NULL | Connect | 34367 | executing | NULL |
| 7 | system user | | NULL | Connect | 34257 | Slave has read all relay log; waiting for more updates | NULL |
| 8 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 11 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 12 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 13 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 14 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 15 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 16 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 17 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 18 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 19 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 20 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 21 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 22 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 23 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 45 | root | localhost | test | Query | 1 | User sleep | select sleep(200),id from sbtest1 |
| 55 | root | localhost | NULL | Query | 0 | starting | show processlist |
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
20 rows in set (0.00 sec)


# /usr/local/xtrabackup-2.4/bin/innobackupex --defaults-file=/etc/mysql/my.cnf --user=powdba --password=abc123 --host=127.0.0.1 --ftwrl-wait-timeout=20 /data/backup/
...
190319 20:00:29 >> log scanned up to (5875859480)
190319 20:00:29 Waiting 20 seconds for queries running longer than 60 seconds to finish
190319 20:00:29 Executing FLUSH TABLES WITH READ LOCK...
190319 20:00:30 >> log scanned up to (5875859480)
190319 20:00:31 >> log scanned up to (5875859480)
190319 20:00:32 >> log scanned up to (5875859480)
190319 20:00:33 >> log scanned up to (5875859480)
190319 20:00:34 >> log scanned up to (5875859480)
190319 20:00:35 >> log scanned up to (5875859480)
190319 20:00:36 >> log scanned up to (5875859480)
190319 20:00:37 >> log scanned up to (5875859480)
190319 20:00:38 >> log scanned up to (5875859480)
190319 20:00:39 >> log scanned up to (5875859480)
190319 20:00:40 >> log scanned up to (5875859480)
190319 20:00:41 >> log scanned up to (5875859480)
190319 20:00:42 >> log scanned up to (5875859480)
190319 20:00:43 >> log scanned up to (5875859480)
190319 20:00:44 >> log scanned up to (5875859480)
190319 20:00:45 >> log scanned up to (5875859480)
190319 20:00:46 >> log scanned up to (5875859480)
190319 20:00:47 >> log scanned up to (5875859480)
190319 20:00:48 >> log scanned up to (5875859480)
190319 20:00:49 >> log scanned up to (5875859480)
190319 20:00:50 >> log scanned up to (5875859480)
190319 20:00:51 >> log scanned up to (5875859480)
190319 20:00:52 >> log scanned up to (5875859480)
备份期间其他session
root@localhost : test 08:00:32> show processlist;
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| 4 | system user | | NULL | Connect | 34655 | executing | NULL |
| 7 | system user | | NULL | Connect | 34545 | Slave has read all relay log; waiting for more updates | NULL |
| 8 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 11 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 12 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 13 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 14 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 15 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 16 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 17 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 18 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 19 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 20 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 21 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 22 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 23 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 45 | root | localhost | test | Query | 34 | User sleep | select sleep(200),id from sbtest1 |
| 55 | root | localhost | test | Query | 0 | starting | show processlist |
| 59 | powdba | 127.0.0.1:35812 | NULL | Query | 4 | Waiting for table flush | FLUSH TABLES WITH READ LOCK |
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
21 rows in set (0.00 sec)

###一直处于等待中
root@localhost : test 08:00:33> select * from sbtest1 limit 10;

小结

  • 设置--ftwrl-wait-timeout=20时。在执行flush table with read lock时,如果堵塞SQL此时执行时间大于参数ftwrl-wait-timeout(即10s) ,xtrabackup线程在执行flush table with read lock时会超时退出。
  • 设置--ftwrl-wait-timeout=20时。在执行flush table with read lock时,如果堵塞SQL此时执行时间小于参数ftwrl-wait-timeout(即10s) ,xtrabackup线程在执行flush table with read lock时不会超时退出,会一直备份,但是不影响业务运行。

风险

  • 当一条长查询刚开始执行,还没有到达ftwrl-wait-threshold指定的时间时,备份的FTWRL发送到了数据库,这时xtrabackup不会认为这条长查询是一条长查询。

解决方案2:设置kill-long-queries-timeout

参数解析

  • --kill-long-queries-timeout=SECONDS

    • 该选项指定innobackupex在执行FLUSH TABLES WITH READ LOCK时碰到阻塞其获得锁的查询时,等待该参数指定的秒数之后,如果仍然有查询在运行,则执行kill掉这些查询
    • 默认值为0,表示innobackupex 不启用尝试kill掉任何查询的功能
  • --kill-long-query-type=all|select

    • 该选项指定哪些类型的查询在指定的查询时间之后还没有执行完成时被kill掉,以释放阻塞加全局读锁的锁,默认值为all,有效值有:all和select
    • 执行该选项需要有process和super权限
    • 执行kill的超时时间由选项--kill-long-queries-timeout指定,默认值为0,表示禁止kill掉超时查询的功能

kill-long-queries-timeout=20

# /usr/local/xtrabackup-2.4/bin/innobackupex --defaults-file=/etc/mysql/my.cnf --user=powdba --password=abc123 --host=127.0.0.1 --kill-long-queries-timeout=20 /data/backup/
...
190319 20:07:43 Executing FLUSH TABLES WITH READ LOCK...
190319 20:07:43 Kill query timeout 20 seconds.
190319 20:07:44 >> log scanned up to (5875859480)
190319 20:07:45 >> log scanned up to (5875859480)
190319 20:07:46 >> log scanned up to (5875859480)
190319 20:07:47 >> log scanned up to (5875859480)
190319 20:07:48 >> log scanned up to (5875859480)
190319 20:07:49 >> log scanned up to (5875859480)
190319 20:07:50 >> log scanned up to (5875859480)
190319 20:07:51 >> log scanned up to (5875859480)
190319 20:07:52 >> log scanned up to (5875859480)
190319 20:07:53 >> log scanned up to (5875859480)
190319 20:07:54 >> log scanned up to (5875859480)
190319 20:07:55 >> log scanned up to (5875859480)
190319 20:07:56 >> log scanned up to (5875859480)
190319 20:07:57 >> log scanned up to (5875859480)
190319 20:07:58 >> log scanned up to (5875859480)
190319 20:07:59 >> log scanned up to (5875859480)
190319 20:08:00 >> log scanned up to (5875859480)
190319 20:08:01 >> log scanned up to (5875859480)
190319 20:08:02 >> log scanned up to (5875859480)
190319 20:08:03 Connecting to MySQL server host: 127.0.0.1, user: powdba, password: set, port: 3306, socket: /var/lib/mysql/data/sock/mysql.sock
190319 20:08:03 Killing query 45 (duration 61 sec): select sleep(200),id from sbtest1
190319 20:08:03 Kill query thread stopped
190319 20:08:03 Starting to backup non-InnoDB tables and files

root@localhost : test 08:07:01> select sleep(200),id from sbtest1;
ERROR 2013 (HY000): Lost connection to MySQL server during query
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 64
Current database: test
...
备份期间其他会话
root@localhost : test 08:13:38> show processlist;
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| 4 | system user | | NULL | Connect | 35442 | executing | NULL |
| 7 | system user | | NULL | Connect | 35332 | Slave has read all relay log; waiting for more updates | NULL |
| 8 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 11 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 12 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 13 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 14 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 15 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 16 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 17 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 18 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 19 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 20 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 21 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 22 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 23 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 55 | root | localhost | test | Query | 0 | starting | show processlist |
| 70 | root | localhost | test | Query | 40 | User sleep | select sleep(200),id from sbtest1 |
| 71 | powdba | 127.0.0.1:35826 | NULL | Query | 8 | Waiting for table flush | FLUSH TABLES WITH READ LOCK |
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
21 rows in set (0.00 sec)

### 再次其他表数据时会堵塞,这是flush table的作用
root@localhost : test 08:13:40> select * from test limit 10;
+----+------+------+------+
| id | sid | mid | name |
+----+------+------+------+
| 1 | 86 | 0 | OU |
| 8 | 38 | 51 | Ai |
| 15 | 82 | 24 | AM |
| 22 | 68 | 47 | T7 |
| 29 | 83 | 72 | tj |
| 36 | 48 | 35 | gw |
| 43 | 89 | 73 | uJ |
| 50 | 0 | 63 | aX |
| 57 | 42 | 83 | iU |
| 64 | 43 | 7 | 3b |
+----+------+------+------

### 再次查找该表数据时会堵塞
root@localhost : test 08:11:58> select * from sbtest1 limit 10;

小结

  • 当使用xtrabackup参数kill-long-queries-timeout时,会在超过等待时间后kill掉堵塞的SQL
  • 在堵塞flush table with read lock期间,可以访问非堵塞SQL所在的表,不能访问堵塞SQL所在的表。

风险

  • 这种方式对堵塞SQL影响很大,会强制kill掉,对业务会有一定的影响

解决方案3:safe-slave-backup-timeout

参数解析

  • --safe-slave-backup-timeout=SECONDS

    • 该选项在启用 --safe-slave-backup选项时,等待状态变量Slave_open_temp_tables 转变为0值的等待时间,默认值为300秒,如果超过指定的时间状态变量Slave_open_temp_tables 还未转变为0值,则启动复制并报错终止备份

风险

  • 这种方式通过关闭sql thread的方式避免死锁的发生,但是会加大主从的延迟

解决方案4:使用脚本kill xtrabackup query的线程

image

风险:

kill query时,如果存在flush tables with read lock被一条SQL阻塞,那么kill query后,这个SQL不进行完或者被kill掉,那么之后对这条SQL所对应的表做的任何操作都会被阻塞,需要等待这条SQL执行完毕或者被kill掉或者等待lock_wait_timeout时间。

相关实践学习
如何在云端创建MySQL数据库
开始实验后,系统会自动创建一台自建MySQL的 源数据库 ECS 实例和一台 目标数据库 RDS。
全面了解阿里云能为你做什么
阿里云在全球各地部署高效节能的绿色数据中心,利用清洁计算为万物互联的新世界提供源源不断的能源动力,目前开服的区域包括中国(华北、华东、华南、香港)、新加坡、美国(美东、美西)、欧洲、中东、澳大利亚、日本。目前阿里云的产品涵盖弹性计算、数据库、存储与CDN、分析与搜索、云通信、网络、管理与监控、应用服务、互联网中间件、移动服务、视频服务等。通过本课程,来了解阿里云能够为你的业务带来哪些帮助     相关的阿里云产品:云服务器ECS 云服务器 ECS(Elastic Compute Service)是一种弹性可伸缩的计算服务,助您降低 IT 成本,提升运维效率,使您更专注于核心业务创新。产品详情: https://www.aliyun.com/product/ecs
目录
相关文章
|
1月前
|
SQL 关系型数据库 MySQL
MySQL死锁及源码分析!
MySQL死锁及源码分析!
MySQL死锁及源码分析!
|
1月前
|
SQL 关系型数据库 MySQL
案例剖析:MySQL唯一索引并发插入导致死锁!
案例剖析:MySQL唯一索引并发插入导致死锁!
112 0
案例剖析:MySQL唯一索引并发插入导致死锁!
|
1月前
|
SQL 关系型数据库 MySQL
案例剖析,MySQL共享锁引发的死锁问题!
案例剖析,MySQL共享锁引发的死锁问题!
|
3月前
|
SQL 关系型数据库 MySQL
遇到mysql数据库死锁,你会怎么排查?
遇到mysql数据库死锁,你会怎么排查?
264 0
|
19天前
|
SQL 算法 关系型数据库
面试:什么是死锁,如何避免或解决死锁;MySQL中的死锁现象,MySQL死锁如何解决
面试:什么是死锁,死锁产生的四个必要条件,如何避免或解决死锁;数据库锁,锁分类,控制事务;MySQL中的死锁现象,MySQL死锁如何解决
|
29天前
|
关系型数据库 MySQL 数据库
一个 MySQL 数据库死锁的案例和解决方案
本文介绍了一个 MySQL 数据库死锁的案例和解决方案。
65 3
|
2月前
|
监控 关系型数据库 MySQL
MySQL锁机制与解决死锁问题
MySQL锁机制与解决死锁问题
294 5
|
1月前
|
监控 关系型数据库 MySQL
一次彻底讲清如何处理mysql 的死锁问题
【10月更文挑战第16天】本文详细介绍了如何处理 MySQL 中的死锁问题,涵盖死锁的概念、原因、检测方法及解决策略,强调通过优化事务设计、调整数据库参数、手动处理和预防措施等手段,有效减少死锁,提升数据库性能与稳定性。
274 0
|
3月前
|
存储 SQL 关系型数据库
深入MySQL锁机制:原理、死锁解决及Java防范技巧
深入MySQL锁机制:原理、死锁解决及Java防范技巧
|
3月前
|
SQL JavaScript 关系型数据库
Mysql索引不当引发死锁问题
本文通过真实案例解析了MySQL在高并发环境下出现死锁的问题。数据库表`t_award`包含多个索引,但在执行特定SQL语句时遭遇索引失效,导致更新操作变慢并引发死锁。分析发现,联合索引`(pool_id, identifier, status, is_redeemed)`因`identifier`允许为空值而导致索引部分失效。此外,`pool_id`上的普通索引产生的间隙锁在高并发下加剧了死锁风险。为解决此问题,文中提出了调整索引顺序至`(pool_id, status, is_redeemed, identifier)`等方案来优化索引使用,进而减轻死锁现象。
下一篇
无影云桌面