MySQL下xtrabackup与MTS造成的死锁

本文涉及的产品
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
云数据库 RDS MySQL,集群系列 2核4GB
推荐场景:
搭建个人博客
云数据库 RDS MySQL,高可用系列 2核4GB
简介: 测试背景 MySQL的物理备份在企业版中是有的,但是community版本的话是不提供这个版本的,但是percona作为MySQL的技术服务公司,开源了自己的物理备份工具--xtrabackup。

测试背景

MySQL的物理备份在企业版中是有的,但是community版本的话是不提供这个版本的,但是percona作为MySQL的技术服务公司,开源了自己的物理备份工具--xtrabackup。本文主要是针对生产环境使用xtrabackup遇到的坑。

测试环境

  • Centos 7.4
  • MySQL 5.7.25
  • Xtrabackup 2.4.12

准备工作

  • 在slave上使用物理备份
  • MySQL开启MTS

生产环境

现象

image

分析

  • 会话4为Coordinator线程分发的事务还不能执行,进入waiting for dependent transaction to commit的状态,会话3290897、4、3线程之间形成死锁,其中会话3290897占有全局读锁,获取全局commit锁的时候进入阻塞,等待会话4释放事务涉及到表的commit锁;会话3、4的事务可以并行复制,会话4占有表级commit锁,但是事务对应的binlog在后面(在MTS下,slave_preserve_commit_order=1时,保证事务是按照relay log中记录的顺序来回放 ),阻塞等待会话3先提交进入waiting for preceding transaction to commit的状态;会话3事务执行时提交要获得表级commit锁,但已经被会话3290897占有,所以阻塞等待。这样形成了3290897->4->3->3290897的死锁。

解决方案1:设置ftwrl-wait-timeout

参数解析

* --ftwrl-wait-query-type=all|update
 * 该选项指定flush table with read lock;语句执行之前必须等待什么类型的查询执行完成,默认值是all,有效值为:all和update,其中update包含UPDATE、ALTER、REPLACE、INSERT等类型语句。
 * 该选项执行等待的时间由选项--ftwrl-wait-threshold设置,默认值为60秒,要注意:如果--ftwrl-wait-timeout选项设置为非零值,则--ftwrl-wait-threshold选项不起作用,以--ftwrl-wait-timeout选项设置的值为准,如果--ftwrl-wait-timeout选项为0秒(默认值),则以--ftwrl-wait-threshold选项指定的值为准
 * 执行该选项需要有process和super权限

* --ftwrl-wait-timeout=SECONDS
 * 该选项指定innobackupex阻塞执行flush table with read lock;语句的时长以等待查询执行完成,如果该选项指定一个非零值,则如果超过指定时间之后仍然有查询在执行,则报错终止备份过程,如果使用零值,则不等待查询执行完成,立即执行flush table with read lock;语句
 * 该选项默认值为0

* --ftwrl-wait-threshold=SECONDS
 * 该选项指定innobackupex检测超过该选项指定的阀值的时间运行的查询,如果选项--ftwrl-wait-timeout指定为一个非零值,则--ftwrl-wait-threshold选项不会执行FTWRL,直到长时间执行的查询执行完成并退出之后才会执行FTWRL,如果--ftwrl-wait-timeout指定为一个零值,则--ftwrl-wait-threshold选项不起作用。个人觉得从官方文档上的解释来看, 以--ftwrl-wait-timeout选项指定的值为准即可,--ftwrl-wait-threshold选项在--ftwrl-wait-timeout选项为零值与非零值时都不起作用。
 * --ftwrl-wait-threshold选项默认值为60秒

ftwrl-wait-timeout=20

root@localhost : (none) 07:35:44> show processlist;
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| 4 | system user | | NULL | Connect | 34367 | executing | NULL |
| 7 | system user | | NULL | Connect | 34257 | Slave has read all relay log; waiting for more updates | NULL |
| 8 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 11 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 12 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 13 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 14 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 15 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 16 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 17 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 18 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 19 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 20 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 21 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 22 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 23 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 45 | root | localhost | test | Query | 23 | User sleep | select sleep(200),id from sbtest1 |
| 55 | root | localhost | NULL | Query | 0 | starting | show processlist |
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
20 rows in set (0.00 sec)

# /usr/local/xtrabackup-2.4/bin/innobackupex --defaults-file=/etc/mysql/my.cnf --user=powdba --password=abc123 --host=127.0.0.1 --ftwrl-wait-timeout=20 /data/backup/
...
190319 19:57:40 >> log scanned up to (5875859480)
190319 19:57:40 Waiting 20 seconds for queries running longer than 60 seconds to finish
190319 19:57:40 Waiting for query 45 (duration 138 sec): select sleep(200),id from sbtest1190319 19:57:41 >> log scanned up to (5875859480)
190319 19:57:41 Waiting for query 45 (duration 139 sec): select sleep(200),id from sbtest1190319 19:57:42 >> log scanned up to (5875859480)
190319 19:57:42 Waiting for query 45 (duration 140 sec): select sleep(200),id from sbtest1190319 19:57:43 >> log scanned up to (5875859480)
190319 19:57:43 Waiting for query 45 (duration 141 sec): select sleep(200),id from sbtest1190319 19:57:44 >> log scanned up to (5875859480)
190319 19:57:44 Waiting for query 45 (duration 142 sec): select sleep(200),id from sbtest1190319 19:57:45 >> log scanned up to (5875859480)
190319 19:57:45 Waiting for query 45 (duration 143 sec): select sleep(200),id from sbtest1190319 19:57:46 >> log scanned up to (5875859480)
190319 19:57:46 Waiting for query 45 (duration 144 sec): select sleep(200),id from sbtest1190319 19:57:47 >> log scanned up to (5875859480)
190319 19:57:47 Waiting for query 45 (duration 145 sec): select sleep(200),id from sbtest1190319 19:57:48 >> log scanned up to (5875859480)
190319 19:57:48 Waiting for query 45 (duration 146 sec): select sleep(200),id from sbtest1190319 19:57:49 >> log scanned up to (5875859480)
190319 19:57:49 Waiting for query 45 (duration 147 sec): select sleep(200),id from sbtest1190319 19:57:50 >> log scanned up to (5875859480)
190319 19:57:50 Waiting for query 45 (duration 148 sec): select sleep(200),id from sbtest1190319 19:57:51 >> log scanned up to (5875859480)
190319 19:57:51 Waiting for query 45 (duration 149 sec): select sleep(200),id from sbtest1190319 19:57:52 >> log scanned up to (5875859480)
190319 19:57:52 Waiting for query 45 (duration 150 sec): select sleep(200),id from sbtest1190319 19:57:53 >> log scanned up to (5875859480)
190319 19:57:53 Waiting for query 45 (duration 151 sec): select sleep(200),id from sbtest1190319 19:57:54 >> log scanned up to (5875859480)
190319 19:57:54 Waiting for query 45 (duration 152 sec): select sleep(200),id from sbtest1190319 19:57:55 >> log scanned up to (5875859480)
190319 19:57:55 Waiting for query 45 (duration 153 sec): select sleep(200),id from sbtest1190319 19:57:56 >> log scanned up to (5875859480)
190319 19:57:56 Waiting for query 45 (duration 154 sec): select sleep(200),id from sbtest1190319 19:57:57 >> log scanned up to (5875859480)
190319 19:57:57 Waiting for query 45 (duration 155 sec): select sleep(200),id from sbtest1190319 19:57:58 >> log scanned up to (5875859480)
190319 19:57:58 Waiting for query 45 (duration 156 sec): select sleep(200),id from sbtest1190319 19:57:59 >> log scanned up to (5875859480)
190319 19:57:59 Waiting for query 45 (duration 157 sec): select sleep(200),id from sbtest1190319 19:58:00 >> log scanned up to (5875859480)
190319 19:58:00 Waiting for query 45 (duration 158 sec): select sleep(200),id from sbtest1190319 19:58:01 >> log scanned up to (5875859480)
190319 19:58:01 Unable to obtain lock. Please try again later.
备份期间其他session
root@localhost : test 07:57:45> show processlist;
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| 4 | system user | | NULL | Connect | 34494 | executing | NULL |
| 7 | system user | | NULL | Connect | 34384 | Slave has read all relay log; waiting for more updates | NULL |
| 8 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 11 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 12 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 13 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 14 | system user | | NULL | Connect | 34494 | Waiting for an event from Coordinator | NULL |
| 15 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 16 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 17 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 18 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 19 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 20 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 21 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 22 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 23 | system user | | NULL | Connect | 34493 | Waiting for an event from Coordinator | NULL |
| 45 | root | localhost | test | Query | 150 | User sleep | select sleep(200),id from sbtest1 |
| 55 | root | localhost | test | Query | 0 | starting | show processlist |
| 57 | powdba | 127.0.0.1:35810 | NULL | Sleep | 1 | | NULL |
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
21 rows in set (0.00 sec)

ftwrl-wait-timeout=20

root@localhost : (none) 07:55:44> show processlist;
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| 4 | system user | | NULL | Connect | 34367 | executing | NULL |
| 7 | system user | | NULL | Connect | 34257 | Slave has read all relay log; waiting for more updates | NULL |
| 8 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 11 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 12 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 13 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 14 | system user | | NULL | Connect | 34367 | Waiting for an event from Coordinator | NULL |
| 15 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 16 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 17 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 18 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 19 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 20 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 21 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 22 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 23 | system user | | NULL | Connect | 34366 | Waiting for an event from Coordinator | NULL |
| 45 | root | localhost | test | Query | 1 | User sleep | select sleep(200),id from sbtest1 |
| 55 | root | localhost | NULL | Query | 0 | starting | show processlist |
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
20 rows in set (0.00 sec)


# /usr/local/xtrabackup-2.4/bin/innobackupex --defaults-file=/etc/mysql/my.cnf --user=powdba --password=abc123 --host=127.0.0.1 --ftwrl-wait-timeout=20 /data/backup/
...
190319 20:00:29 >> log scanned up to (5875859480)
190319 20:00:29 Waiting 20 seconds for queries running longer than 60 seconds to finish
190319 20:00:29 Executing FLUSH TABLES WITH READ LOCK...
190319 20:00:30 >> log scanned up to (5875859480)
190319 20:00:31 >> log scanned up to (5875859480)
190319 20:00:32 >> log scanned up to (5875859480)
190319 20:00:33 >> log scanned up to (5875859480)
190319 20:00:34 >> log scanned up to (5875859480)
190319 20:00:35 >> log scanned up to (5875859480)
190319 20:00:36 >> log scanned up to (5875859480)
190319 20:00:37 >> log scanned up to (5875859480)
190319 20:00:38 >> log scanned up to (5875859480)
190319 20:00:39 >> log scanned up to (5875859480)
190319 20:00:40 >> log scanned up to (5875859480)
190319 20:00:41 >> log scanned up to (5875859480)
190319 20:00:42 >> log scanned up to (5875859480)
190319 20:00:43 >> log scanned up to (5875859480)
190319 20:00:44 >> log scanned up to (5875859480)
190319 20:00:45 >> log scanned up to (5875859480)
190319 20:00:46 >> log scanned up to (5875859480)
190319 20:00:47 >> log scanned up to (5875859480)
190319 20:00:48 >> log scanned up to (5875859480)
190319 20:00:49 >> log scanned up to (5875859480)
190319 20:00:50 >> log scanned up to (5875859480)
190319 20:00:51 >> log scanned up to (5875859480)
190319 20:00:52 >> log scanned up to (5875859480)
备份期间其他session
root@localhost : test 08:00:32> show processlist;
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| 4 | system user | | NULL | Connect | 34655 | executing | NULL |
| 7 | system user | | NULL | Connect | 34545 | Slave has read all relay log; waiting for more updates | NULL |
| 8 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 11 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 12 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 13 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 14 | system user | | NULL | Connect | 34655 | Waiting for an event from Coordinator | NULL |
| 15 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 16 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 17 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 18 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 19 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 20 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 21 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 22 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 23 | system user | | NULL | Connect | 34654 | Waiting for an event from Coordinator | NULL |
| 45 | root | localhost | test | Query | 34 | User sleep | select sleep(200),id from sbtest1 |
| 55 | root | localhost | test | Query | 0 | starting | show processlist |
| 59 | powdba | 127.0.0.1:35812 | NULL | Query | 4 | Waiting for table flush | FLUSH TABLES WITH READ LOCK |
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
21 rows in set (0.00 sec)

###一直处于等待中
root@localhost : test 08:00:33> select * from sbtest1 limit 10;

小结

  • 设置--ftwrl-wait-timeout=20时。在执行flush table with read lock时,如果堵塞SQL此时执行时间大于参数ftwrl-wait-timeout(即10s) ,xtrabackup线程在执行flush table with read lock时会超时退出。
  • 设置--ftwrl-wait-timeout=20时。在执行flush table with read lock时,如果堵塞SQL此时执行时间小于参数ftwrl-wait-timeout(即10s) ,xtrabackup线程在执行flush table with read lock时不会超时退出,会一直备份,但是不影响业务运行。

风险

  • 当一条长查询刚开始执行,还没有到达ftwrl-wait-threshold指定的时间时,备份的FTWRL发送到了数据库,这时xtrabackup不会认为这条长查询是一条长查询。

解决方案2:设置kill-long-queries-timeout

参数解析

  • --kill-long-queries-timeout=SECONDS

    • 该选项指定innobackupex在执行FLUSH TABLES WITH READ LOCK时碰到阻塞其获得锁的查询时,等待该参数指定的秒数之后,如果仍然有查询在运行,则执行kill掉这些查询
    • 默认值为0,表示innobackupex 不启用尝试kill掉任何查询的功能
  • --kill-long-query-type=all|select

    • 该选项指定哪些类型的查询在指定的查询时间之后还没有执行完成时被kill掉,以释放阻塞加全局读锁的锁,默认值为all,有效值有:all和select
    • 执行该选项需要有process和super权限
    • 执行kill的超时时间由选项--kill-long-queries-timeout指定,默认值为0,表示禁止kill掉超时查询的功能

kill-long-queries-timeout=20

# /usr/local/xtrabackup-2.4/bin/innobackupex --defaults-file=/etc/mysql/my.cnf --user=powdba --password=abc123 --host=127.0.0.1 --kill-long-queries-timeout=20 /data/backup/
...
190319 20:07:43 Executing FLUSH TABLES WITH READ LOCK...
190319 20:07:43 Kill query timeout 20 seconds.
190319 20:07:44 >> log scanned up to (5875859480)
190319 20:07:45 >> log scanned up to (5875859480)
190319 20:07:46 >> log scanned up to (5875859480)
190319 20:07:47 >> log scanned up to (5875859480)
190319 20:07:48 >> log scanned up to (5875859480)
190319 20:07:49 >> log scanned up to (5875859480)
190319 20:07:50 >> log scanned up to (5875859480)
190319 20:07:51 >> log scanned up to (5875859480)
190319 20:07:52 >> log scanned up to (5875859480)
190319 20:07:53 >> log scanned up to (5875859480)
190319 20:07:54 >> log scanned up to (5875859480)
190319 20:07:55 >> log scanned up to (5875859480)
190319 20:07:56 >> log scanned up to (5875859480)
190319 20:07:57 >> log scanned up to (5875859480)
190319 20:07:58 >> log scanned up to (5875859480)
190319 20:07:59 >> log scanned up to (5875859480)
190319 20:08:00 >> log scanned up to (5875859480)
190319 20:08:01 >> log scanned up to (5875859480)
190319 20:08:02 >> log scanned up to (5875859480)
190319 20:08:03 Connecting to MySQL server host: 127.0.0.1, user: powdba, password: set, port: 3306, socket: /var/lib/mysql/data/sock/mysql.sock
190319 20:08:03 Killing query 45 (duration 61 sec): select sleep(200),id from sbtest1
190319 20:08:03 Kill query thread stopped
190319 20:08:03 Starting to backup non-InnoDB tables and files

root@localhost : test 08:07:01> select sleep(200),id from sbtest1;
ERROR 2013 (HY000): Lost connection to MySQL server during query
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 64
Current database: test
...
备份期间其他会话
root@localhost : test 08:13:38> show processlist;
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
| 4 | system user | | NULL | Connect | 35442 | executing | NULL |
| 7 | system user | | NULL | Connect | 35332 | Slave has read all relay log; waiting for more updates | NULL |
| 8 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 11 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 12 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 13 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 14 | system user | | NULL | Connect | 35442 | Waiting for an event from Coordinator | NULL |
| 15 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 16 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 17 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 18 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 19 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 20 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 21 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 22 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 23 | system user | | NULL | Connect | 35441 | Waiting for an event from Coordinator | NULL |
| 55 | root | localhost | test | Query | 0 | starting | show processlist |
| 70 | root | localhost | test | Query | 40 | User sleep | select sleep(200),id from sbtest1 |
| 71 | powdba | 127.0.0.1:35826 | NULL | Query | 8 | Waiting for table flush | FLUSH TABLES WITH READ LOCK |
+----+-------------+-----------------+------+---------+-------+--------------------------------------------------------+-----------------------------------+
21 rows in set (0.00 sec)

### 再次其他表数据时会堵塞,这是flush table的作用
root@localhost : test 08:13:40> select * from test limit 10;
+----+------+------+------+
| id | sid | mid | name |
+----+------+------+------+
| 1 | 86 | 0 | OU |
| 8 | 38 | 51 | Ai |
| 15 | 82 | 24 | AM |
| 22 | 68 | 47 | T7 |
| 29 | 83 | 72 | tj |
| 36 | 48 | 35 | gw |
| 43 | 89 | 73 | uJ |
| 50 | 0 | 63 | aX |
| 57 | 42 | 83 | iU |
| 64 | 43 | 7 | 3b |
+----+------+------+------

### 再次查找该表数据时会堵塞
root@localhost : test 08:11:58> select * from sbtest1 limit 10;

小结

  • 当使用xtrabackup参数kill-long-queries-timeout时,会在超过等待时间后kill掉堵塞的SQL
  • 在堵塞flush table with read lock期间,可以访问非堵塞SQL所在的表,不能访问堵塞SQL所在的表。

风险

  • 这种方式对堵塞SQL影响很大,会强制kill掉,对业务会有一定的影响

解决方案3:safe-slave-backup-timeout

参数解析

  • --safe-slave-backup-timeout=SECONDS

    • 该选项在启用 --safe-slave-backup选项时,等待状态变量Slave_open_temp_tables 转变为0值的等待时间,默认值为300秒,如果超过指定的时间状态变量Slave_open_temp_tables 还未转变为0值,则启动复制并报错终止备份

风险

  • 这种方式通过关闭sql thread的方式避免死锁的发生,但是会加大主从的延迟

解决方案4:使用脚本kill xtrabackup query的线程

image

风险:

kill query时,如果存在flush tables with read lock被一条SQL阻塞,那么kill query后,这个SQL不进行完或者被kill掉,那么之后对这条SQL所对应的表做的任何操作都会被阻塞,需要等待这条SQL执行完毕或者被kill掉或者等待lock_wait_timeout时间。

相关实践学习
如何在云端创建MySQL数据库
开始实验后,系统会自动创建一台自建MySQL的 源数据库 ECS 实例和一台 目标数据库 RDS。
全面了解阿里云能为你做什么
阿里云在全球各地部署高效节能的绿色数据中心,利用清洁计算为万物互联的新世界提供源源不断的能源动力,目前开服的区域包括中国(华北、华东、华南、香港)、新加坡、美国(美东、美西)、欧洲、中东、澳大利亚、日本。目前阿里云的产品涵盖弹性计算、数据库、存储与CDN、分析与搜索、云通信、网络、管理与监控、应用服务、互联网中间件、移动服务、视频服务等。通过本课程,来了解阿里云能够为你的业务带来哪些帮助     相关的阿里云产品:云服务器ECS 云服务器 ECS(Elastic Compute Service)是一种弹性可伸缩的计算服务,助您降低 IT 成本,提升运维效率,使您更专注于核心业务创新。产品详情: https://www.aliyun.com/product/ecs
目录
相关文章
|
19天前
|
SQL 关系型数据库 MySQL
遇到mysql数据库死锁,你会怎么排查?
遇到mysql数据库死锁,你会怎么排查?
53 0
|
26天前
|
存储 SQL 关系型数据库
深入MySQL锁机制:原理、死锁解决及Java防范技巧
深入MySQL锁机制:原理、死锁解决及Java防范技巧
|
26天前
|
SQL JavaScript 关系型数据库
Mysql索引不当引发死锁问题
本文通过真实案例解析了MySQL在高并发环境下出现死锁的问题。数据库表`t_award`包含多个索引,但在执行特定SQL语句时遭遇索引失效,导致更新操作变慢并引发死锁。分析发现,联合索引`(pool_id, identifier, status, is_redeemed)`因`identifier`允许为空值而导致索引部分失效。此外,`pool_id`上的普通索引产生的间隙锁在高并发下加剧了死锁风险。为解决此问题,文中提出了调整索引顺序至`(pool_id, status, is_redeemed, identifier)`等方案来优化索引使用,进而减轻死锁现象。
|
1月前
|
Oracle 关系型数据库 MySQL
Mysql和Oracle数据库死锁查看以及解决
【8月更文挑战第11天】本文介绍了解决MySQL与Oracle数据库死锁的方法。MySQL可通过`SHOW ENGINE INNODB STATUS`查看死锁详情,并自动回滚一个事务解除死锁;也可手动KILL事务。Oracle则通过查询V$LOCK与V$SESSION视图定位死锁,并用`ALTER SYSTEM KILL SESSION`命令终止相关会话。预防措施包括遵循ACID原则、优化索引及拆分大型事务。
|
17天前
|
监控 关系型数据库 MySQL
MySQL死锁是什么
【8月更文挑战第26天】MySQL死锁是指两个或多个事务在执行过程中,因争夺锁资源而造成的相互等待的现象,若无外力干涉,它们都将无法继续执行。这种相互等待的情况会导致整个系统陷入停滞状态,影响数据库的性能和稳定性。
35 0
|
2月前
|
SQL 存储 关系型数据库
细说 MySQL 死锁
【7月更文挑战第26天】MySQL 死锁
28 4
|
2月前
|
SQL 存储 关系型数据库
细说 MySQL 死锁
死锁检查在MySQL 8.0中涉及三个主要步骤:构造锁等待图、初始化事务权重和提升权重。首先,当事务进入锁等待状态时,信息会被记录到内存中的`waiting_threads`,形成快照数组。接着,对这个数组进行排序,构造出锁等待图,表示事务间的等待关系。然后,初始化所有等待事务的权重为1,如果一个事务在其他事务等待后进入等待,其权重会被提升,以避免长时间等待。最后,根据锁等待图,提升那些同时阻塞其他事务的权重,但不包括参与死锁的事务。权重更新后,死锁检查线程将依据这些信息来检测和解决死锁。
64 15
|
2月前
|
SQL 算法 关系型数据库
(十)全解MySQL之死锁问题分析、事务隔离与锁机制的底层原理剖析
经过《MySQL锁机制》、《MySQL-MVCC机制》两篇后,咱们已经大致了解MySQL中处理并发事务的手段,不过对于锁机制、MVCC机制都并未与之前说到的《MySQL事务机制》产生关联关系,同时对于MySQL锁机制的实现原理也未曾剖析,因此本篇作为事务、锁、MVCC这三者的汇总篇,会在本章中补全之前空缺的一些细节,同时也会将锁、MVCC机制与事务机制之间的关系彻底理清楚。
|
23天前
|
关系型数据库 MySQL 机器人
【MySQL】两个脚本自动化搞定 MySQL 备份恢复--XtraBackup
【MySQL】两个脚本自动化搞定 MySQL 备份恢复--XtraBackup
|
2月前
|
缓存 监控 关系型数据库
MySQL PXC 集群死锁分析案例
前不久一个系统死锁导致部分业务受到影响,今次补上详细的节点日志分析过程。
49 1

热门文章

最新文章