一、死锁现象描述
1.1 基本环境信息
1、数据库版本以及隔离级别
mysql>select version();
+---------------------+
| version() |
+---------------------+
| 5.7.26-log |
+---------------------+
返回行数:[1],耗时:3 ms.
mysql>show variables like '%iso%';
+-------------------------+-----------------+
| Variable_name | Value |
+-------------------------+-----------------+
| transaction_isolation | READ-COMMITTED |
| tx_isolation | READ-COMMITTED |
+-------------------------+-----------------+
返回行数:[2],耗时:7 ms.
2、发生死锁表结构信息
该表结构贴出了本地测试的表结构,与线上保持一致
mysql> show create table deadlock_test\G
*************************** 1. row ***************************
Table: deadlock_test
Create Table: CREATE TABLE `deadlock_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`info_uuid` varchar(50) DEFAULT NULL,
`view_date` date DEFAULT NULL,
`view_count` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_view_date` (`view_date`),
KEY `idx_info_uuid` (`info_uuid`)
) ENGINE=InnoDB AUTO_INCREMENT=908189 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
3、死锁日志
死锁日志中对部分信息做了替换,不影响问题分析。
------------------------
LATEST DETECTED DEADLOCK
------------------------
2020-05-14 07:25:04 0x7f588e4f9700
*** (1) TRANSACTION:
TRANSACTION 51481020, ACTIVE 0 sec fetching rows
mysql tables in use 3, locked 3
LOCK WAIT 15 lock struct(s), heap size 1136, 4 row lock(s)
MySQL thread id 2557156, OS thread handle 140017821800192, query id 538638931 192.168.22.51 update_user updating
UPDATE deadlock_test
SET VIEW_COUNT = VIEW_COUNT + 1
WHERE INFO_UUID = '1a973b987d1d11eaa8c07cd30ad3aa18'
AND VIEW_DATE = '2020-05-14'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1879 page no 9047 n bits 304 index PRIMARY of table `db1`.`deadlock_test` trx id 51481020 lock_mode X locks rec but not gap waiting
Record lock, heap no 22 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 800dbba4; asc ;;
1: len 6; hex 0000031189af; asc ;;
2: len 7; hex 5b000001552b82; asc [ U+ ;;
3: len 30; hex 323763336366316536336661313165613862396137636433306165303037; asc 27c3cf1e63fa11ea8b9a7cd30ae007; (total 32 bytes);
4: len 3; hex 8fc8ae; asc ;;
5: len 4; hex 80000222; asc ";;
*** (2) TRANSACTION:
TRANSACTION 51481021, ACTIVE 0 sec fetching rows
mysql tables in use 3, locked 3
34 lock struct(s), heap size 3520, 3 row lock(s)
MySQL thread id 2557158, OS thread handle 140018321430272, query id 538638932 192.168.22.51 update_user updating
UPDATE deadlock_test
SET VIEW_COUNT = VIEW_COUNT + 1
WHERE INFO_UUID = '27c3cf1e63fa11ea8b9a7cd30ae0074a'
AND VIEW_DATE = '2020-05-14'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1879 page no 9047 n bits 304 index PRIMARY of table `db1`.`deadlock_test` trx id 51481021 lock_mode X locks rec but not gap
Record lock, heap no 22 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 800dbba4; asc ;;
1: len 6; hex 0000031189af; asc ;;
2: len 7; hex 5b000001552b82; asc [ U+ ;;
3: len 30; hex 323763336366316536336661313165613862396137636433306165303037; asc 27c3cf1e63fa11ea8b9a7cd30ae007; (total 32 bytes);
4: len 3; hex 8fc8ae; asc ;;
5: len 4; hex 80000222; asc ";;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1879 page no 8442 n bits 1272 index idx_view_date of table `db1`.`deadlock_test` trx id 51481021 lock_mode X locks rec but not gap waiting
Record lock, heap no 853 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 3; hex 8fc8ae; asc ;;
1: len 4; hex 800dbba4; asc ;;
*** WE ROLL BACK TRANSACTION (1)
二、原因分析
2.1 死锁日志分析
1、事务1
##业务SQL id=900006
UPDATE deadlock_test
SET VIEW_COUNT = VIEW_COUNT + 1
WHERE INFO_UUID = '1a973b987d1d11eaa8c07cd30ad3aa18'
AND VIEW_DATE = '2020-05-14'
##等待锁
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
-- 等待deadlock_test表在primary key上的行锁
RECORD LOCKS space id 1879 page no 9047 n bits 304 index PRIMARY of table `db1`.`deadlock_test` trx id 51481020 lock_mode X locks rec but not gap waiting
Record lock, heap no 22 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
-- 锁定的行锁记录为 (id=900004)
0: len 4; hex 800dbba4; asc ;;
1: len 6; hex 0000031189af; asc ;;
2: len 7; hex 5b000001552b82; asc [ U+ ;;
##持有锁
-- 从死锁发生的情况分析,说明该事务持有了(view_date='2020-05-14',id=900004)该记录的锁。
##MySQL处理方案
回滚
2、事务2
##业务SQL id=900004
UPDATE deadlock_test
SET VIEW_COUNT = VIEW_COUNT + 1
WHERE INFO_UUID = '27c3cf1e63fa11ea8b9a7cd30ae0074a'
AND VIEW_DATE = '2020-05-14'
##持有锁
*** (2) HOLDS THE LOCK(S):
-- 持有deadlock_test表在parimary key上的行锁
RECORD LOCKS space id 1879 page no 9047 n bits 304 index PRIMARY of table `db1`.`deadlock_test` trx id 51481021 lock_mode X locks rec but not gap
Record lock, heap no 22 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
-- 持有的行锁记录为:(id=900004)
0: len 4; hex 800dbba4; asc ;;
1: len 6; hex 0000031189af; asc ;;
2: len 7; hex 5b000001552b82; asc [ U+ ;;
##等待锁
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
-- 等待deadlock_test表在idx_view_date上的行锁
RECORD LOCKS space id 1879 page no 8442 n bits 1272 index idx_view_date of table `db1`.`deadlock_test` trx id 51481021 lock_mode X locks rec but not gap waiting
Record lock, heap no 853 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
-- 等待的行锁记录为:(view_date='2020-05-14',id=900004)
0: len 3; hex 8fc8ae; asc ;;
1: len 4; hex 800dbba4; asc ;;
##MySQL处理方案
不回滚,执行成功
3、发生死锁SQL以及数据分布分析
1)业务SQL执行计划
从SQL执行的执行计划中我们可以看到,SQL使用的是index_merge,也就是说MySQL会分别使用idx_info_uuid、index_view_date,然后取其交集。
mysql>explain
select * from deadlock_test WHERE INFO_UUID = '27c3cf1e63fa11ea8b9a7cd30ae0074a'
AND VIEW_DATE = '2020-05-14'
+--------------+-----------------------+-------------------------------+----------------------+----------------+-------------------------------+-------------------------------+-------------------+---------------+----------------+--------------------+-------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+--------------+-----------------------+-------------------------------+----------------------+----------------+-------------------------------+-------------------------------+-------------------+---------------+----------------+--------------------+-------------------------------------------------------------+
| 1 | SIMPLE | deadlock_test | | index_merge | idx_info_uuid ,idx_view_date | idx_info_uuid ,idx_view_date | 153,4 | | 1 | 100 | Using intersect(idx_info_uuid ,idx_view_date); Using where |
+--------------+-----------------------+-------------------------------+----------------------+----------------+-------------------------------+-------------------------------+-------------------+---------------+----------------+--------------------+-------------------------------------------------------------+
返回行数:[1],耗时:5 ms.
2)业务SQL数据分布
从数据分布来看其实INFO_UUID字段的过滤性已经比较好了,但是MySQL仍然选择了使用INFO_UUID字段与VIEW_DATE两个字段的单列索引去做index merge操作。并且从这里我们可以看到,如果INFO_UUID与VIEW_DATE如果复合索引做条件过滤,基本可以定位到只有1条记录。
mysql>select count(*) from deadlock_test WHERE INFO_UUID = '27c3cf1e63fa11ea8b9a7cd30ae0074a'
+--------------------+
| count(*) |
+--------------------+
| 28 |
+--------------------+
返回行数:[1],耗时:4 ms.
mysql>select count(*) from deadlock_test WHERE INFO_UUID = '1a973b987d1d11eaa8c07cd30ad3aa18'
+--------------------+
| count(*) |
+--------------------+
| 9 |
+--------------------+
返回行数:[1],耗时:4 ms.
mysql>select count(*) from deadlock_test WHERE VIEW_DATE = '2020-05-14'
+--------------------+
| count(*) |
+--------------------+
| 8224 |
+--------------------+
返回行数:[1],耗时:6 ms.
mysql>select * from deadlock_test WHERE INFO_UUID = '1a973b987d1d11eaa8c07cd30ad3aa18' AND VIEW_DATE = '2020-05-14'
+--------------+----------------------------------+---------------------+----------------------+
| ID | INFO_UUID | VIEW_DATE | VIEW_COUNT |
+--------------+----------------------------------+---------------------+----------------------+
| 900006 | 1a973b987d1d11eaa8c07cd30ad3aa18 | 2020-05-14 | xxx |
+--------------+----------------------------------+---------------------+----------------------+
返回行数:[1],耗时:4 ms.
mysql>select * from deadlock_test WHERE INFO_UUID = '27c3cf1e63fa11ea8b9a7cd30ae0074a' AND VIEW_DATE = '2020-05-14'
+--------------+----------------------------------+---------------------+----------------------+
| ID | INFO_UUID | VIEW_DATE | VIEW_COUNT |
+--------------+----------------------------------+---------------------+----------------------+
| 900004 | 27c3cf1e63fa11ea8b9a7cd30ae0074a | 2020-05-14 | xxx |
+--------------+----------------------------------+---------------------+----------------------+
返回行数:[1],耗时:4 ms.
4、初步的分析判断
1)通过RDS的SQL审计日志,我们发现以上两个业务SQL并没有放在事务里去执行;
2)事务1中执行的业务SQL中,需要变更的记录为id=900006,死锁日志中我们却发现需要等待primary key索引中id=900004的锁;
3)事务2中已经持有了primary key中id=900004的锁,他们却仍然需要等待idx_view_date上(view_date,id)的锁;
4)上述2、3两点都是比较奇怪的,但是结合SQL执行计划中index_merge的影响,我们推测SQL在执行中是否需要对idx_info_uuid和idx_view_date进行扫描加锁,一定程度上扩大了锁范围,从而导致了事务1事务2中业务SQL并发请求时出现了死锁的情况
三、测试验证
3.1 数据准备
CREATE TABLE `deadlock_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`info_uuid` varchar(50) DEFAULT NULL,
`view_date` date DEFAULT NULL,
`view_count` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_view_date` (`view_date`),
KEY `idx_info_uuid` (`info_uuid`)
) ENGINE=InnoDB AUTO_INCREMENT=30000 DEFAULT CHARSET=utf8
mysql> select * from deadlock_test where info_uuid = 'aa381' and VIEW_DATE = '2020-05-04';
+-------+-----------+------------+------------+
| id | info_uuid | view_date | view_count |
+-------+-----------+------------+------------+
| 13908 | aa381 | 2020-05-04 | 77 |
+-------+-----------+------------+------------+
1 row in set (0.02 sec)
mysql> select * from deadlock_test where info_uuid = 'aa269' and VIEW_DATE = '2020-05-04';
+-------+-----------+------------+------------+
| id | info_uuid | view_date | view_count |
+-------+-----------+------------+------------+
| 29033 | aa269 | 2020-05-04 | 5 |
+-------+-----------+------------+------------+
1 row in set (0.05 sec)
3.2 模拟index_merge下加锁的情况
Session A | Session B |
---|---|
begin; | - |
select id from deadlock_test force index(idx_info_uuid) where info_uuid = 'aa269' and VIEW_DATE = '2020-05-04' for update; | begin; |
· | select id from deadlock_test force index(idx_info_uuid) where info_uuid = 'aa381' and VIEW_DATE = '2020-05-04' for update; |
select id from deadlock_test force index(idx_view_date) where info_uuid = 'aa269' and VIEW_DATE = '2020-05-04' for update; --被hang住 | - |
· | select id from deadlock_test force index(idx_view_date) where info_uuid = 'aa381' and VIEW_DATE = '2020-05-04' for update; --死锁发生 |
3.3 死锁日志查看
------------------------
LATEST DETECTED DEADLOCK
------------------------
2020-05-14 22:31:06 0x70000fc2f000
*** (1) TRANSACTION:
TRANSACTION 39033, ACTIVE 30 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 58 lock struct(s), heap size 8400, 4 row lock(s)
MySQL thread id 5145, OS thread handle 123145565908992, query id 183203 localhost root Sending data
select id from deadlock_test force index(idx_view_date) where info_uuid = 'aa269' and VIEW_DATE = '2020-05-04' for update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 291 page no 65 n bits 488 index PRIMARY of table `db1`.`deadlock_test` trx id 39033 lock_mode X locks rec but not gap waiting
Record lock, heap no 324 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80003654; asc 6T;;
1: len 6; hex 000000009857; asc W;;
2: len 7; hex 2a000001ed2e34; asc * .4;;
3: len 5; hex 6161333831; asc aa381;;
4: len 3; hex 8fc8a4; asc ;;
5: len 4; hex 8000004d; asc M;;
*** (2) TRANSACTION:
TRANSACTION 39034, ACTIVE 18 sec fetching rows
mysql tables in use 1, locked 1
58 lock struct(s), heap size 8400, 3 row lock(s)
MySQL thread id 5153, OS thread handle 123145566744576, query id 183204 localhost root Sending data
select id from deadlock_test force index(idx_view_date) where info_uuid = 'aa381' and VIEW_DATE = '2020-05-04' for update
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 291 page no 65 n bits 488 index PRIMARY of table `db1`.`deadlock_test` trx id 39034 lock_mode X locks rec but not gap
Record lock, heap no 324 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80003654; asc 6T;;
1: len 6; hex 000000009857; asc W;;
2: len 7; hex 2a000001ed2e34; asc * .4;;
3: len 5; hex 6161333831; asc aa381;;
4: len 3; hex 8fc8a4; asc ;;
5: len 4; hex 8000004d; asc M;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 291 page no 37 n bits 1272 index idx_view_date of table `db1`.`deadlock_test` trx id 39034 lock_mode X locks rec but not gap waiting
Record lock, heap no 692 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 3; hex 8fc8a4; asc ;;
1: len 4; hex 80003654; asc 6T;;
*** WE ROLL BACK TRANSACTION (2)
3.4 锁等待分析
为了分析锁等待的情况,我们把innodb_deadlock_detect参数关闭,MySQL不做死锁检测,这时候我们就可以查看相应的一些锁等待信息
1、Session A 执行通过idx_view_date索引扫描被hang住时刻的锁等待信息
可以看到Session A的第二步已经被SessionB的第一步hang住,此时需要获取的锁资源就是primary key索引上id=13908的行锁。
mysql> SELECT r.trx_id waiting_trx_id, r.trx_mysql_thread_id waiting_thread, r.trx_query waiting_query, b.trx_id blocking_trx_id, b.trx_mysql_thread_id blocking_thread, b.trx_query blocking_query FROM information_schema.innodb_lock_waits w INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id\G
*************************** 1. row ***************************
waiting_trx_id: 39036
waiting_thread: 5145
waiting_query: select id from deadlock_test force index(idx_view_date) where info_uuid = 'aa269' and VIEW_DATE = '2020-05-04' for update
blocking_trx_id: 39037
blocking_thread: 5153
blocking_query: NULL
1 row in set, 1 warning (0.00 sec)
mysql> select * from information_schema.innodb_lock_waits ; +-------------------+-------------------+-----------------+------------------+
| requesting_trx_id | requested_lock_id | blocking_trx_id | blocking_lock_id |
+-------------------+-------------------+-----------------+------------------+
| 39036 | 39184:291:65:324 | 39037 | 39185:291:65:324 |
+-------------------+-------------------+-----------------+------------------+
1 row in set, 1 warning (0.00 sec)
mysql> select * from information_Schema.innodb_locks;
+------------------+-------------+-----------+-----------+-----------------------+------------+------------+-----------+----------+-----------+
| lock_id | lock_trx_id | lock_mode | lock_type | lock_table | lock_index | lock_space | lock_page | lock_rec | lock_data |
+------------------+-------------+-----------+-----------+-----------------------+------------+------------+-----------+----------+-----------+
| 39184:291:65:324 | 39036 | X | RECORD | `db1`.`deadlock_test` | PRIMARY | 291 | 65 | 324 | 13908 |
| 39185:291:65:324 | 39037 | X | RECORD | `db1`.`deadlock_test` | PRIMARY | 291 | 65 | 324 | 13908 |
+------------------+-------------+-----------+-----------+-----------------------+------------+------------+-----------+----------+-----------+
2 rows in set, 1 warning (0.01 sec)
2、Session B 执行执行通过idx_view_date索引扫描被hang住时刻的锁等待信息
可以看到,两个事务锁与被锁形成了一个死锁闭环,若MySQL死锁检测打开的情况下,会立刻对其中权重较低的事务进行回滚。Session B通过idx_view_date扫描时需要获取的锁资源为(1034404, 13908),但是该资源已经被Session A持有。
mysql> SELECT r.trx_id waiting_trx_id, r.trx_mysql_thread_id waiting_thread, r.trx_query waiting_query, b.trx_id blocking_trx_id, b.trx_mysql_thread_id blocking_thread, b.trx_query blocking_query FROM information_schema.innodb_lock_waits w INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id\G
*************************** 1. row ***************************
waiting_trx_id: 39037
waiting_thread: 5153 (rollback)
waiting_query: select id from deadlock_test force index(idx_view_date) where info_uuid = 'aa381' and VIEW_DATE = '2020-05-04' for update
blocking_trx_id: 39036
blocking_thread:
blocking_query: select id from deadlock_test force index(idx_view_date) where info_uuid = 'aa269' and VIEW_DATE = '2020-05-04' for update
*************************** 2. row ***************************
waiting_trx_id: 39036
waiting_thread: 5145
waiting_query: select id from deadlock_test force index(idx_view_date) where info_uuid = 'aa269' and VIEW_DATE = '2020-05-04' for update
blocking_trx_id: 39037
blocking_thread: 5153
blocking_query: select id from deadlock_test force index(idx_view_date) where info_uuid = 'aa381' and VIEW_DATE = '2020-05-04' for update
2 rows in set, 1 warning (0.00 sec)
mysql> select * from information_schema.innodb_lock_waits ; +-------------------+-------------------+-----------------+------------------+
| requesting_trx_id | requested_lock_id | blocking_trx_id | blocking_lock_id |
+-------------------+-------------------+-----------------+------------------+
| 39037 | 39037:291:37:692 | 39036 | 39036:291:37:692 |
| 39036 | 39036:291:65:324 | 39037 | 39037:291:65:324 |
+-------------------+-------------------+-----------------+------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> select * from information_Schema.innodb_locks;
+------------------+-------------+-----------+-----------+-----------------------+---------------+------------+-----------+----------+----------------+
| lock_id | lock_trx_id | lock_mode | lock_type | lock_table | lock_index | lock_space | lock_page | lock_rec | lock_data |
+------------------+-------------+-----------+-----------+-----------------------+---------------+------------+-----------+----------+----------------+
| 39037:291:37:692 | 39037 | X | RECORD | `db1`.`deadlock_test` | idx_view_date | 291 | 37 | 692 | 1034404, 13908 |
| 39036:291:37:692 | 39036 | X | RECORD | `db1`.`deadlock_test` | idx_view_date | 291 | 37 | 692 | 1034404, 13908 |
| 39036:291:65:324 | 39036 | X | RECORD | `db1`.`deadlock_test` | PRIMARY | 291 | 65 | 324 | 13908 |
| 39037:291:65:324 | 39037 | X | RECORD | `db1`.`deadlock_test` | PRIMARY | 291 | 65 | 324 | 13908 |
+------------------+-------------+-----------+-----------+-----------------------+---------------+------------+-----------+----------+----------------+
4 rows in set, 1 warning (0.00 sec)
3.5 整体分析处理
1、案例分析
Session A:(5145)
1) select id from deadlock_test force index(idx_info_uuid) where info_uuid = 'aa269' and VIEW_DATE = '2020-05-04' for update; 锁资源:idx_info_uuid索引的(idx_info_uuid='aa269',id=29033)获取成功,primary索引的(id=29033)获取成功
3) select id from deadlock_test force index(idx_view_date) where info_uuid = 'aa269' and VIEW_DATE = '2020-05-04' for update; 锁资源:idx_view_date索引的(view_date='2020-05-04',id=13908)获取成功,primary索引的(id=13908)获取 被Session B的2)被hang住 --view_date索引从头到尾扫描,扫描到该记录时被hang住。
Session B:(5153) -- rollback
2) select id from deadlock_test force index(idx_info_uuid) where info_uuid = 'aa381' and VIEW_DATE = '2020-05-04' for update; 锁资源:idx_info_uuid索引的(idx_info_uuid='aa381',id=13908)获取成功,primary索引的(id=13908)获取成功
4) select id from deadlock_test force index(idx_view_date) where info_uuid = 'aa381' and VIEW_DATE = '2020-05-04' for update; 锁资源:扫描idx_view_date索引的(view_date='2020-05-04',id=13908)获取被SessionA的3)hang住 --view_date索引从头到尾扫描,扫描到该记录时被hang住
2、处理方法
基于以上的分析,我们可以推断MySQL在使用index_merge进行扫描加锁的期间,一定程度上会扩大锁的范围。为了避免MySQL使用index_merge,我们可以将其调整为复合索引来进行优化。