一 前言
死锁,其实是一个很有意思也很有挑战的技术问题,大概每个DBA都会在工作过程中遇见。关于死锁我会持续写一个系列的案例分析,希望能够对想了解死锁的朋友有所帮助。本文源于我们的生产案例:并发申请gap锁导致的死锁案例,与之前的 死锁案例一不同,本案例是因为RR模式下两个事务中的sql可以获取同一个gap锁,导致对方事务的insert 相互等待,导致死锁的。
二 案例分析
2.1 测试环境准备
Percona server 5.6.24 事务隔离级别为RR
CREATE TABLE `t4` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT ,
`kdt_id` int(11) unsigned NOT NULL ,
`admin_id` int(11) unsigned NOT NULL ,
`biz` varchar(20) NOT NULL DEFAULT '1' ,
`role_id` int(11) unsigned NOT NULL ,
`shop_id` int(11) unsigned NOT NULL DEFAULT '0' ,
`operator` varchar(20) NOT NULL DEFAULT '0' ,
`operator_id` int(11) NOT NULL DEFAULT '0' ,
`create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
`update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (`id`),
UNIQUE KEY `uniq_kid_aid_biz_rid` (`kdt_id`,`admin_id`,`role_id`,`biz`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
INSERT INTO `t4` (`id`, `kdt_id`, `admin_id`, `biz`, `role_id`, `shop_id`, `operator`, `operator_id`, `create_time`, `update_time`)
VALUES
(1,10,1,'retail',1,0,'0',0,'2017-05-09 15:55:26','2017-05-09 15:55:26'),
(2,20,1,'retail',1,0,'0',0,'2017-05-09 15:55:40','2017-05-09 15:55:40'),
(3,30,1,'retail',1,0,'0',0,'2017-05-09 15:55:55','2017-05-09 15:55:55'),
(4,40,1,'retail',1,0,'0',0,'2017-05-09 15:56:06','2017-05-09 15:56:06'),
(5,50,1,'retail',1,0,'0',0,'2017-05-09 15:56:16','2017-05-09 15:56:16');
2.2 本测试案例场景是两个事务删除不存的行,然后在insert记录。
T2 |
T1 |
test [RW] 02:50:27 >begin; Query OK, 0 rows affected (0.00 sec) |
test [RW] 02:50:27 >begin; Query OK, 0 rows affected (0.00 sec) |
test [RW] 02:50:34 >delete from t4 where kdt_id = 15 and admin_id = 1 and biz = 'retail' and role_id = '1'; |
|
test [RW] 02:50:41 >delete from t4 where kdt_id = 18 and admin_id = 2 and biz = 'retail' and role_id = '1'; |
|
test [RW] 02:50:43 >insert into t4(`kdt_id`, `admin_id`, `biz`, `role_id`, `shop_id`, `operator`, `operator_id`, `create_time`, `update_time`) -> VALUES('18', '2', 'retail', '2', '0', '0', '0', CURRENT_TIMESTAMP,CURRENT_TIMESTAMP); |
|
test [RW] 02:51:02 >INSERT INTO t4(`kdt_id`, `admin_id`, `biz`, `role_id`, `shop_id`, `operator`, `operator_id`, `create_time`, `update_time`) -> VALUES ('15', '1', 'retail', '2', '0', '0', '0', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP); ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction |
2.3 死锁日志
------------------------
LATEST DETECTED DEADLOCK
------------------------
2017-09-11 14:51:03 7f78eaf25700
*** (1) TRANSACTION:
TRANSACTION 462308535, ACTIVE 20 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 360, 2 row lock(s), undo log entries 1
MySQL thread id 3584515, OS thread handle 0x7f78ea5f5700, query id 780258123 localhost root update
insert into t4(`kdt_id`, `admin_id`, `biz`, `role_id`, `shop_id`, `operator`, `operator_id`, `create_time`, `update_time`)
VALUES('18', '2', 'retail', '2', '0', '0', '0', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 225 page no 4 n bits 72 index `uniq_kid_aid_biz_rid` of table `test`.`t4` trx id 462308535 lock_mode X locks gap before rec insert intention waiting
*** (2) TRANSACTION:
TRANSACTION 462308534, ACTIVE 29 sec inserting, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
3 lock struct(s), heap size 360, 2 row lock(s), undo log entries 1
MySQL thread id 3584572, OS thread handle 0x7f78eaf25700, query id 780258153 localhost root update
INSERT INTO t4(`kdt_id`, `admin_id`, `biz`, `role_id`, `shop_id`, `operator`, `operator_id`, `create_time`, `update_time`)
VALUES ('15', '1', 'retail', '2', '0', '0', '0', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 225 page no 4 n bits 72 index `uniq_kid_aid_biz_rid` of table `test`.`t4` trx id 462308534 lock_mode X locks gap before rec
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 225 page no 4 n bits 72 index `uniq_kid_aid_biz_rid` of table `test`.`t4` trx id 462308534 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (2)
2.4 死锁日志分析 首先根据《死锁案例一》 和《一个最不可思议的MySQL死锁分析》中强调 delete 不存在的记录是要加上GAP锁,事务日志中显示Lock_mode X wait . a. T2 delete from t4 where kdt_id = 15 and admin_id = 1 and biz = 'retail' and role_id = '1'; 符合条件的记录不存在,导致T2 先持有了(lock_mode X locks gap before rec) 锁住[(2,20,1,'retail',1,0)-(3,30,1,'retail',1,0)]的区间 ,防止符合条件的记录插入。 b. T1的delete 于T2的delete一样 同样申请了 (lock_mode X locks gap before rec) 锁住[(2,20,1,'retail',1,0)-(3,30,1,'retail',1,0)]的区间 。
It is also worth noting here that conflicting locks can be held on a gap by different transactions.
For example, transaction A can hold a shared gap lock (gap S-lock) on a gap
while transaction B holds an exclusive gap lock (gap X-lock) on the same gap.
The reason conflicting gap locks are allowed is that if a record is purged from an index, the gap locks held on the record by different transactions must be merged.
c. T1 的insert 语句申请插入意向锁,但是插入意向锁和T2持有的X GAP (lock_mode X locks gap before rec) 冲突,故等待T2中的GAP 锁释放。
Gap locks in InnoDB are “purely inhibitive”, which means they only stop other
transactions from inserting to the gap.
They do not prevent different transactions from taking gap locks on the same gap.
Thus, a gap X-lock has the same effect as a gap S-lock.
d. T2 的insert 语句申请插入意向锁,但是插入意向锁和T1持有 X GAP (lock_mode X locks gap before rec) 冲突,故等待T1中的GAP 锁释放。 T1(INSERT )等待T2(DELETE),T2(INSERT)等待T1(DELETE) 故而循环等待,出现死锁。 有兴趣的读者朋友可以测试一下 delete 存在记录的场景。
2.6 如何解决呢?
a 先select 检查一下看看是否存在,然后在删除。这里也存在两个或者多个会话并发执行同一个select where条件的,这里需要开发同学做处理。
b 使用insert into on deuplicate key语法不存在则插入,而不是先删除,再插入。
三 小结
RR事务隔离级别和GAP锁是导致死锁的常见原因,但是业务逻辑设计不合理也会出发死锁,本文的案例通过修改业务逻辑最终将死锁解决。