MYSQL INNODB replace into 死锁 及 next key lock 浅析
2017-06-29
2818
简介:
原创:全文带入了大量自我认知和理解,可能错误,因为水平有限,但是代表我努力分析过。一、问题提出问题是由姜大师提出的、问题如下:表:mysql> show create table c \G*************************** 1.
原创:全文带入了大量自我认知和理解,可能错误,因为水平有限,但是代表我努力分析过。一、问题提出问题是由姜大师提出的、问题如下:表:mysql> show create table c \G*************************** 1. row *************************** Table: cCreate Table: CREATE TABLE `c` ( `a` int(11) NOT NULL AUTO_INCREMENT, `b` int(11) DEFAULT NULL, PRIMARY KEY (`a`), UNIQUE KEY `b` (`b`)) ENGINE=InnoDB 1 row in set (0.01 sec)开启两个会话不断的执行replace into c values(NULL,1);会触发死锁。问死锁触发的原因。我使用的环境:MYSQL 5.7.14 debug版本、隔离级别RR、自动提交,很显然这里的c表中的可以select出来的记录始终是1条只是a列不断的增大,但是这里实际存储空间确不止1条,因为从heap no来看二级索引中,heap no 已经到了7,也就是有至少7(7-1)条记录,只是其他记录标记为del并且被purge线程放到了page free_list中。二、准备工作和使用方法1、稍微修改了源码关于锁的打印部分,我们知道每个事物下显示锁内存结构lock struct会连接成一个链表,只要按照顺序打印出内存lock struct就打印出了 所有关于这个事物显示锁全部信息和加锁顺序如下:
-
---TRANSACTION 184771, ACTIVE 45 sec
-
4 lock struct(s), heap size 1160, 3 row lock(s)
-
MySQL thread id 2, OS thread handle 140737154311936, query id 642 localhost root cleaning up
-
---lock strcut(1):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
TABLE LOCK table `test`.`c4` trx id 184771 lock mode IX
-
---lock strcut(2):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 413 page no 4 n bits 72 index id2 of table `test`.`c4` trx id 184771 lock_mode X(LOCK_X)
-
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
-
0: len 4; hex 80000014; asc ;;
-
1: len 4; hex 80000014; asc ;;
-
---lock strcut(3):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 413 page no 3 n bits 72 index PRIMARY of table `test`.`c4` trx id 184771 lock_mode X(LOCK_X) locks rec but not gap(LOCK_REC_NOT_GAP)
-
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
-
0: len 4; hex 80000014; asc ;;
-
1: len 6; hex 00000002d1bd; asc ;;
-
2: len 7; hex a600000e230110; asc # ;;
-
3: len 4; hex 80000014; asc ;;
-
---lock strcut(4):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 413 page no 4 n bits 72 index id2 of table `test`.`c4` trx id 184771 lock_mode X(LOCK_X) locks gap before rec(LOCK_GAP)
-
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
-
0: len 4; hex 8000001e; asc ;;
-
1: len 4; hex 8000001e; asc ;;
正常的版本只有
-
---TRANSACTION 184771, ACTIVE 45 sec
-
4 lock struct(s), heap size 1160, 3 row lock(s)
-
MySQL thread id 2, OS thread handle 140737154311936, query id 642 localhost root cleaning up
部分后面的都是我加上的,其实修改很简单,innodb其实自己写好了只是没有开启,我开启后加上了序号来表示顺序。上面是一个 select * from c where id2= 20 for update; b列为辅助索引的所有4 lock struct(s),可以看到有了这些信息分析不那么难了。这里稍微分析一下表结构为:mysql> show create table c4;+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+| Table | Create Table |+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+| c4 | CREATE TABLE `c4` ( `id1` int(11) NOT NULL, `id2` int(11) DEFAULT NULL, PRIMARY KEY (`id1`), KEY `id2` (`id2`)) ENGINE=InnoDB DEFAULT CHARSET=latin1 |+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+1 row in set (0.00 sec)数据为:mysql> select * from c4;+-----+------+| id1 | id2 |+-----+------+| 1 | 1 || 10 | 10 || 20 | 20 || 30 | 30 |+-----+------+4 rows in set (0.00 sec)语句为: select * from c where id2= 20 for update;RR模式从锁结构链表来看,这个语句在辅助索引分别锁定了id2:20 id1:20 LOCK_X|LOCK_ORDINARY 也就是NEXT KEY LOCK同时锁定了id2:30 id1:30 LOCK_X|LOCK_GAP也就是gap lock不包含这一列那么画个图容易理解黄色部分为锁定部分:
是不是一目了然?如果是rc那么锁定的只有记录了两个黄色箭头表示gap没有了就不在画图了
2、在死锁检测回滚前调用这个打印函数打印到err日志文件中,打印出全部的事物的显示内存lock struct如下,这里就不给出了,后面会有replace触发死锁千事物锁结构的一个输出3、使用MYSQL TRACE SQL语句得到大部分的函数调用来分析replace的过程修改出现的问题:修改源码打印出所有lock struct 在线上显然是不能用的。因为打印出来后show engine innodb status 会非常长,甚至引发其他问题,但是测试是可以,其次修改了打印死锁事物锁链表到日志后,每次只要遇到死锁信息可以打印到日志,但是每次MYSQLD都会挂掉,但是不影响分析了。三、预备知识(自我理解)1、Precise modes:#define LOCK_ORDINARY 0 /*!< this flag denotes an ordinarynext-key lock in contrast to LOCK_GAPor LOCK_REC_NOT_GAP */默认是LOCK_ORDINARY及普通的next_key_lock,锁住行及以前的间隙#define LOCK_GAP 512 /*!< when this bit is set, it means that thelock holds only on the gap before the record;for instance, an x-lock on the gap does notgive permission to modify the record on whichthe bit is set; locks of this type are createdwhen records are removed from the index chainof records */间隙锁,锁住行以前的间隙,不锁住本行#define LOCK_REC_NOT_GAP 1024 /*!< this bit means that the lock is only onthe index record and does NOT block insertsto the gap before the index record; this isused in the case when we retrieve a recordwith a unique key, and is also used inlocking plain SELECTs (not part of UPDATEor DELETE) when the user has set the READCOMMITTED isolation level */行锁,锁住行而不锁住任何间隙#define LOCK_INSERT_INTENTION 2048 /*!< this bit is set when we place a waitinggap type record lock request in order to letan insert of an index record to wait untilthere are no conflicting locks by othertransactions on the gap; note that this flagremains set when the waiting lock is granted,or if the lock is inherited record */插入意向锁,如果插入的记录在某个已经锁定的间隙内为这个锁2、参数innodb_autoinc_lock_mode的值为1,也许不能保证replace into的顺序。3、infimum和supremum 一个page中包含这两个伪列,页中所有的行未删除(删除未purge)的行都连接到这两个虚列之间,其中 supremum伪列的锁始终为next_key_lock。4、heap no 此行在page中的heap no heap no存储在fixed_extrasize 中,heap no 为物理存储填充的序号,页的空闲空间挂载在page free链表中(头插法)可以重用, 但是重用此heap no不变,如果一直是insert 则heap no 不断增加,并非按照KEY大小排序的逻辑链表顺序,而是物理填充顺序 5、n bits 和这个page相关的锁位图的大小如果我的表有9条数据 还包含2个infimum和supremum虚拟列 及 64+11 bits,及75bits但是必须被8整除为一个字节就是 80 bits6、隐含锁(Implicit lock)和显示锁(explict)锁有隐含和显示之分。隐含锁通常发生在 insert 的时候对cluster index和second index 都加隐含锁,如果是UPDATE(DELETE)对cluster index加显示锁 辅助索引加隐含锁。目的在于减少锁结构的内存开销,如果有事务需要和这个隐含锁而不兼容,这个事务需要帮助 insert或者update(delete)事物将隐含锁变为显示锁,然后给自己加锁,通常insert主键检查会给自己加上S锁,REPLACE、delete、update通常会给自己加上X锁。四、replace过程分析通过replace的trace找到了这些步骤的大概调用:首先我们假设TRX1:replace 不提交TRX2:replace 堵塞TRX1:replace 提交TRX2:replace 继续执行直到完成这样做的目的在于通过trace找到TRX2在哪里等待,确实如我所愿我找到了。1、检查是否冲突,插入主键
-
569 T@4: | | | | | | | | >row_ins
-
570 T@4: | | | | | | | | | row_ins: table: test/c
-
571 T@4: | | | | | | | | | >row_ins_index_entry_step
-
572 T@4: | | | | | | | | | | >row_ins_clust_index_entry
-
573 T@4: | | | | | | | | | | | >row_ins_clust_index_entry_low
-
574 T@4: | | | | | | | | | | | | >btr_cur_search_to_nth_level
-
575 T@4: | | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
576 T@4: | | | | | | | | | | | | ib_cur: insert PRIMARY (366) by 183808: TUPLE (info_bits=0, 4 fields): {[4] %(0x00000005),[6] (0x000000020E00),[7] (0x 0A000001010100),[4] (0x00000001)}
-
577 T@4: | | | | | | | | | | | <row_ins_clust_index_entry_low 2649
-
578 T@4: | | | | | | | | | | <row_ins_clust_index_entry 3313
-
579 T@4: | | | | | | | | | <row_ins_index_entry_step 3589
2、检查是否冲突,插入辅助索引,这里实际上就是会话2被堵塞的地方,如下解释(如果冲突回滚先前插入的主键内容)
-
580 T@4: | | | | | | | | | >row_ins_index_entry_step 3589
-
581 T@4: | | | | | | | | | | >row_ins_sec_index_entry_low
-
582 T@4: | | | | | | | | | | | >btr_cur_search_to_nth_level
-
583 T@4: | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
584 T@4: | | | | | | | | | | | >row_ins_scan_sec_index_for_duplicate
-
585 T@4: | | | | | | | | | | | | >btr_cur_search_to_nth_level
-
586 T@4: | | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
587 T@4: | | | | | | | | | | | | >btr_cur_search_to_nth_level
-
588 T@4: | | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
589 T@4: | | | | | | | | | | | | >row_vers_impl_x_locked_low
-
590 T@4: | | | | | | | | | | | | | info: Implicit lock is held by trx:183803
-
591 T@4: | | | | | | | | | | | | <row_vers_impl_x_locked_low 329
-
592 T@4: | | | | | | | | | | | | >thd_report_row_lock_wait
-
593 T@4: | | | | | | | | | | | | <thd_report_row_lock_wait 4246
-
594 T@4: | | | | | | | | | | | <row_ins_scan_sec_index_for_duplicate 2148
-
595 T@4: | | | | | | | | | | <row_ins_sec_index_entry_low 3043
-
596 T@4: | | | | | | | | | <row_ins_index_entry_step 3589
-
597 T@4: | | | | | | | | <row_ins 3758
-
598 //wait here
-
这里我做trace的时候事物的trace停止在了这里我特意加上了598//wait here从下面的输出
-
我们也能肯定确实这里触发了锁等待
-
>row_vers_impl_x_locked_low
-
| info: Implicit lock is held by trx:183803
-
<row_vers_impl_x_locked_low 329
-
>thd_report_row_lock_wait
-
<thd_report_row_lock_wait 4246
-
等待获得锁过后重新检查:
-
599 T@4: | | | | | | | | >row_ins
-
600 T@4: | | | | | | | | | row_ins: table: test/c
-
601 T@4: | | | | | | | | | >row_ins_index_entry_step
-
602 T@4: | | | | | | | | | | >row_ins_sec_index_entry_low
-
603 T@4: | | | | | | | | | | | >btr_cur_search_to_nth_level
-
604 T@4: | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
605 T@4: | | | | | | | | | | | >row_ins_scan_sec_index_for_duplicate
-
606 T@4: | | | | | | | | | | | | >btr_cur_search_to_nth_level
-
607 T@4: | | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
608 T@4: | | | | | | | | | | | <row_ins_scan_sec_index_for_duplicate 2148
-
609 T@4: | | | | | | | | | | <row_ins_sec_index_entry_low 3043
-
610 T@4: | | | | | | | | | <row_ins_index_entry_step 3589
-
611 T@4: | | | | | | | | <row_ins 3810
我们可以隐隐约约看到row_ins_sec_index_entry_low和row_ins_clust_index_entry_low回检查是否有重复的行 分别代表是二级索引和聚集索引的相关检查,因为就这个案例主键不可能出现重复值,而二级索引这个例子中肯定是 重复的,索引row_ins_sec_index_entry_low触发了等待,其实我们知道这里的锁方式如下列子: ---lock strcut(2):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK chain! for this TrxRECORD LOCKS space id 406 page no 4 n bits 72 index b of table `test`.`c` trx id 177891 lock_mode X(LOCK_X) waiting(LOCK_WAIT)Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32 0: len 4; hex 80000001; asc ;; 1: len 4; hex 80000006; asc ;; LOCK_X|LOCK_ORDINARY|LOCK_WAIT:需要X的next_key lock处于等待状态他需要锁定(infimum,{1,6}]这个区间。这也是死锁发生的关键一个环节。3、这里涉及到了回滚操作,从下面的trace输出我们也能看到确实做了回滚 实际上事物2会堵塞在这里,因为我做trace的时候他一直停在 这里不动了。为此我还加上598行说明在这里wait了
-
612 T@4: | | | | | | | | ib_que: Execute 9 (QUERY THREAD) at 0x7fffe804b7d8
-
613 T@4: | | | | | | | | ib_que: Execute 12 (ROLLBACK) at 0x7fffe804b6b0
-
614 T@4: | | | | | | | | ib_que: Execute 12 (ROLLBACK) at 0x7fffe804b6b0
-
615 T@4: | | | | | | | | ib_que: Execute 9 (QUERY THREAD) at 0x7fffe804b7d8
-
616 T@4: | | | | | | | | ib_que: Execute 9 (QUERY THREAD) at 0x7fffe800eec8
-
617 T@4: | | | | | | | | ib_que: Execute 10 (UNDO ROW) at 0x7fffe801b090
-
618 T@4: | | | | | | | | >btr_cur_search_to_nth_level
-
619 T@4: | | | | | | | | <btr_cur_search_to_nth_level 2005
-
620 T@4: | | | | | | | | >btr_cur_search_to_nth_level
-
621 T@4: | | | | | | | | <btr_cur_search_to_nth_level 2005
-
622 T@4: | | | | | | | | ib_que: Execute 10 (UNDO ROW) at 0x7fffe801b090
-
623 T@4: | | | | | | | | ib_que: Execute 9 (QUERY THREAD) at 0x7fffe800eec8
4、这个重复key会传递给SERVER层次,并且貌似重新初始化了事物(只是从trace猜测)
-
639 T@4: | | | | | | >handler::get_dup_key
-
640 T@4: | | | | | | | >info
-
641 T@4: | | | | | | | | >ha_innobase::update_thd
-
642 T@4: | | | | | | | | | ha_innobase::update_thd: user_thd: 0x7fffe8000b90 -> 0x7fffe8000b90
-
643 T@4: | | | | | | | | | >innobase_trx_init
-
644 T@4: | | | | | | | | | <innobase_trx_init 2765
-
645 T@4: | | | | | | | | <ha_innobase::update_thd 3073
-
646 T@4: | | | | | | | <info 14717
-
647 T@4: | | | | | | <handler::get_dup_key 4550
-
648 T@4: | | | | | | >column_bitmaps_signal
-
649 T@4: | | | | | | | info: read_set: 0x7fffc8941da0 write_set: 0x7fffc8941da0
-
650 T@4: | | | | | | <column_bitmaps_signal 3846
-
651 T@4: | | | | | | >innobase_trx_init
-
652 T@4: | | | | | | <innobase_trx_init 2765
-
653 T@4: | | | | | | >index_init
-
654 T@4: | | | | | | <index_init 8864
5、接下就是真正删除插入主键
-
689 T@4: | | | | | | | | >row_update_for_mysql_using_upd_graph
-
690 T@4: | | | | | | | | | >row_upd_step
-
691 T@4: | | | | | | | | | | >row_upd
-
692 T@4: | | | | | | | | | | | row_upd: table: test/c
-
693 T@4: | | | | | | | | | | | row_upd: info bits in update vector: 0x0
-
694 T@4: | | | | | | | | | | | row_upd: foreign_id: NULL
-
695 T@4: | | | | | | | | | | | ib_cur: delete-mark clust test/c (366) by 183808: COMPACT RECORD(info_bits=32, 4 fields): {[4] $(0x00000004),[6] (0x000000020D 0B),[7] (0x00000001090100),[4] (0x00000001)}
-
696 T@4: | | | | | | | | | | | >row_ins_clust_index_entry
-
697 T@4: | | | | | | | | | | | | >row_ins_clust_index_entry_low
-
698 T@4: | | | | | | | | | | | | | >btr_cur_search_to_nth_level
-
699 T@4: | | | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
700 T@4: | | | | | | | | | | | | | ib_cur: insert PRIMARY (366) by 183808: TUPLE (info_bits=0, 4 fields): {[4] %(0x00000005),[6](0x000000020E00),[7] ( 0x00000001090100),[4] (0x00000001)}
-
701 T@4: | | | | | | | | | | | | <row_ins_clust_index_entry_low 2649
-
702 T@4: | | | | | | | | | | | <row_ins_clust_index_entry 3313
-
703 T@4: | | | | | | | | | | | >btr_cur_search_to_nth_level
-
704 T@4: | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
705 T@4: | | | | | | | | | | | ib_cur: delete-mark=1 sec 406:4:2 in b(367) by 183808
6、接下就是真正插入辅助索引
-
706 T@4: | | | | | | | | | | | >row_ins_sec_index_entry_low
-
707 T@4: | | | | | | | | | | | | >btr_cur_search_to_nth_level
-
708 T@4: | | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
709 T@4: | | | | | | | | | | | | >row_ins_scan_sec_index_for_duplicate
-
710 T@4: | | | | | | | | | | | | | >btr_cur_search_to_nth_level
-
711 T@4: | | | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
712 T@4: | | | | | | | | | | | | | >btr_cur_search_to_nth_level
-
713 T@4: | | | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
714 T@4: | | | | | | | | | | | | | >row_vers_impl_x_locked_low
-
715 T@4: | | | | | | | | | | | | | <row_vers_impl_x_locked_low 123
-
716 T@4: | | | | | | | | | | | | | >btr_cur_search_to_nth_level
-
717 T@4: | | | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
718 T@4: | | | | | | | | | | | | | >row_vers_impl_x_locked_low
-
719 T@4: | | | | | | | | | | | | | | info: Implicit lock is held by trx:183808
-
720 T@4: | | | | | | | | | | | | | <row_vers_impl_x_locked_low 329
-
721 T@4: | | | | | | | | | | | | <row_ins_scan_sec_index_for_duplicate 2148
-
722 T@4: | | | | | | | | | | | | >btr_cur_search_to_nth_level
-
723 T@4: | | | | | | | | | | | | <btr_cur_search_to_nth_level 2005
-
724 T@4: | | | | | | | | | | | | ib_cur: insert b (367) by 183808: TUPLE (info_bits=0, 2 fields): {[4] (0x00000001),[4] %(0x00000005)}
-
725 T@4: | | | | | | | | | | | <row_ins_sec_index_entry_low 3194
-
726 T@4: | | | | | | | | | | <row_upd 3066
-
727 T@4: | | | | | | | | | <row_upd_step 3181
-
728 T@4: | | | | | | | | <row_update_for_mysql_using_upd_graph 2670
-
729 T@4: | | | | | | | <ha_innobase::update_row 8656
注意:上面只是看trace出来的过程,很多是根据函数调用进行的猜测。
五、死锁前事物锁信息打印分析
打印出死锁前事物的全部信息
-
2017-06-28T00:25:20.202052Z 76 [Note] InnoDB: (DeadlockChecker::check_and_resolve(T):Add by gaopeng before rollback print all trx info here:)
-
LIST OF TRANSACTIONS FOR EACH SESSION:
-
---TRANSACTION 422212176319952, not started
-
0 lock struct(s), heap size 1160, 0 row lock(s)
-
---TRANSACTION 182592, ACTIVE 0 sec inserting
-
mysql tables in use 1, locked 1
-
LOCK WAIT 2 lock struct(s), heap size 1160, 1 row lock(s), undo log entries 1
-
MySQL thread id 77, OS thread handle 140737155630848, query id 3627 localhost root update
-
replace into c values(NULL,1)
-
---lock strcut(1):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
TABLE LOCK table `mysqlslap`.`c` trx id 182592 lock mode IX
-
---lock strcut(2):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 407 page no 4 n bits 80 index b of table `mysqlslap`.`c` trx id 182592 lock_mode X(LOCK_X) waiting(LOCK_WAIT)
-
Record lock, heap no 7 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 80000911; asc ;;
-
-
---TRANSACTION 182588, ACTIVE 0 sec updating or deleting
-
mysql tables in use 1, locked 1
-
5 lock struct(s), heap size 1160, 5 row lock(s), undo log entries 2
-
MySQL thread id 76, OS thread handle 140737156429568, query id 3623 localhost root update
-
replace into c values(NULL,1)
-
---lock strcut(1):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
TABLE LOCK table `mysqlslap`.`c` trx id 182588 lock mode IX
-
---lock strcut(2):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 407 page no 4 n bits 80 index b of table `mysqlslap`.`c` trx id 182588 lock_mode X(LOCK_X)
-
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
-
0: len 8; hex 73757072656d756d; asc supremum;;
-
Record lock, heap no 7 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 80000911; asc ;;
-
-
---lock strcut(3):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 407 page no 4 n bits 80 index b of table `mysqlslap`.`c` trx id 182588 lock_mode X(LOCK_X) locks gap before rec(LOCK_GAP)
-
Record lock, heap no 7 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 80000911; asc ;;
-
-
---lock strcut(4):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 407 page no 3 n bits 88 index PRIMARY of table `mysqlslap`.`c` trx id 182588 lock_mode X(LOCK_X) locks rec but not gap(LOCK_REC_NOT_GAP)
-
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
-
0: len 4; hex 80000911; asc ;;
-
1: len 6; hex 00000002c93c; asc <;;
-
2: len 7; hex 2c0000402e0c97; asc , @. ;;
-
3: len 4; hex 80000001; asc ;;
-
-
---lock strcut(5):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 407 page no 4 n bits 80 index b of table `mysqlslap`.`c` trx id 182588 lock_mode X(LOCK_X) locks gap before rec(LOCK_GAP) insert intention(LOCK_INSERT_INTENTION) waiting(LOCK_WAIT)
-
Record lock, heap no 7 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 80000911; asc ;;
这里涉及两个事物
1、TRANSACTION 182592
2、TRANSACTION 182588
事物182588先于182592执行。
在上一次事物结束后:
留下的数据为:
a=0X911 b=0X1
那么开始辅助索引上的逻辑链表 如下(这里只画单项链表)
死锁流程流程开始:
时刻N:--- TRX:182588 lock strcut(2)
supremum和a=0X911 b=0X1均加LOCK_X|LOCK_ORDINARY 为独占NEXT KEY LOCK, 换句话说整个范围都加锁了,就这个列子replace期间不能 插入任何数据,这点比较狠。 如下图黄色部分都是这里加锁了:
时刻N+1:---TRX:182592 lock strcut(2)
检查辅助索引唯一性,这里在上面的trace中已经说明过了等待发生,锁类型为 LOCK_X|LOCK_ORDINARY|LOCK_WAIT同样为独占NEXT KEY LOCK处于WAIT范围是0X1 0X911如下图 红色箭头部分都是要想加锁的其实就是0X1 0X911和infimum到0X1 0X911的范围:
时刻N+2:---TRX:182588
这里涉及一个重点和lock strcut(5) (参数innodb_autoinc_lock_mode的值为1,也许不能保证replace into的顺序有关),
显然这个时候如果我要插入的新数据是a=0X908 b=0X1,只要小于a=0X911 b=0X1 就会落到TRX182592锁定的范围那锁的类型为: LOCK_X|LOCK_GAP|LOCK_INSERT_INTENTION|LOCK_WAIT 因为这个范围已经被其他事物TRX:182592锁定本事物WAIT
插入意向GAP锁不能获得 如下图绿色箭头就表示a=0X908 b=0X1需要插入到的位置,显然
已经被TRX:182592锁定:
这样一来 TRX:182588等待TRX:182592等待TRX:182588的流程就出来了,死锁触发。从整个死锁的关键来看视乎innodb_autoinc_lock_mode能够保证并发事物获得自增值的顺序就成了关键,如果能够保证顺序显然a=0X908 b=0X1这样的记录不可能出现,因为0X911后面应该是0X912,如果是a=0X912 b=0X1自然不会落到infimum到a=0X911 b=0X1之间不会被TRX182592堵塞
死锁也不会出现,所以这里死锁触发还是需要一定条件的。
如下图:
六、实验初次验证我使用如下:CREATE TABLE `c` ( `a` int(11) NOT NULL , `b` int(11) DEFAULT NULL, PRIMARY KEY (`a`), UNIQUE KEY `b` (`b`)) ENGINE=InnoDB ;这个存储过程增加奇数DELIMITER // CREATE PROCEDURE test1()begin declare num int;set num = 1; while num <= 100000000 do replace into c values(num,1); set num = num+2;end while;end //这个存储过程增加偶数CREATE PROCEDURE test2()begin declare num int;set num = 2; while num <= 100000000 do replace into c values(num,1); set num = num+2;end while;end //DELIMITER ;TRX1:时间Nmysql> call test1(); TRX2:时间N+0.5分钟 mysql> call test2();继续执行TRX1:(几乎同时)时间N+0.5分钟mysql> call test1(); ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transactionTRX2: mysql> call test2();继续执行可以看到这两个存储过程不会产生同样的主键一个为奇数一个为偶数然后让test1先跑一会,然后再跑test2()目的在于模拟上面主键自增取值小于当前值的情况,果然基本是马上就出现死锁了。例如某一时刻 test1() 中主键的值为1001此时test2()才开始跑这个时候主键值为2那么就模拟出来上面判断的0X911后事物取值为0X908的情况,只是这里不再是偶然事件,而是必然,而原题中为偶然事件。由于家里没有修改了源码的环境就先看show engine innodb status中的死锁如下:
-
------------------------
-
LATEST DETECTED DEADLOCK
-
------------------------
-
2017-06-29 14:10:30 0x7fa48148b700
-
*** (1) TRANSACTION:
-
TRANSACTION 4912797, ACTIVE 0 sec inserting
-
mysql tables in use 1, locked 1
-
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
-
MySQL thread id 2, OS thread handle 140344520656640, query id 3371 localhost root update
-
replace into c values(num,1)
-
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
-
RECORD LOCKS space id 598 page no 4 n bits 80 index b of table `test`.`c` trx id 4912797 lock_mode X waiting
-
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 800007d5; asc ;;
-
-
*** (2) TRANSACTION:
-
TRANSACTION 4912793, ACTIVE 0 sec updating or deleting
-
mysql tables in use 1, locked 1
-
6 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 2
-
MySQL thread id 3, OS thread handle 140344520390400, query id 3365 localhost root update
-
replace into c values(num,1)
-
*** (2) HOLDS THE LOCK(S):
-
RECORD LOCKS space id 598 page no 4 n bits 80 index b of table `test`.`c` trx id 4912793 lock_mode X
-
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 800007d5; asc ;;
-
-
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
-
RECORD LOCKS space id 598 page no 4 n bits 80 index b of table `test`.`c` trx id 4912793 lock_mode X locks gap before rec insert intention waiting
-
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 800007d5; asc ;;
-
-
*** WE ROLL BACK TRANSACTION (1)
可以看到完全一致,证明问题判断正确。七、疑问1、二级索引中heap no运行一段时间后如下:00200010000e 80000001800008af002000180054 800000018000071200200020000e 800000018000001400240028002a 8000000180000017002000300070 80000001800008e000200038fff2 800000018000091100200040002a 800000018000001a00200048002a 800000018000001b00000050ff82 8000000180000912002000580000 800000018000001d00200060ff90 800000018000001e00200068ffd6 800000018000090e00200070ff58 80000001800008df很显然这里只有8000000180000912 是当前数据,其他都标记为了del,按理说长度一样的数据进行删除插入,空间应该会不断重用,为什么有时候重用不了呢?2、在整个replace加锁流程中,我并没有完全搞懂,譬如182588的lock strcut(3)和lock strcut(1)分别是什么时候加 的用于保护什么操作,这里只是从死锁现象进行了分析。
八、RC模式下我做了同样的测试得到如下的死锁前事物LOCK STRUCT链表和RR模式基本无异。不在分析给出即可。
mysql> show variables like '%tx_isolation%';
+---------------+----------------+
| Variable_name | Value |
+---------------+----------------+
| tx_isolation | READ-COMMITTED |
+---------------+----------------+
1 row in set (0.09 sec)
-
------- TRX HAS BEEN WAITING 0 SEC FOR THIS LOCK TO BE GRANTED:
-
RECORD LOCKS space id 407 page no 4 n bits 104 index b of table `mysqlslap`.`c` trx id 289638 lock_mode X(LOCK_X) waiting(LOCK_WAIT)
-
Record lock, heap no 20 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 80001220; asc ;;
-
-
------------------
-
---lock strcut(1):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
TABLE LOCK table `mysqlslap`.`c` trx id 289638 lock mode IX
-
---lock strcut(2):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 407 page no 4 n bits 104 index b of table `mysqlslap`.`c` trx id 289638 lock_mode X(LOCK_X) waiting(LOCK_WAIT)
-
Record lock, heap no 20 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 80001220; asc ;;
-
-
---TRANSACTION 289636, ACTIVE 0 sec updating or deleting
-
mysql tables in use 1, locked 1
-
7 lock struct(s), heap size 1160, 7 row lock(s), undo log entries 2
-
MySQL thread id 5, OS thread handle 140734658983680, query id 4646 localhost root update
-
replace into c values(null,1)
-
---lock strcut(1):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
TABLE LOCK table `mysqlslap`.`c` trx id 289636 lock mode IX
-
---lock strcut(2):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 407 page no 4 n bits 104 index b of table `mysqlslap`.`c` trx id 289636 lock_mode X(LOCK_X)
-
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
-
0: len 8; hex 73757072656d756d; asc supremum;;
-
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 80001221; asc !;;
-
-
---lock strcut(3):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 407 page no 4 n bits 104 index b of table `mysqlslap`.`c` trx id 289636 lock_mode X(LOCK_X) locks gap before rec(LOCK_GAP)
-
Record lock, heap no 20 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 80001220; asc ;;
-
-
---lock strcut(4):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 407 page no 4 n bits 104 index b of table `mysqlslap`.`c` trx id 289636 lock_mode X(LOCK_X)
-
-
---lock strcut(5):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 407 page no 4 n bits 104 index b of table `mysqlslap`.`c` trx id 289636 lock_mode X(LOCK_X)
-
Record lock, heap no 20 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 80001220; asc ;;
-
-
---lock strcut(6):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 407 page no 3 n bits 104 index PRIMARY of table `mysqlslap`.`c` trx id 289636 lock_mode X(LOCK_X) locks rec butnot gap(LOCK_REC_NOT_GAP)
-
Record lock, heap no 8 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
-
0: len 4; hex 80001221; asc !;;
-
1: len 6; hex 000000046b64; asc kd;;
-
2: len 7; hex 30000001f00c97; asc 0 ;;
-
3: len 4; hex 80000001; asc ;;
-
-
---lock strcut(7):(Add by gaopeng) In modify Version I force check all REC_LOCK/TAB_LOCK for this Trx
-
RECORD LOCKS space id 407 page no 4 n bits 104 index b of table `mysqlslap`.`c` trx id 289636 lock_mode X(LOCK_X) locks gap before rec(LOCK_GAP) insert intention(LOCK_INSERT_INTENTION) waiting(LOCK_WAIT)
-
Record lock, heap no 20 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
-
0: len 4; hex 80000001; asc ;;
-
1: len 4; hex 80001220; asc ;;
本文某些关键点参考了文章,最大的提示就是自增值不是有序的,这点以后要验证一下,但是实验也证明了这一点:https://yq.aliyun.com/articles/41190作者微信:
版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。