之前遇到一个DBA,在生产库上加字段,导致数据库连接数打满。原因就是MDL锁引起。下面让我来介绍一下MDL锁及其排查和处理方式。
MDL锁:
全称meta data lock,是表锁,用于保护数据库对象定义不被修改。执行SQL语句操作表都是需要获取和持有MDL锁,直到锁被释放。
案例描述:
一位同学查询只有2条数据的test.tt1,10秒后却返回锁等待超时报错。
案例警示:
- 要对生产数据库有敬畏之心
- 业务高峰,不建议对大表和热表做DDL变更(推荐gh-ost)
- 有效的数据库备份,重于一切
- 良好的数据库运维规范是减少故障的基础(DBA风险操作,需要审核和通知业务方)
技术回放:
数据库版本、表结构和报错信息,如下
mysql>select version();+---------------+| version()|+---------------+|5.7.38-41-log |+---------------+1 row inset(0.54 sec)mysql> show createtable tmp_mdl_lock ;+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+|Table|CreateTable|+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| tmp_mdl_lock |CREATETABLE `tmp_mdl_lock` ( `id` int(11)NOTNULL AUTO_INCREMENT, `name` varchar(10) DEFAULT NULL, PRIMARY KEY (`id`)) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 |+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+1 row inset(0.00 sec)mysql>select*from test.tmp_mdl_lock;ERROR 1205(HY000): Lock wait timeout exceeded; try restarting transaction
拓展:
lock_wait_timeout设置了10秒,因此上面的SQL执行了10秒报锁等待超时错误
如果表 test.tmp_mdl_lock上的查询语句频繁,而且客户端有重试机制,也就是说超时后会再起一个新SESSION再请求的话,这个库的线程很快就会爆满
DBA分析和处理过程:
1.首先查看会话信息,发现Waiting for table metadata lock,确认为DML锁引起
select*from information_schema.processlistwhere info isnotnull;+-----------+------+-----------+------+---------+------+---------------------------------+---------------------------------------------------------------------+---------+-----------+---------------+| ID | USER | HOST | DB | COMMAND |TIME| STATE | INFO | TIME_MS | ROWS_SENT | ROWS_EXAMINED |+-----------+------+-----------+------+---------+------+---------------------------------+---------------------------------------------------------------------+---------+-----------+---------------+|251182208| root | localhost |NULL| Query |0| executing |select*from information_schema.processlistwhere info isnotnull|2|0|0||251181173| root | localhost |NULL| Query |3| Waiting for table metadata lock |select*from test.tmp_mdl_lock|3077|0|0|+-----------+------+-----------+------+---------+------+---------------------------------+---------------------------------------------------------------------+---------+-----------+---------------+
2.查询持有并导致其他事务阻塞的连接会话
select*from sys.schema_table_lock_waits \G ***************************1. row *************************** object_schema: test object_name: tmp_mdl_lock waiting_thread_id:251885079 waiting_pid:251181173 waiting_account: root@localhost waiting_lock_type: SHARED_READ waiting_lock_duration: TRANSACTION waiting_query:select*from test.tmp_mdl_lock waiting_query_secs:10 waiting_query_rows_affected:0 waiting_query_rows_examined:0 blocking_thread_id:251884951 blocking_pid:251181044 blocking_account: root@localhost blocking_lock_type: SHARED_NO_READ_WRITE blocking_lock_duration: TRANSACTION sql_kill_blocking_query: KILL QUERY 251181044sql_kill_blocking_connection: KILL 2511810441 row inset(0.05 sec)拓展:sys.schema_table_lock_waits默认为空,请打开mdl对应的instrument; show global variables like'performance_schema';+--------------------+-------+| Variable_name | Value |+--------------------+-------+| performance_schema |ON|+--------------------+-------+update setup_instruments set ENABLED ='yes',timed ='yes'where NAME ='wait/lock/metadata/sql/mdl';select*from setup_instruments where name ='wait/lock/metadata/sql/mdl';+----------------------------+---------+-------+| NAME | ENABLED | TIMED |+----------------------------+---------+-------+| wait/lock/metadata/sql/mdl | YES | YES |
3.KILL 251181044,断开线程的连接;
拓展:
kill query 线程id,表示终止这个线程中正在执行的语句;kill 线程id,表示断开这个线程的连接,这个连接中未提交的事务会回滚、在执行的SQL会停止