一、问题描述
问题现场如下:
mysql> show processlist; +----+-----------------+-----------+------+---------+-------+-------------------------+----------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-----------------+-----------+------+---------+-------+-------------------------+----------------------------------+ | 5 | event_scheduler | localhost | NULL | Daemon | 34025 | Waiting on empty queue | NULL | | 11 | root | localhost | NULL | Query | 1528 | User sleep | select sleep(1000) from test.e01 | | 12 | root | localhost | NULL | Query | 0 | starting | show processlist | | 23 | root | localhost | NULL | Query | 33 | Waiting for table flush | select * from test.e01 | | 24 | root | localhost | NULL | Query | 7 | Waiting for table flush | select * from test.e01 | +----+-----------------+-----------+------+---------+-------+-------------------------+----------------------------------+
乍一看来,很是奇怪,这里没有出现我们经常遇到的 flush table/flush table with read lock 堵塞,直接出现了 Waiting for table flush的堵塞,有点像
https://www.jianshu.com/p/b141585cd844
以前记录的文章中的案例2,但是其实并不一样,这里是由于analyze table语句造成的。
复现过程非常简单(必须是社区版本,我使用的8.0.21),如下:
session 1
mysql> select sleep(1000) from test.e01;(要有几条数据) 这条语句肯定结束不了
session 2
mysql> analyze table test.e01; +----------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +----------+---------+----------+----------+ | test.e01 | analyze | status | OK | +----------+---------+----------+----------+
session 3
mysql> select * from test.e01; # 被堵塞了
此时堵塞的情形就是Waiting for table flush
二、analyze触发了什么
analyze table 除了更新我们的统计数据,实际上最后做了一个操作如下(栈):
#0 TABLE_SHARE::clear_version (this=0x7ffeec00ee38) at /opt/mysql/mysql-8.0.21/sql/table.cc:518 #1 0x0000000003680a4d in <lambda(std::unordered_map<std::__cxx11::basic_string<char>, std::unique_ptr<TABLE_SHARE, Table_share_deleter>, std::hash<std::__cxx11::basic_string<char> >, std::equal_to<std::__cxx11::basic_string<char> >, Malloc_allocator<std::pair<const std::__cxx11::basic_string<char>, std::unique_ptr<TABLE_SHARE, Table_share_deleter> > > >::iterator)>::operator()(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<TABLE_SHARE, Table_share_deleter>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, Malloc_allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unique_ptr<TABLE_SHARE, Table_share_deleter> > > >::iterator) const (__closure=0x7fff600ce8a0, my_it=...) at /opt/mysql/mysql-8.0.21/sql/sql_base.cc:9831 #2 0x0000000003680d0c in tdc_remove_table (thd=0x7ffed8094550, remove_type=TDC_RT_REMOVE_UNUSED, db=0x7ffed80c4a08 "test", table_name=0x7ffed80c4a20 "e01", has_lock=false) at /opt/mysql/mysql-8.0.21/sql/sql_base.cc:9844 #3 0x0000000003d890a9 in mysql_admin_table (thd=0x7ffed8094550, tables=0x7ffed80c4cb8, check_opt=0x7ffed80ba370, operator_name=0x61c06e7 "analyze", lock_type=TL_READ_NO_INSERT, open_for_modify=true, repair_table_use_frm=false, extra_open_options=0, prepare_func=0x0, operator_func= (int (handler::*)(handler * const, THD *, HA_CHECK_OPT *)) 0x3accc64 <handler::ha_analyze(THD*, HA_CHECK_OPT*)>, check_view=0, alter_info=0x7ffed80c4b28, need_to_acquire_shared_backup_lock=true) at /opt/mysql/mysql-8.0.21/sql/sql_admin.cc:1296
大概看一下做了什么,如下:
if (share->ref_count() > 0) { if (remove_type != TDC_RT_REMOVE_NOT_OWN_KEEP_SHARE) share->clear_version(); table_cache_manager.free_table(thd, remove_type, share); }
这里判断了是否当前table share正在使用,如果正在使用(很显然我们这个table share是不能直接释放的,因为有select一直持有它),那么将share版本的设置为0(share->clear_version()。
实际上这个版本由全局变量refresh_version初始化),目的在于下次如果有使用表定义的时候需要重新打开table share。
然后释放了当前没有使用的table cache(类型TDC_RT_REMOVE_UNUSED)。
三、再次访问表堵塞
当再次访问表的时候(open_table),会去判断如下是否有老的table share存在,如果存在则需要等待释放:
share_found: if (!(flags & MYSQL_OPEN_IGNORE_FLUSH)) { if (share->has_old_version()) { release_table_share(share); ... wait_result = tdc_wait_for_old_version(thd, table_list->db, table_list->table_name, ot_ctx->get_timeout(), deadlock_weight);
首先如果存在判断是否存在的老版本,判断是通过table share的版本和当前全局版本refresh_version进行比对。
前面我们知道这里table share的版本已经设置为0,因此这里必然进入release_table_share环节,然后等待持有者的释放(案例窗口1的select查询),然后再次获取table share。
等待栈如下:
(gdb) bt #0 tdc_wait_for_old_version (thd=0x7ffed8094550, db=0x7ffed802e5c0 "test", table_name=0x7ffed802e5d8 "e01", wait_timeout=31536000, deadlock_weight=0) at /opt/mysql/mysql-8.0.21/sql/sql_base.cc:2705 #1 0x0000000003671d7e in open_table (thd=0x7ffed8094550, table_list=0x7ffed802e7c8, ot_ctx=0x7fff600cf1a0) at /opt/mysql/mysql-8.0.21/sql/sql_base.cc:3280 #2 0x0000000003675c36 in open_and_process_table (thd=0x7ffed8094550, lex=0x7ffed80b9fe0, tables=0x7ffed802e7c8, counter=0x7ffed80ba038, prelocking_strategy=0x7fff600cf2b8, has_prelocking_list=false, ot_ctx=0x7fff600cf1a0) at /opt/mysql/mysql-8.0.21/sql/sql_base.cc:5000 #3 0x0000000003677102 in open_tables (thd=0x7ffed8094550, start=0x7fff600cf260, counter=0x7ffed80ba038, flags=0, prelocking_strategy=0x7fff600cf2b8) at /opt/mysql/mysql-8.0.21/sql/sql_base.cc:5664 #4 0x0000000003678993 in open_tables_for_query (thd=0x7ffed8094550, tables=0x7ffed802e7c8, flags=0) at /opt/mysql/mysql-8.0.21/sql/sql_base.cc:6515
四、Percona修复该问题
Percona在上文中已经提到问题如下:
一个关键的修改点如下
官方版本:
Percona版本:
如此修改后analyze不会进入tdc_remove_table函数,那么table share的版本不会设置为0。因此如果使用官方版本小心本问题。
最后说下,官方版本测试到8.0.21依旧存在这个问题,而Percona已经修复了,详情见下
https://www.percona.com/blog/2018/03/27/analyze-table-is-no-longer-a-blocking-operation/
全文完。
Enjoy MySQL :)