truncate table 和 drop table 的一点坑-阿里云开发者社区

truncate table 和 drop table 的一点坑

2018-03-27 4066

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

RDS MySQL Serverless 基础系列，0.5-2RCU 50GB

云数据库 RDS MySQL，集群系列 2核4GB

云数据库 RDS MySQL，高可用系列 2核4GB

简介： 网上一搜这个关键字，得到的结果大多都是delete、truncate、drop之间的区别但是今天我们要讲的内容，是我们在生产环境中遇到的真实案例互联网公司一般对大表，都会采用分区表或者物理分表吧，这里主要描述的是分表的删除过程中的问题案例一环境MySQL5.

网上一搜这个关键字，得到的结果大多都是delete、truncate、drop之间的区别

但是今天我们要讲的内容，是我们在生产环境中遇到的真实案例

互联网公司一般对大表，都会采用分区表或者物理分表吧，这里主要描述的是分表的删除过程中的问题

案例一

环境

MySQL5.1.54
InnoDB
128G memory
innodb_adaptive_hash_index=1
centos6.6

问题复现

业务描述：由于磁盘空间不断上涨，DBA定期会去删除业务过期不用的表，比如自动删除1年之前的数据

当执行到 drop table的时候，实例的慢查询从 5/s 飙升到 3000/s， 10ms 飙升到 200ms （表大小在10G左右）

slow sql的种类非常多，并不是被删除表的slow ，而是其他表的查询变慢。

别问我怎么知道的，这是我们的slow 监控系统发现的

那么问题来了，Drop table 为啥会出现这样的问题呢？

当然，当时还不得而知，后来我们就换了种思路，drop table = truncate table + drop table，结果神奇的是：慢查询不见了

案例二

环境

MySQL5.6.27
InnoDB
128G memory
innodb_adaptive_hash_index=1
centos6.6

问题复现

业务描述：由于磁盘空间不断上涨，DBA定期会去删除业务过期不用的表，比如自动删除1年之前的数据

由于之前5.1的经验，我们的脚本同样延续之前的做法，drop table = truncate table + drop table

结果在 truncate table 的时候，慢查询出现了，症状和5.1之前的drop table 一模一样

后来，我们还是换了种思路，将truncate table 直接换成 drop table ，神奇的是：慢查询又不见了

案例三

最后一个血腥的案例

环境

MySQL5.6.27
MySQL5.7.21
InnoDB
128G memory
innodb_adaptive_hash_index=1
centos6.6


M(5.6)
|
-> S1(5.6)
-> S2(5.6)
-> S3(5.7)

问题

* 业务描述：
一主三从，从库有2个版本，2个是5.6.27，1个是5.7.21
在Master 执行truncate table xx ， 当master执行完毕后，过了一会，3个slave全部开始延迟，延迟一段时间后，其中两个crash，自动重启。
这是多么一个神奇的事情


* 错误日志

mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
MySQL thread id 24952, OS thread handle 140332353423104, query id 70663631 System lock
truncate table broker_lifecycle_status

SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 88024435
--Thread 140329857865472 has waited at buf0flu.cc line 1209 for 0.00 seconds the semaphore:
SX-lock on RW-latch at 0x7faf43804cb8 created in file buf0buf.cc line 1460
a writer (thread id 140332353423104) has reserved it in mode  SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time read locked in file row0sel.cc line 3751
Last time write locked in file /export/home/pb2/build/sb_0-26514852-1514433850.9/mysql-5.7.21/storage/innobase/fsp/fsp0fsp.cc line 656
--Thread 140332352624384 has waited at ha_innodb.cc line 5582 for 240.00 seconds the semaphore:
Mutex at 0x8af31638, Mutex DICT_SYS created dict0dict.cc:1172, lock var 1

--Thread 140332512077568 has waited at srv0srv.cc line 1982 for 270.00 seconds the semaphore:
X-lock on RW-latch at 0x8af31598 created in file dict0dict.cc line 1183
a writer (thread id 140332353423104) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /export/home/pb2/build/sb_0-26514852-1514433850.9/mysql-5.7.21/storage/innobase/row/row0trunc.cc line 1835
OS WAIT ARRAY INFO: signal count 232035265
RW-shared spins 0, rounds 279022143, OS waits 12780361
RW-excl spins 0, rounds 1878235251, OS waits 18129012
RW-sx spins 129404128, rounds 2720581091, OS waits 36295215
Spin rounds per wait: 279022143.00 RW-shared, 1878235251.00 RW-excl, 21.02 RW-sx


InnoDB: ###### Diagnostic info printed to the standard error stream
2018-04-17T11:59:30.590002+08:00 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2018-04-17 11:59:30 0x7fa1b6c88700  InnoDB: Assertion failure in thread 140332533057280 in file ut0ut.cc line 942
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.


* strace 结果

91079 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
91071 futex(0x8af31764, FUTEX_WAIT_PRIVATE, 3508, NULL <unfinished ...>
90833 futex(0x1e73a84, FUTEX_WAIT_PRIVATE, 295237, NULL <unfinished ...>
44265 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
14421 futex(0x1e73a84, FUTEX_WAIT_PRIVATE, 295236, NULL <unfinished ...>
25300 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
192855 futex(0x8af31764, FUTEX_WAIT_PRIVATE, 3507, NULL <unfinished ...>
60765 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
51043 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
36890 futex(0x1e41a44, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
36889 rt_sigtimedwait([HUP QUIT TERM], NULL, NULL, 8 <unfinished ...>
36886 futex(0x3b99394, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
36885 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
36884 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
36883 futex(0x3b99274, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
36882 futex(0x3b990c4, FUTEX_WAIT_PRIVATE, 170512183, NULL <unfinished ...>
36881 futex(0x3b99034, FUTEX_WAIT_PRIVATE, 230090551, NULL <unfinished ...>
36880 futex(0x3b98fa4, FUTEX_WAIT_PRIVATE, 302280849, NULL <unfinished ...>
36879 futex(0x3b98f14, FUTEX_WAIT_PRIVATE, 72214487, NULL <unfinished ...>
36878 futex(0x8af0b5c4, FUTEX_WAIT_PRIVATE, 23749, NULL <unfinished ...>
36877 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
36876 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
36875 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
36873 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270083, NULL <unfinished ...>
36872 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270082, NULL <unfinished ...>
36871 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270080, NULL <unfinished ...>
36870 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270081, NULL <unfinished ...>
36869 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270077, NULL <unfinished ...>
36868 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270079, NULL <unfinished ...>
36867 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270078, NULL <unfinished ...>
36866 futex(0x33dd8ae4, FUTEX_WAIT_PRIVATE, 6225, NULL <unfinished ...>
36865 io_getevents(140422104584192, 1, 256,  <unfinished ...>
36864 io_getevents(140422104596480, 1, 256,  <unfinished ...>
36863 io_getevents(140422104608768, 1, 256,  <unfinished ...>
36862 io_getevents(140422104621056, 1, 256,  <unfinished ...>
36861 io_getevents(140422104633344, 1, 256,  <unfinished ...>
36860 io_getevents(140422104645632, 1, 256,  <unfinished ...>

分析

1. 很明显，执行truncate再slave被卡住了

2. 3台slave都报同样的错误，卡了很长时间，ACTIVE 252 sec truncating table 一直增加。

3. 其中两台slave（1台5.6，1台5.7）Semaphore wait has lasted > 600 seconds之后重启

4. 还有一台slave 当Semaphore快要到600s的时候，结束了，所以怀疑是MySQL bug（MySQL8.0修复），truncate的时候purge线程处理太慢导致

5. 网上也有类似故障：

    Truncation was executed on master which executed it as expected without failure but immediate slave crashed afterwards, heavy read activity is mostly happening on slave

https://bugs.mysql.com/bug.php?id=68184
https://bugs.launchpad.net/percona-server/+bug/1633869

6. 还有说innodb_purge_threads=4导致的，但是我们的MySQL5.6设置的就是1，所以这个不成立。
https://blog.pythian.com/mysql-crashes-ddl-statement-lesson-purge-threads/

总结

好了，这几个问题彻底弄晕我们，不同版本之间的truncate 和 drop 会给用户带来不同的体验

后来，仔细研究了下官方文档，其实有一些猫腻在里面，有兴趣的同学可以深究下

https://dev.mysql.com/doc/refman/5.7/en/truncate-table.html

On a system with a large InnoDB buffer pool and innodb_adaptive_hash_index enabled,
TRUNCATE TABLE operations may cause a temporary drop in system performance due to an LRU scan that occurs when removing an InnoDB table's adaptive hash index entries.
The problem was addressed for DROP TABLE in MySQL 5.5.23 (Bug #13704145, Bug #64284) but remains a known issue for TRUNCATE TABLE (Bug #68184).

最佳实践

MySQL5.6版本以下：使用truncate table + drop table 替代 drop table
MySQL5.6版本+ ：直接使用drop table

truncate table 和 drop table 的一点坑

案例一

案例二

案例三

总结

最佳实践

热门文章

最新文章

相关电子书

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

truncate table 和 drop table 的一点坑

案例一

案例二

案例三

总结

最佳实践

热门文章

最新文章

相关电子书