truncate table 和 drop table 的一点坑

本文涉及的产品
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
云数据库 RDS MySQL,集群系列 2核4GB
推荐场景:
搭建个人博客
云数据库 RDS MySQL,高可用系列 2核4GB
简介: 网上一搜这个关键字,得到的结果大多都是delete、truncate、drop之间的区别 但是今天我们要讲的内容,是我们在生产环境中遇到的真实案例 互联网公司一般对大表,都会采用分区表或者物理分表吧,这里主要描述的是分表的删除过程中的问题 案例一 环境MySQL5.

网上一搜这个关键字,得到的结果大多都是delete、truncate、drop之间的区别

但是今天我们要讲的内容,是我们在生产环境中遇到的真实案例

互联网公司一般对大表,都会采用分区表或者物理分表吧,这里主要描述的是分表的删除过程中的问题

案例一

  • 环境
MySQL5.1.54
InnoDB
128G memory
innodb_adaptive_hash_index=1
centos6.6
  • 问题复现

业务描述:由于磁盘空间不断上涨,DBA定期会去删除业务过期不用的表,比如自动删除1年之前的数据

当执行到 drop table的时候,实例的慢查询从 5/s 飙升到 3000/s, 10ms 飙升到 200ms (表大小在10G左右)

slow sql的种类非常多,并不是被删除表的slow ,而是其他表的查询变慢。

别问我怎么知道的,这是我们的slow 监控系统发现的

那么问题来了,Drop table 为啥会出现这样的问题呢?

当然,当时还不得而知,后来我们就换了种思路,drop table = truncate table + drop table,结果神奇的是:慢查询不见了

案例二

  • 环境
MySQL5.6.27
InnoDB
128G memory
innodb_adaptive_hash_index=1
centos6.6
  • 问题复现

业务描述:由于磁盘空间不断上涨,DBA定期会去删除业务过期不用的表,比如自动删除1年之前的数据

由于之前5.1的经验,我们的脚本同样延续之前的做法,drop table = truncate table + drop table

结果在 truncate table 的时候,慢查询出现了,症状和5.1之前的drop table 一模一样

后来,我们还是换了种思路,将truncate table 直接换成 drop table , 神奇的是:慢查询又不见了

案例三

最后一个血腥的案例

  • 环境
MySQL5.6.27
MySQL5.7.21
InnoDB
128G memory
innodb_adaptive_hash_index=1
centos6.6


M(5.6)
|
-> S1(5.6)
-> S2(5.6)
-> S3(5.7)
  • 问题
* 业务描述:
一主三从,从库有2个版本,2个是5.6.27,1个是5.7.21
在Master 执行truncate table xx , 当master执行完毕后,过了一会,3个slave全部开始延迟,延迟一段时间后,其中两个crash,自动重启。
这是多么一个神奇的事情


* 错误日志

mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
MySQL thread id 24952, OS thread handle 140332353423104, query id 70663631 System lock
truncate table broker_lifecycle_status

SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 88024435
--Thread 140329857865472 has waited at buf0flu.cc line 1209 for 0.00 seconds the semaphore:
SX-lock on RW-latch at 0x7faf43804cb8 created in file buf0buf.cc line 1460
a writer (thread id 140332353423104) has reserved it in mode  SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time read locked in file row0sel.cc line 3751
Last time write locked in file /export/home/pb2/build/sb_0-26514852-1514433850.9/mysql-5.7.21/storage/innobase/fsp/fsp0fsp.cc line 656
--Thread 140332352624384 has waited at ha_innodb.cc line 5582 for 240.00 seconds the semaphore:
Mutex at 0x8af31638, Mutex DICT_SYS created dict0dict.cc:1172, lock var 1

--Thread 140332512077568 has waited at srv0srv.cc line 1982 for 270.00 seconds the semaphore:
X-lock on RW-latch at 0x8af31598 created in file dict0dict.cc line 1183
a writer (thread id 140332353423104) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /export/home/pb2/build/sb_0-26514852-1514433850.9/mysql-5.7.21/storage/innobase/row/row0trunc.cc line 1835
OS WAIT ARRAY INFO: signal count 232035265
RW-shared spins 0, rounds 279022143, OS waits 12780361
RW-excl spins 0, rounds 1878235251, OS waits 18129012
RW-sx spins 129404128, rounds 2720581091, OS waits 36295215
Spin rounds per wait: 279022143.00 RW-shared, 1878235251.00 RW-excl, 21.02 RW-sx


InnoDB: ###### Diagnostic info printed to the standard error stream
2018-04-17T11:59:30.590002+08:00 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2018-04-17 11:59:30 0x7fa1b6c88700  InnoDB: Assertion failure in thread 140332533057280 in file ut0ut.cc line 942
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.


* strace 结果

91079 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
91071 futex(0x8af31764, FUTEX_WAIT_PRIVATE, 3508, NULL <unfinished ...>
90833 futex(0x1e73a84, FUTEX_WAIT_PRIVATE, 295237, NULL <unfinished ...>
44265 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
14421 futex(0x1e73a84, FUTEX_WAIT_PRIVATE, 295236, NULL <unfinished ...>
25300 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
192855 futex(0x8af31764, FUTEX_WAIT_PRIVATE, 3507, NULL <unfinished ...>
60765 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
51043 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
36890 futex(0x1e41a44, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
36889 rt_sigtimedwait([HUP QUIT TERM], NULL, NULL, 8 <unfinished ...>
36886 futex(0x3b99394, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
36885 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
36884 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
36883 futex(0x3b99274, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
36882 futex(0x3b990c4, FUTEX_WAIT_PRIVATE, 170512183, NULL <unfinished ...>
36881 futex(0x3b99034, FUTEX_WAIT_PRIVATE, 230090551, NULL <unfinished ...>
36880 futex(0x3b98fa4, FUTEX_WAIT_PRIVATE, 302280849, NULL <unfinished ...>
36879 futex(0x3b98f14, FUTEX_WAIT_PRIVATE, 72214487, NULL <unfinished ...>
36878 futex(0x8af0b5c4, FUTEX_WAIT_PRIVATE, 23749, NULL <unfinished ...>
36877 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
36876 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
36875 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
36873 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270083, NULL <unfinished ...>
36872 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270082, NULL <unfinished ...>
36871 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270080, NULL <unfinished ...>
36870 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270081, NULL <unfinished ...>
36869 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270077, NULL <unfinished ...>
36868 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270079, NULL <unfinished ...>
36867 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270078, NULL <unfinished ...>
36866 futex(0x33dd8ae4, FUTEX_WAIT_PRIVATE, 6225, NULL <unfinished ...>
36865 io_getevents(140422104584192, 1, 256,  <unfinished ...>
36864 io_getevents(140422104596480, 1, 256,  <unfinished ...>
36863 io_getevents(140422104608768, 1, 256,  <unfinished ...>
36862 io_getevents(140422104621056, 1, 256,  <unfinished ...>
36861 io_getevents(140422104633344, 1, 256,  <unfinished ...>
36860 io_getevents(140422104645632, 1, 256,  <unfinished ...>
  • 分析
1. 很明显,执行truncate再slave被卡住了

2. 3台slave都报同样的错误,卡了很长时间,ACTIVE 252 sec truncating table 一直增加。

3. 其中两台slave(1台5.6,1台5.7)Semaphore wait has lasted > 600 seconds之后重启

4. 还有一台slave 当Semaphore快要到600s的时候,结束了,所以怀疑是MySQL bug(MySQL8.0修复),truncate的时候purge线程处理太慢导致

5. 网上也有类似故障:

    Truncation was executed on master which executed it as expected without failure but immediate slave crashed afterwards, heavy read activity is mostly happening on slave

https://bugs.mysql.com/bug.php?id=68184
https://bugs.launchpad.net/percona-server/+bug/1633869

6. 还有说innodb_purge_threads=4导致的,但是我们的MySQL5.6设置的就是1,所以这个不成立。
https://blog.pythian.com/mysql-crashes-ddl-statement-lesson-purge-threads/


总结

好了,这几个问题彻底弄晕我们,不同版本之间的truncate 和 drop 会给用户带来不同的体验

后来,仔细研究了下官方文档,其实有一些猫腻在里面,有兴趣的同学可以深究下

https://dev.mysql.com/doc/refman/5.7/en/truncate-table.html

On a system with a large InnoDB buffer pool and innodb_adaptive_hash_index enabled,
TRUNCATE TABLE operations may cause a temporary drop in system performance due to an LRU scan that occurs when removing an InnoDB table's adaptive hash index entries.
The problem was addressed for DROP TABLE in MySQL 5.5.23 (Bug #13704145, Bug #64284) but remains a known issue for TRUNCATE TABLE (Bug #68184).

最佳实践

  • MySQL5.6版本以下:使用truncate table + drop table 替代 drop table
  • MySQL5.6版本+ : 直接使用drop table
相关实践学习
如何在云端创建MySQL数据库
开始实验后,系统会自动创建一台自建MySQL的 源数据库 ECS 实例和一台 目标数据库 RDS。
全面了解阿里云能为你做什么
阿里云在全球各地部署高效节能的绿色数据中心,利用清洁计算为万物互联的新世界提供源源不断的能源动力,目前开服的区域包括中国(华北、华东、华南、香港)、新加坡、美国(美东、美西)、欧洲、中东、澳大利亚、日本。目前阿里云的产品涵盖弹性计算、数据库、存储与CDN、分析与搜索、云通信、网络、管理与监控、应用服务、互联网中间件、移动服务、视频服务等。通过本课程,来了解阿里云能够为你的业务带来哪些帮助 &nbsp; &nbsp; 相关的阿里云产品:云服务器ECS 云服务器 ECS(Elastic Compute Service)是一种弹性可伸缩的计算服务,助您降低 IT 成本,提升运维效率,使您更专注于核心业务创新。产品详情: https://www.aliyun.com/product/ecs
目录
相关文章
|
7月前
|
算法 关系型数据库 MySQL
drop、truncate 和 delete 的区别
drop、truncate 和 delete 的区别
|
2月前
|
前端开发
`<table>`
【10月更文挑战第17天】
44 1
|
5月前
|
SQL Oracle 关系型数据库
ALTER TABLE
【7月更文挑战第19天】ALTER TABLE 。
63 5
|
5月前
|
存储
DROP TABLE 语句
【7月更文挑战第20天】DROP TABLE 语句。
84 2
|
JavaScript 前端开发 数据可视化
vxe-table
vxe-table
757 0
vxe-table
瞬表——Ephemeron Table
瞬表——Ephemeron Table
170 0
Stones on the Table
Stones on the Table
136 0
Stones on the Table
ALTER TABLE
本文主要介绍如何对表相关的Sequence类型进行修改。
|
Web App开发 关系型数据库
Truncate/Delete/Drop table的特点和区别
之前一直对Truncate/Delete/Drop认识的不是很清晰,所以特意的翻了一下MySQL5.7 Reference Manual,准备系统的了解一下,这里是一些翻译,外加一点自己的认知。
1242 0
|
关系型数据库 MySQL