MySQL在8.0.16版本之前,对 IN和 EXISTS处理是不一样的,EXISTS只能采用子查询方式,所以执行计划中能看到DEPENDENT SUBQUERY。但可以把IN优化成semi join,优化器开关(optimizer_switch)中有几个相关的开关
loosescan=on firstmatch=on duplicateweedout=on materialization=on
MySQL从8.0.16开始,增加对EXISTS的优化,和IN一样也支持自动转换成semi join。
从8.0.18开始,又增加了对NOT EXISTS转变成anti join的优化。
我们来看下同一个SQL在5.7版本和8.0.18版本中的不同表现
1. 测试环境
两个测试表的DDL
# t1表共有30万条记录 [root@yejr.run]> show create table t1\G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE t1 ( id int unsigned NOT NULL AUTO_INCREMENT, seq int unsigned NOT NULL DEFAULT '0', name varchar(20) NOT NULL DEFAULT '', x int DEFAULT NULL, PRIMARY KEY (id), KEY k1 (seq) ) ENGINE=InnoDB AUTO_INCREMENT=300001; # t2表共有19条记录 [root@yejr.run]> show create table t2\G *************************** 1. row *************************** Table: t2 Create Table: CREATE TABLE t2 ( id int NOT NULL, nu int DEFAULT NULL, name varchar(20) NOT NULL DEFAULT '', PRIMARY KEY (id) ) ENGINE=InnoDB;
数据随机填充。
2. MySQL 5.7下的执行计划及成本
5.7还不支持anti-join优化,只能是用子查询。
[root@yejr.run]> explain select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu); +----+--------------------+-------+------+------+---------+------+--------+----------+-------------+ | id | select_type | table | type | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-------+------+------+---------+------+--------+----------+-------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | 376310 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | 19 | 10.00 | Using where | +----+--------------------+-------+------+------+---------+------+--------+----------+-------------+ [root@yejr.run]> show warnings; +-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1276 | Field or reference 'yejr.t1.x' of SELECT #2 was resolved in SELECT #1 | | Note | 1003 | /* select#1 */ select yejr.t1.id AS id,yejr.t1.seq AS seq,yejr.t1.name AS name,yejr.t1.x AS x from yejr.t1 where exists(/* select#2 */ select 1 from yejr.t2 where (yejr.t1.x = yejr.t2.nu)) is false | +-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ #该SQL耗时 1.34秒 [root@yejr.run]> select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu); 299994 rows in set (1.34 sec) #从统计结果中的 Handler_read_rnd_next 值来看,应该是做了一次笛卡尔积扫描 [root@yejr.run]> show status like 'handl%read%'; +-----------------------+---------+ | Variable_name | Value | +-----------------------+---------+ | Handler_read_first | 300001 | | Handler_read_key | 300001 | ... | Handler_read_rnd_next | 6299939 | +-----------------------+---------+
这里要纠个偏,不少人说MySQL对子查询支持不好,实际上是因为优化器无法将这个子查询改写优化成JOIN查询。
像下面这个子查询SQL,就可以被自动优化了
[root@yejr.run]> explain select * from t1 where exists ( select nu from t2 where t1.seq=t2.nu and nu >= 10); +----+-------------+-------+-------+---------------+------+---------+------------+------+----------+----------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+-------+---------------+------+---------+------------+------+----------+----------------------------------------+ | 1 | SIMPLE | t2 | range | nu | nu | 4 | NULL | 17 | 111.76 | Using where; Using index; LooseScan | | 1 | SIMPLE | t1 | ref | k1 | k1 | 4 | yejr.t2.nu | 1 | 100.00 | Using join buffer (Batched Key Access) | +----+-------------+-------+-------+---------------+------+---------+------------+------+----------+----------------------------------------+ 2 rows in set, 2 warnings (0.00 sec) [root@yejr.run]> show warnings; | Level | Code | Message | | Note | 1276 | Field or reference 'yejr.t1.seq' of SELECT #2 was resolved in SELECT #1 | | Note | 1003 | /* select#1 */ select `yejr`.`t1`.`id` AS `id`,`yejr`.`t1`.`seq` AS `seq`,`yejr`.`t1`.`name` AS `name`,`yejr`.`t1`.`x` AS `x` from `yejr`.`t1` semi join (`yejr`.`t2`) where ((`yejr`.`t1`.`seq` = `yejr`.`t2`.`nu`) and (`yejr`.`t2`.`nu` >= 10)) | +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3. MySQL 8.0.19下的执行计划及成本
这时支持对anti-join的优化,优化器会对SQL进行改写优化。
[root@yejr.run]> explain select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu); +----+--------------+-------------+--------+---------------------+---------------------+---------+-----------------+--------+----------+-------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------+-------------+--------+---------------------+---------------------+---------+-----------------+--------+----------+-------------------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 376310 | 100.00 | NULL | | 1 | SIMPLE | <subquery2> | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 5 | zhishutang.t1.x | 1 | 100.00 | Using where; Not exists | | 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 19 | 100.00 | NULL | +----+--------------+-------------+--------+---------------------+---------------------+---------+-----------------+--------+----------+-------------------------+ 3 rows in set, 2 warnings (0.00 sec) #直接优化成anti join了 [root@yejr.run]> show warnings; +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1276 | Field or reference 'yejr.t1.x' of SELECT #2 was resolved in SELECT #1 | | Note | 1003 | /* select#1 */ select yejr.t1.id AS id,yejr.t1.seq AS seq,yejr.t1.name AS name,yejr.t1.x AS x from yejr.t1 anti join (yejr.t2) on((<subquery2>.nu = yejr.t1.x)) where true | +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ # explain analyze结果 [root@yejr.run]> explain analyze select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu); | -> Nested loop anti-join (actual time=0.058..248.704 rows=299994 loops=1) -> Table scan on t1 (cost=37975.50 rows=376310) (actual time=0.035..82.504 rows=300000 loops=1) -> Single-row index lookup on <subquery2> using <auto_distinct_key> (nu=t1.x) (actual time=0.000..0.000 rows=0 loops=300000) -> Materialize with deduplication (actual time=0.000..0.000 rows=0 loops=300000) -> Filter: (t2.nu is not null) (cost=2.15 rows=19) (actual time=0.006..0.011 rows=19 loops=1) -> Table scan on t2 (cost=2.15 rows=19) (actual time=0.005..0.009 rows=19 loops=1) #该SQL耗时 0.15 秒 [root@yejr.run]> select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu); 299994 rows in set (0.15 sec) [root@yejr.run]> show status like 'handl%read%'; +-----------------------+--------+ | Variable_name | Value | +-----------------------+--------+ | Handler_read_first | 2 | | Handler_read_key | 200574 | ... | Handler_read_rnd_next | 300021 |
相对于5.7的性能有了很大提升。
4. 一个小小的脑洞
测试过程中我突发奇想,在MySQL 8.0.19版本下,如果把oiptimizer_switch里的semijoin关闭后,应该也相当于关闭anti join优化。
用EXPLAIN ANALYZE再查看这个SQL的执行计划及耗时,结果是这样的
| -> Filter: exists(select #2) (cost=37975.50 rows=376310) (actual time=54.977..1976.963 rows=6 loops=1) -> Table scan on t1 (cost=37975.50 rows=376310) (actual time=0.054..70.366 rows=300000 loops=1) -> Select #2 (subquery in condition; dependent) -> Limit: 1 row(s) (actual time=0.006..0.006 rows=0 loops=300000) -> Filter: (t1.x = t2.nu) (cost=0.44 rows=2) (actual time=0.006..0.006 rows=0 loops=300000) -> Table scan on t2 (cost=0.44 rows=19) (actual time=0.002..0.004 rows=19 loops=300000)
看起来的确如此。在8.0.19中,也修复了松华老师之前在8.0.18版本遇到的小bug。
最后提醒大家不要轻易关闭 semijoin 开关,以防连 anti-join 优化也跟着消失,这就不划算了。
MySQL优化器的确在不断进步中,欣慰,也建议大家适时更新到高版本。
参考:
- Antijoin in MySQL 8(https://mysqlserverteam.com/antijoin-in-mysql-8/ )
- MySQL 8 Anti-Join 几点总结
全文完。