MySQL 8.0中对EXISTS、NOT EXISTS的持续优化-阿里云开发者社区

MySQL 8.0中对EXISTS、NOT EXISTS的持续优化

2021-11-18 638

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

RDS MySQL Serverless 基础系列，0.5-2RCU 50GB

云数据库 RDS MySQL，集群系列 2核4GB

云数据库 RDS PostgreSQL，集群系列 2核4GB

简介： MySQL 8.0中对EXISTS、NOT EXISTS的持续优化

MySQL在8.0.16版本之前，对 IN和 EXISTS处理是不一样的，EXISTS只能采用子查询方式，所以执行计划中能看到DEPENDENT SUBQUERY。但可以把IN优化成semi join，优化器开关（optimizer_switch）中有几个相关的开关

loosescan=on
firstmatch=on
duplicateweedout=on
materialization=on

MySQL从8.0.16开始，增加对EXISTS的优化，和IN一样也支持自动转换成semi join。

从8.0.18开始，又增加了对NOT EXISTS转变成anti join的优化。

我们来看下同一个SQL在5.7版本和8.0.18版本中的不同表现

1. 测试环境

两个测试表的DDL

# t1表共有30万条记录
[root@yejr.run]> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE t1 (
  id int unsigned NOT NULL AUTO_INCREMENT,
  seq int unsigned NOT NULL DEFAULT '0',
  name varchar(20) NOT NULL DEFAULT '',
  x int DEFAULT NULL,
  PRIMARY KEY (id),
  KEY k1 (seq)
) ENGINE=InnoDB AUTO_INCREMENT=300001;
# t2表共有19条记录
[root@yejr.run]> show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE t2 (
  id int NOT NULL,
  nu int DEFAULT NULL,
  name varchar(20) NOT NULL DEFAULT '',
  PRIMARY KEY (id)
) ENGINE=InnoDB;

数据随机填充。

2. MySQL 5.7下的执行计划及成本

5.7还不支持anti-join优化，只能是用子查询。

[root@yejr.run]> explain select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu);
+----+--------------------+-------+------+------+---------+------+--------+----------+-------------+
| id | select_type        | table | type | key  | key_len | ref  | rows   | filtered | Extra       |
+----+--------------------+-------+------+------+---------+------+--------+----------+-------------+
|  1 | PRIMARY            | t1    | ALL  | NULL | NULL    | NULL | 376310 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL | NULL    | NULL |     19 |    10.00 | Using where |
+----+--------------------+-------+------+------+---------+------+--------+----------+-------------+
[root@yejr.run]> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                                                   |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1276 | Field or reference 'yejr.t1.x' of SELECT #2 was resolved in SELECT #1                                                                                                                                 |
| Note  | 1003 | /* select#1 */ select yejr.t1.id AS id,yejr.t1.seq AS seq,yejr.t1.name AS name,yejr.t1.x AS x from yejr.t1 where exists(/* select#2 */ select 1 from yejr.t2 where (yejr.t1.x = yejr.t2.nu)) is false |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
#该SQL耗时 1.34秒
[root@yejr.run]> select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu);
299994 rows in set (1.34 sec)
#从统计结果中的 Handler_read_rnd_next 值来看，应该是做了一次笛卡尔积扫描
[root@yejr.run]> show status like 'handl%read%';
+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| Handler_read_first    | 300001  |
| Handler_read_key      | 300001  |
...
| Handler_read_rnd_next | 6299939 |
+-----------------------+---------+

这里要纠个偏，不少人说MySQL对子查询支持不好，实际上是因为优化器无法将这个子查询改写优化成JOIN查询。

像下面这个子查询SQL，就可以被自动优化了

[root@yejr.run]> explain select * from t1 where exists ( select nu from t2 where t1.seq=t2.nu and nu >= 10);
+----+-------------+-------+-------+---------------+------+---------+------------+------+----------+----------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref        | rows | filtered | Extra                                  |
+----+-------------+-------+-------+---------------+------+---------+------------+------+----------+----------------------------------------+
|  1 | SIMPLE      | t2    | range | nu            | nu   | 4       | NULL       |   17 |   111.76 | Using where; Using index; LooseScan    |
|  1 | SIMPLE      | t1    | ref   | k1            | k1   | 4       | yejr.t2.nu |    1 |   100.00 | Using join buffer (Batched Key Access) |
+----+-------------+-------+-------+---------------+------+---------+------------+------+----------+----------------------------------------+
2 rows in set, 2 warnings (0.00 sec)
[root@yejr.run]> show warnings;
| Level | Code | Message                                                                                                                                                                                                                                                                                                  |
| Note  | 1276 | Field or reference 'yejr.t1.seq' of SELECT #2 was resolved in SELECT #1                                                                                                                                                                            |
| Note  | 1003 | /* select#1 */ select `yejr`.`t1`.`id` AS `id`,`yejr`.`t1`.`seq` AS `seq`,`yejr`.`t1`.`name` AS `name`,`yejr`.`t1`.`x` AS `x` from `yejr`.`t1` semi join (`yejr`.`t2`) where ((`yejr`.`t1`.`seq` = `yejr`.`t2`.`nu`) and (`yejr`.`t2`.`nu` >= 10)) |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

3. MySQL 8.0.19下的执行计划及成本

这时支持对anti-join的优化，优化器会对SQL进行改写优化。

[root@yejr.run]> explain select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu);
+----+--------------+-------------+--------+---------------------+---------------------+---------+-----------------+--------+----------+-------------------------+
| id | select_type  | table       | type   | possible_keys       | key                 | key_len | ref             | rows   | filtered | Extra                   |
+----+--------------+-------------+--------+---------------------+---------------------+---------+-----------------+--------+----------+-------------------------+
|  1 | SIMPLE       | t1          | ALL    | NULL                | NULL                | NULL    | NULL            | 376310 |   100.00 | NULL                    |
|  1 | SIMPLE       | <subquery2> | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 5       | zhishutang.t1.x |      1 |   100.00 | Using where; Not exists |
|  2 | MATERIALIZED | t2          | ALL    | NULL                | NULL                | NULL    | NULL            |     19 |   100.00 | NULL                    |
+----+--------------+-------------+--------+---------------------+---------------------+---------+-----------------+--------+----------+-------------------------+
3 rows in set, 2 warnings (0.00 sec)
#直接优化成anti join了
[root@yejr.run]> show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                    |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1276 | Field or reference 'yejr.t1.x' of SELECT #2 was resolved in SELECT #1                                                                                                      |
| Note  | 1003 | /* select#1 */ select yejr.t1.id AS id,yejr.t1.seq AS seq,yejr.t1.name AS name,yejr.t1.x AS x from yejr.t1 anti join (yejr.t2) on((<subquery2>.nu = yejr.t1.x)) where true |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# explain analyze结果
[root@yejr.run]> explain analyze select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu);
| -> Nested loop anti-join  (actual time=0.058..248.704 rows=299994 loops=1)
    -> Table scan on t1  (cost=37975.50 rows=376310) (actual time=0.035..82.504 rows=300000 loops=1)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (nu=t1.x)  (actual time=0.000..0.000 rows=0 loops=300000)
        -> Materialize with deduplication  (actual time=0.000..0.000 rows=0 loops=300000)
            -> Filter: (t2.nu is not null)  (cost=2.15 rows=19) (actual time=0.006..0.011 rows=19 loops=1)
                -> Table scan on t2  (cost=2.15 rows=19) (actual time=0.005..0.009 rows=19 loops=1)
#该SQL耗时 0.15 秒
[root@yejr.run]> select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu);
299994 rows in set (0.15 sec)
[root@yejr.run]> show status like 'handl%read%';
+-----------------------+--------+
| Variable_name         | Value  |
+-----------------------+--------+
| Handler_read_first    | 2      |
| Handler_read_key      | 200574 |
...
| Handler_read_rnd_next | 300021 |

相对于5.7的性能有了很大提升。

4. 一个小小的脑洞

测试过程中我突发奇想，在MySQL 8.0.19版本下，如果把oiptimizer_switch里的semijoin关闭后，应该也相当于关闭anti join优化。

用EXPLAIN ANALYZE再查看这个SQL的执行计划及耗时，结果是这样的

| -> Filter: exists(select #2)  (cost=37975.50 rows=376310) (actual time=54.977..1976.963 rows=6 loops=1)
    -> Table scan on t1  (cost=37975.50 rows=376310) (actual time=0.054..70.366 rows=300000 loops=1)
    -> Select #2 (subquery in condition; dependent)
        -> Limit: 1 row(s)  (actual time=0.006..0.006 rows=0 loops=300000)
            -> Filter: (t1.x = t2.nu)  (cost=0.44 rows=2) (actual time=0.006..0.006 rows=0 loops=300000)
                -> Table scan on t2  (cost=0.44 rows=19) (actual time=0.002..0.004 rows=19 loops=300000)

看起来的确如此。在8.0.19中，也修复了松华老师之前在8.0.18版本遇到的小bug。

最后提醒大家不要轻易关闭 semijoin 开关，以防连 anti-join 优化也跟着消失，这就不划算了。

MySQL优化器的确在不断进步中，欣慰，也建议大家适时更新到高版本。

参考：

Antijoin in MySQL 8（https://mysqlserverteam.com/antijoin-in-mysql-8/ ）
MySQL 8 Anti-Join 几点总结

全文完。

MySQL 8.0中对EXISTS、NOT EXISTS的持续优化

1. 测试环境

2. MySQL 5.7下的执行计划及成本

3. MySQL 8.0.19下的执行计划及成本

4. 一个小小的脑洞

老叶茶馆

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

MySQL 8.0中对EXISTS、NOT EXISTS的持续优化

1. 测试环境

2. MySQL 5.7下的执行计划及成本

3. MySQL 8.0.19下的执行计划及成本

4. 一个小小的脑洞

老叶茶馆

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像