PostgreSQL 14中提升Nested Loop Joins性能的enable_memoize

2023-07-19 236

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云原生数据库 PolarDB PostgreSQL 版，标准版 2核4GB 50GB

云原生数据库 PolarDB MySQL 版，通用型 2核4GB 50GB

简介： PostgreSQL 14中提升Nested Loop Joins性能的enable_memoize

PostgreSQL 14中提升Nested Loop Joins性能的enable_memoize

最近在PG14中发现新增一个配置参数enable_memoize，通过此参数可以提升嵌套循环连接的性能，有人测试性能竟然能提升1000倍！

将查询提升1000倍暗示整个语句非常烂，而memoize可能有很大帮助，那么对于普通join是否也有用呢？

什么是memoization？

memoization表示：给定y=f(x)关系，可以在任何计算中使用y替代f(x)。例如，不论计算多少此upper(x)，总会得到X。如果整个函数计算代价非常高，并且输入值仅几种，那么为什么不维护一个映射输入值的哈希表，并通过它来查找已知值而不是计算他们呢？

正如我之前的博文https://blog.jooq.org/oracle-scalar-subquery-caching/，Oracle 11已经引入了一个标量子查询缓存的特性，可以在jOOQ中激活，避免代价高昂的PL/SQL上下文切换。

enable_memoize在PostgreSQL嵌套循环join中非常有用，参考上述推文，lateral join通常由嵌套循环连接来执行。

开启/关闭该参数

创建一个schema：

CREATE TABLE t AS
SELECT i, i % 5 AS j
FROM generate_series(1, 100000) AS t(i);
CREATE TABLE u AS
SELECT i, i % 20000 as j
FROM generate_series(1, 100000) AS t(i);
CREATE INDEX uj ON u(j);

总之，表t和u有100000条记录；t.j仅有5个不同的记录，每个重复20000次；u.j有20000个不同记录，每个重复5次。在PG14中查看enable_memoize：

SELECTcurrent_setting('enable_memoize');
|current_setting|
|---------------|
|on

整个特性已开启，看下explain：

EXPLAIN
SELECT *
FROM t JOIN u ON t.j = u.j;
|QUERY PLAN                                                            |
|----------------------------------------------------------------------|
|Nested Loop  (cost=0.30..8945.41 rows=496032 width=16)                |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)         |
|  ->  Memoize  (cost=0.30..0.41 rows=5 width=8)                       |
|        Cache Key: t.j                                                |
|        ->  Index Scan using uj on u  (cost=0.29..0.40 rows=5 width=8)|
|              Index Cond: (j = t.j)                                   |

如果没有memoization，进行join时，对于t中的100000行记录，需要执行100000次匹配u表中的这5个值，但使用memoization后，查询仅需要执行5次，因为t.j仅有5个不同的值。

关闭后，PG似乎在我的机器上选择hash join或merge join（多次执行，执行计划可能会变）

|QUERY PLAN                                                         |
|-------------------------------------------------------------------|
|Hash Join  (cost=3084.00..11568.51 rows=499351 width=16)           |
|  Hash Cond: (t.j = u.j)                                           |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)      |
|  ->  Hash  (cost=1443.00..1443.00 rows=100000 width=8)            |
|        ->  Seq Scan on u  (cost=0.00..1443.00 rows=100000 width=8)|

|QUERY PLAN                                                              |
|------------------------------------------------------------------------|
|Merge Join  (cost=9748.11..763846.11 rows=50000000 width=16)            |
|  Merge Cond: (u.j = t.j)                                               |
|  ->  Index Scan using uj on u  (cost=0.29..3848.29 rows=100000 width=8)|
|  ->  Sort  (cost=9747.82..9997.82 rows=100000 width=8)                 |
|        Sort Key: t.j                                                   |
|        ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)     |

使用benchmark

1）在模式A和模式B中重复操作25次并进行比较

2）重复上述5次以减轻预热和其他缓存效果

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 25;
  rec RECORD;
BEGIN
  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();
    SET enable_memoize = OFF;
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT t.*
        FROM t JOIN u ON t.j = u.j
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
    v_ts := clock_timestamp();
    SET enable_memoize = ON;
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT t.*
        FROM t JOIN u ON t.j = u.j
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
    RAISE INFO '';
  END LOOP;
END$$;

在我机器上，结果一致：

Run 1, Statement 1: 00:00:03.763426
Run 1, Statement 2: 00:00:03.401346
Run 2, Statement 1: 00:00:03.769419
Run 2, Statement 2: 00:00:03.375677
Run 3, Statement 1: 00:00:03.771465
Run 3, Statement 2: 00:00:03.374413
Run 4, Statement 1: 00:00:03.769136
Run 4, Statement 2: 00:00:03.398734
Run 5, Statement 1: 00:00:03.772544
Run 5, Statement 2: 00:00:03.375272

即有10%的加速。在整个系统中，仅此一项就已经值得。

优化LATERAL

使用LATERAL替代，执行下面查询：

SELECT *
FROM
  t,
  LATERAL (
    SELECT count(*)
    FROM u
    WHERE t.j = u.j
  ) AS u(j)

Explain的结果：

|QUERY PLAN                                                                       |
|---------------------------------------------------------------------------------|
|Nested Loop  (cost=4.40..3969.47 rows=100000 width=16)                           |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)                    |
|  ->  Memoize  (cost=4.40..4.42 rows=1 width=8)                                  |
|        Cache Key: t.j                                                           |
|        ->  Aggregate  (cost=4.39..4.40 rows=1 width=8)                          |
|              ->  Index Only Scan using uj on u  (cost=0.29..4.38 rows=5 width=0)|
|                    Index Cond: (j = t.j)

因此，对于5个不同的t.j值作为入参，可以再次缓存COUNT(*)值，而不是每次都重复计算。

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 25;
  rec RECORD;
BEGIN
  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();
    SET enable_memoize = OFF;
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT *
        FROM
          t,
          LATERAL (
            SELECT count(*)
            FROM u
            WHERE t.j = u.j
          ) AS u(j)
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
    v_ts := clock_timestamp();
    SET enable_memoize = ON;
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT *
        FROM
          t,
          LATERAL (
            SELECT count(*)
            FROM u
            WHERE t.j = u.j
          ) AS u(j)
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
    RAISE INFO '';
  END LOOP;
END$$;

这次有了巨大提升：

Run 1, Statement 1: 00:00:03.419728
Run 1, Statement 2: 00:00:01.083941
Run 2, Statement 1: 00:00:03.404954
Run 2, Statement 2: 00:00:01.098404
Run 3, Statement 1: 00:00:03.425725
Run 3, Statement 2: 00:00:01.093883
Run 4, Statement 1: 00:00:03.441691
Run 4, Statement 2: 00:00:01.127837
Run 5, Statement 1: 00:00:03.420172
Run 5, Statement 2: 00:00:01.097943

这真是一个好消息，这也适用普通的相关子查询吗？因为上面的LATERAL相关子查询也可以改写成：

SELECT
  t.*,
  (
    SELECT count(*)
    FROM u
    WHERE t.j = u.j
  ) j
FROM t;

遗憾的是，该计划并没有显示使用memoization：

|QUERY PLAN                                                                   |
|-----------------------------------------------------------------------------|
|Seq Scan on t  (cost=0.00..441693.00 rows=100000 width=16)                   |
|  SubPlan 1                                                                  |
|    ->  Aggregate  (cost=4.39..4.40 rows=1 width=8)                          |
|          ->  Index Only Scan using uj on u  (cost=0.29..4.38 rows=5 width=0)|
|                Index Cond: (j = t.j)

Run 1, Statement 1: 00:00:03.617562
Run 1, Statement 2: 00:00:03.605765
Run 2, Statement 1: 00:00:03.610084
Run 2, Statement 2: 00:00:03.682064
Run 3, Statement 1: 00:00:03.725952
Run 3, Statement 2: 00:00:03.705622
Run 4, Statement 1: 00:00:03.672669
Run 4, Statement 2: 00:00:03.644612
Run 5, Statement 1: 00:00:03.645741
Run 5, Statement 2: 00:00:03.642717

似乎有了整个新功能，将来可以将相关子查询重写成嵌套循环外连接？其他优化器已经这么做了，我们在这里将拥有和Oracle标量子查询缓存相同的功能。

结论

该功能在PG14中开启，除了一些额外的内存消耗，看不出其他任何缺点。相关子查询是一个函数，他的入参是outer查询列的谓词和引用。相关子查询的结果可以缓存和memoized。如上所示，这对您的SQL查询将带来巨大影响。只需升级到PG14即可从中获利。

原文

https://blog.jooq.org/postgresql-14s-enable_memoize-for-improved-performance-of-nested-loop-joins/

PostgreSQL 14中提升Nested Loop Joins性能的enable_memoize

PostgreSQL 14中提升Nested Loop Joins性能的enable_memoize

什么是memoization？

开启/关闭该参数

使用benchmark

优化LATERAL

结论

原文

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

PostgreSQL 14中提升Nested Loop Joins性能的enable_memoize

PostgreSQL 14中提升Nested Loop Joins性能的enable_memoize

什么是memoization？

开启/关闭该参数

使用benchmark

优化LATERAL

结论

原文

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像