GPDB-内核特性-GP7不再支持动态分区裁剪-阿里云开发者社区

GPDB-内核特性-GP7不再支持动态分区裁剪

2023-07-19 415

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

简介： GPDB-内核特性-GP7不再支持动态分区裁剪

GPDB-内核特性-GP7不再支持动态分区裁剪

GreenPlum支持分区表的功能，并通过分区裁剪来减少读取的数据量。分区裁剪分为静态分区裁剪和动态分区裁剪。静态分区裁剪：执行计划在生成时，就通过条件值过滤出需要的子分区，执行时仅扫描裁剪后的分区即可；动态分区裁剪：发生在SQL执行阶段，需要根据维度表的数据动态分析出需要哪些分区。

GP6中ORCA支持动态分区功能：

set optimizer=on;
explain select * from sales;
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=1 width=16)
   ->  Sequence  (cost=0.00..431.00 rows=1 width=16)
         ->  Partition Selector for sales (dynamic scan id: 1)  (cost=10.00..100.00 rows=100 width=4)
               Partitions selected: 366 (out of 366)
         ->  Dynamic Seq Scan on sales (dynamic scan id: 1)  (cost=0.00..431.00 rows=1 width=16)
 Optimizer: Pivotal Optimizer (GPORCA)
(6 rows)

引入Sequence、PartitionSelector和DynamicSeqScan等算子完成动态分区裁剪。PartitionSelector根据数据分析出需要哪些分区，DynamicSeqScan扫描对应分区，Sequence负责处理所有DynamicSeqScan算子。

然而，GP7中进行测试时，发现执行计划不一样了：

set optimizer=on;
explain select * from sales;
                                       QUERY PLAN                                        
-----------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..812886.00 rows=22179600 width=24)
   ->  Append  (cost=0.00..369294.00 rows=22179600 width=24)
         ->  Seq Scan on sales_1_prt_1  (cost=0.00..706.00 rows=60600 width=24)
         ->  Seq Scan on sales_1_prt_2  (cost=0.00..706.00 rows=60600 width=24)
         ->  Seq Scan on sales_1_prt_3  (cost=0.00..706.00 rows=60600 width=24)
          ................
          ................
         ->  Seq Scan on sales_1_prt_364  (cost=0.00..706.00 rows=60600 width=24)
         ->  Seq Scan on sales_1_prt_365  (cost=0.00..706.00 rows=60600 width=24)
         ->  Seq Scan on sales_1_prt_366  (cost=0.00..706.00 rows=60600 width=24)
 Optimizer: Postgres query optimizer
(369 rows)

发生了什么？经过分析查看GreenPlum的issues，发现GP7竟然不支持动态分区了：

https://github.com/greenplum-db/gpdb/issues/11363

Partition table support in Orca was disabled for the master branch and we currently fall back to planner on all queries on partition tables. We are actively working to add this functionality back (eg: #11336 re-introduces support for static partition elimination).

Master分支中，目前是GP7，Orca禁用了分区表，执行计划又回到要查询所有分区表。我们正在积极做这个事，#11336重新引入支持静态分区裁剪功能：

https://github.com/greenplum-db/gpdb/pull/11336

Does this mean there will be no Dynamic scan node added to the plan?

@JunfengYang Yes, our current plan is to not use DynamicXXXScan nodes anymore, and use the Append node (just as in Planner) with enumerated Scan node children.

当前版本计划，不再使用DynamicXXXScan节点了，而是使用带有枚举Scan子节点的Append节点完成。

那么，GP7中如何实现动态分区裁剪的效果呢？

Gather Motion
  ->Nest Loop
    ->Append
      ->SeqScan on tpart1
      ->SeqScan on tpart2
      ->...
      ->SeqScan on tpartn
    ->Material
      ->Partition Selector
        ->SeqScan on t1

比如一个nest loop join，通过顺序扫描表t1，将他的值都扫描出来，然后通过PartitionSelector算子判断这些值落在哪个分区表中，并将所有值通过Material算子物化；Append算子根据PartitionSelector算子计算的分区，顺序扫描这些的到的分区得到值，与Material算子物化值进行join。

GPDB-内核特性-GP7不再支持动态分区裁剪

热门文章

最新文章

相关电子书

探索云世界

热门

云计算

大数据

云原生

人工智能

数据库

开发与运维

活动广场

任务中心

训练营

直播

乘风者计划

下载

镜像站

技术资料

GPDB-内核特性-GP7不再支持动态分区裁剪

热门文章

最新文章

相关电子书