GPDB-内核特性-GP7不再支持动态分区裁剪
GreenPlum支持分区表的功能,并通过分区裁剪来减少读取的数据量。分区裁剪分为静态分区裁剪和动态分区裁剪。静态分区裁剪:执行计划在生成时,就通过条件值过滤出需要的子分区,执行时仅扫描裁剪后的分区即可;动态分区裁剪:发生在SQL执行阶段,需要根据维度表的数据动态分析出需要哪些分区。
GP6中ORCA支持动态分区功能:
set optimizer=on; explain select * from sales; QUERY PLAN ------------------------------------------------------------------------------------------------------ Gather Motion 1:1 (slice1; segments: 1) (cost=0.00..431.00 rows=1 width=16) -> Sequence (cost=0.00..431.00 rows=1 width=16) -> Partition Selector for sales (dynamic scan id: 1) (cost=10.00..100.00 rows=100 width=4) Partitions selected: 366 (out of 366) -> Dynamic Seq Scan on sales (dynamic scan id: 1) (cost=0.00..431.00 rows=1 width=16) Optimizer: Pivotal Optimizer (GPORCA) (6 rows)
引入Sequence、PartitionSelector和DynamicSeqScan等算子完成动态分区裁剪。PartitionSelector根据数据分析出需要哪些分区,DynamicSeqScan扫描对应分区,Sequence负责处理所有DynamicSeqScan算子。
然而,GP7中进行测试时,发现执行计划不一样了:
set optimizer=on; explain select * from sales; QUERY PLAN ----------------------------------------------------------------------------------------- Gather Motion 1:1 (slice1; segments: 1) (cost=0.00..812886.00 rows=22179600 width=24) -> Append (cost=0.00..369294.00 rows=22179600 width=24) -> Seq Scan on sales_1_prt_1 (cost=0.00..706.00 rows=60600 width=24) -> Seq Scan on sales_1_prt_2 (cost=0.00..706.00 rows=60600 width=24) -> Seq Scan on sales_1_prt_3 (cost=0.00..706.00 rows=60600 width=24) ................ ................ -> Seq Scan on sales_1_prt_364 (cost=0.00..706.00 rows=60600 width=24) -> Seq Scan on sales_1_prt_365 (cost=0.00..706.00 rows=60600 width=24) -> Seq Scan on sales_1_prt_366 (cost=0.00..706.00 rows=60600 width=24) Optimizer: Postgres query optimizer (369 rows)
发生了什么?经过分析查看GreenPlum的issues,发现GP7竟然不支持动态分区了:
https://github.com/greenplum-db/gpdb/issues/11363
Partition table support in Orca was disabled for the master branch and we currently fall back to planner on all queries on partition tables. We are actively working to add this functionality back (eg: #11336 re-introduces support for static partition elimination).
Master分支中,目前是GP7,Orca禁用了分区表,执行计划又回到要查询所有分区表。我们正在积极做这个事,#11336重新引入支持静态分区裁剪功能:
https://github.com/greenplum-db/gpdb/pull/11336
Does this mean there will be no Dynamic scan node added to the plan?
@JunfengYang Yes, our current plan is to not use DynamicXXXScan nodes anymore, and use the Append node (just as in Planner) with enumerated Scan node children.
当前版本计划,不再使用DynamicXXXScan节点了,而是使用带有枚举Scan子节点的Append节点完成。
那么,GP7中如何实现动态分区裁剪的效果呢?
Gather Motion ->Nest Loop ->Append ->SeqScan on tpart1 ->SeqScan on tpart2 ->... ->SeqScan on tpartn ->Material ->Partition Selector ->SeqScan on t1
比如一个nest loop join,通过顺序扫描表t1,将他的值都扫描出来,然后通过PartitionSelector算子判断这些值落在哪个分区表中,并将所有值通过Material算子物化;Append算子根据PartitionSelector算子计算的分区,顺序扫描这些的到的分区得到值,与Material算子物化值进行join。