语法
<PRE prettyprinted? linenums>
- /*+engine=mpp*/ EXPLAIN [ ( option [, ...] ) ] statement
- 其中 option 可以是以下之一:
- FORMAT { TEXT | GRAPHVIZ }
- TYPE { LOGICAL | DISTRIBUTED | VALIDATE }
描述
通过EXPLAIN可以查看一个语句的逻辑或分布式执行计划,或者对一个语句进行逻辑验证。 通过 TYPE DISTRIBUTED 选项可以查看分布式的分片计划(fragmented plan)。每个分片计划由一个或者多个COMPUTENODE Worker节点执行。 多个分片在COMPUTENODE Worker节点之间通过Data Exchange交换数据。分片类型指定了分片如何在COMPUTENODE Worker节点之间交换数据,以及数据在多个分片中如何分布:
SINGLE
- 分片在单个COMPUTENODE Worker节点执行。
HASH
- 分片在固定个数的COMPUTENODE Worker节点执行,每个节点的输入数据通过Hash函数进行分桶输入。
ROUND_ROBIN
- 分片在固定个数的COMPUTENODE Worker节点执行,每个节点的输入数据通过轮询的方式进行输入。
BROADCAST
- 分片在固定个数的COMPUTENODE Worker节点执行,输入数据通过广播的方式广播到所有节点。
SOURCE
例子
逻辑执行计划:<PRE prettyprinted? linenums>
- tiny> /*+engine=mpp*/ EXPLAIN SELECT regionkey, count(*) FROM nation GROUP BY 1;
- Query Plan
- ----------------------------------------------------------------------------------------------------------
- - Output[regionkey, _col1] => [regionkey:bigint, count:bigint]
- _col1 := count
- - RemoteExchange[GATHER] => regionkey:bigint, count:bigint
- - Aggregate(FINAL)[regionkey] => [regionkey:bigint, count:bigint]
- count := "count"("count_8")
- - LocalExchange[HASH][$hashvalue] ("regionkey") => regionkey:bigint, count_8:bigint, $hashvalue:bigint
- - RemoteExchange[REPARTITION][$hashvalue_9] => regionkey:bigint, count_8:bigint, $hashvalue_9:bigint
- - Project[] => [regionkey:bigint, count_8:bigint, $hashvalue_10:bigint]
- $hashvalue_10 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("regionkey"), 0))
- - Aggregate(PARTIAL)[regionkey] => [regionkey:bigint, count_8:bigint]
- count_8 := "count"(*)
- - TableScan[tpch:tpch:nation:sf0.1, originalConstraint = true] => [regionkey:bigint]
- regionkey := tpch:regionkey
分布式执行计划:<PRE prettyprinted? linenums>
- tiny> /*+engine=mpp*/ EXPLAIN (TYPE DISTRIBUTED) SELECT regionkey, count(*) FROM nation GROUP BY 1;
- Query Plan
- ----------------------------------------------------------------------------------------------
- Fragment 0 [SINGLE]
- Output layout: [regionkey, count]
- Output partitioning: SINGLE []
- - Output[regionkey, _col1] => [regionkey:bigint, count:bigint]
- _col1 := count
- - RemoteSource[1] => [regionkey:bigint, count:bigint]
- Fragment 1 [HASH]
- Output layout: [regionkey, count]
- Output partitioning: SINGLE []
- - Aggregate(FINAL)[regionkey] => [regionkey:bigint, count:bigint]
- count := "count"("count_8")
- - LocalExchange[HASH][$hashvalue] ("regionkey") => regionkey:bigint, count_8:bigint, $hashvalue:bigint
- - RemoteSource[2] => [regionkey:bigint, count_8:bigint, $hashvalue_9:bigint]
- Fragment 2 [SOURCE]
- Output layout: [regionkey, count_8, $hashvalue_10]
- Output partitioning: HASH [regionkey][$hashvalue_10]
- - Project[] => [regionkey:bigint, count_8:bigint, $hashvalue_10:bigint]
- $hashvalue_10 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("regionkey"), 0))
- - Aggregate(PARTIAL)[regionkey] => [regionkey:bigint, count_8:bigint]
- count_8 := "count"(*)
- - TableScan[tpch:tpch:nation:sf0.1, originalConstraint = true] => [regionkey:bigint]
- regionkey := tpch:regionkey
验证查询语句:<PRE prettyprinted? linenums>
- tiny> /*+engine=mpp*/ EXPLAIN (TYPE VALIDATE) SELECT regionkey, count(*) FROM nation GROUP BY 1;
- Valid
- -------
- true
MPP引擎执行计划运行时分析
语法
/*+engine=mpp*/ EXPLAIN ANALYZE statement
描述
执行语句,并展示语句对应的分布式执行计划和每个操作的详细开销信息。
Note
相关开销的统计信息可能不是完全准确,特别是对于执行非常快的查询。
例子
在以下例子中,你可以查看每个stage的CPU开销,以及在stage中每个Plan Node的相对开销。注意,每个Plan Node的相对开销是基于Wall Time的,Wall Time与CPU Time可能相关,也可能不相关。 针对每个Plan Node,你还能看到一些额外的统计信息内容(比如:每个节点实例的平均输入,相关Plan Node的平均Hash冲突数)。 这些统计信息有助于用户发现查询的执行时的数据异常情况(数据倾斜、异常Hash冲突等)。<PRE prettyprinted? linenums>
- sf1> /*+engine=mpp*/ EXPLAIN ANALYZE SELECT count(*), clerk FROM orders WHERE orderdate > date '1995-01-01' GROUP BY clerk;
- Query Plan
- -----------------------------------------------------------------------------------------------
- Fragment 1 [HASH]
- Cost: CPU 88.57ms, Input: 4000 rows (148.44kB), Output: 1000 rows (28.32kB)
- Output layout: [count, clerk]
- Output partitioning: SINGLE []
- - Project[] => [count:bigint, clerk:varchar(15)]
- Cost: 26.24%, Input: 1000 rows (37.11kB), Output: 1000 rows (28.32kB), Filtered: 0.00%
- Input avg.: 62.50 lines, Input std.dev.: 14.77%
- - Aggregate(FINAL)[clerk][$hashvalue] => [clerk:varchar(15), $hashvalue:bigint, count:bigint]
- Cost: 16.83%, Output: 1000 rows (37.11kB)
- Input avg.: 250.00 lines, Input std.dev.: 14.77%
- count := "count"("count_8")
- - LocalExchange[HASH][$hashvalue] ("clerk") => clerk:varchar(15), count_8:bigint, $hashvalue:bigint
- Cost: 47.28%, Output: 4000 rows (148.44kB)
- Input avg.: 4000.00 lines, Input std.dev.: 0.00%
- - RemoteSource[2] => [clerk:varchar(15), count_8:bigint, $hashvalue_9:bigint]
- Cost: 9.65%, Output: 4000 rows (148.44kB)
- Input avg.: 4000.00 lines, Input std.dev.: 0.00%
- Fragment 2 [tpch:orders:1500000]
- Cost: CPU 14.00s, Input: 818058 rows (22.62MB), Output: 4000 rows (148.44kB)
- Output layout: [clerk, count_8, $hashvalue_10]
- Output partitioning: HASH [clerk][$hashvalue_10]
- - Aggregate(PARTIAL)[clerk][$hashvalue_10] => [clerk:varchar(15), $hashvalue_10:bigint, count_8:bigint]
- Cost: 4.47%, Output: 4000 rows (148.44kB)
- Input avg.: 204514.50 lines, Input std.dev.: 0.05%
- Collisions avg.: 5701.28 (17569.93% est.), Collisions std.dev.: 1.12%
- count_8 := "count"(*)
- - ScanFilterProject[table = tpch:tpch:orders:sf1.0, originalConstraint = ("orderdate" > "$literal$date"(BIGINT '9131')), filterPredicate = ("orderdate" > "$literal$date"(BIGINT '9131'))] => [cler
- Cost: 95.53%, Input: 1500000 rows (0B), Output: 818058 rows (22.62MB), Filtered: 45.46%
- Input avg.: 375000.00 lines, Input std.dev.: 0.00%
- $hashvalue_10 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("clerk"), 0))
- orderdate := tpch:orderdate
- clerk := tpch:clerk