1 执行计划基本概念
使用PolarDB-X时,应用程序的查询SQL(称为逻辑SQL)发往PolarDB-X计算节点(CN),PolarDB-X会将其分成可下推的、和不可下推的两部分,可下推的部分也被称为物理SQL。不可下推的SQL在CN上执行,下推的SQL在DN上执行。
PolarDB-X在查询优化过程中尽可能将用户SQL下推到DN上执行,避免CN和DN间数据网络交互以外,同时充分利用多分片并发执行的能力,利用各个DN资源,加速查询。可以下发给DN的算子包括LogicalView,LogicalModifyView,PhyTableOperation, IndexScan,对于语句其它无法下推的部分,优化器会选择最优的方式来执行,比如选择合适的算子执行、选择合适的并行度策略以及是否使用 MPP 执行。
对于PolarDB-X的执行计划,下面的基础概念需要了解
a) 逻辑SQL: 用户侧发起的查询SQL;
b) 物理SQL:SQL经过查询优化后,一般会拆分为可下推和不可下推的SQL,其中可下推的SQL是发往DN执行的,叫物理SQL。如果逻辑SQL被全部下推到DN执行,那么物理SQL等价于逻辑SQL。
PolarDB-X的explain命令提供了比较多的选项,可以从各个不同的角度查看和分析执行计划。
2 实例sql语句
MySQL [test]> select a.dept_no,b.dept_name, sum(a.salary) from emp a, dept b
where a.dept_no=b.dept_no and b.dept_no=30
group by a.dept_no, b.dept_name;
+---------+--------------+---------------+
| dept_no | dept_name | sum(a.salary) |
+---------+--------------+---------------+
| 30 | 进出口部 | 1375 |
+---------+--------------+---------------+
1 row in set (0.09 sec)
本文中使用上述sql语句示范从不同角度使用explain的不同选项查看执行计划。
3 查看逻辑执行计划
MySQL [test]> explain select a.dept_no,b.dept_name, sum(a.salary) from emp a, dept b
-> where a.dept_no=b.dept_no and b.dept_no=30
-> group by a.dept_no, b.dept_name;
+-----------------------------------------------------------------------------------+
| LOGICAL EXECUTIONPLAN |
+-----------------------------------------------------------------------------------+
| HashAgg(group="dept_no,dept_name", sum(a.salary)="SUM(salary)") |
| Project(dept_no="dept_no0", salary="salary", dept_no0="dept_no", dept_name="dept_name") |
| BKAJoin(condition="dept_no = dept_no", type="inner") |
| LogicalView(tables="dept[p11]", sql="SELECT `dept_no`, `dept_name` FROM `dept` AS `dept` WHERE (`dept_no` = ?)") |
| Gather(concurrent=true) |
| LogicalView(tables="emp[p1,p2]", shardCount=2, sql="SELECT `dept_no`, `salary` FROM `emp` AS `emp` WHERE ((`dept_no` = ?) AND (`dept_no` IN (...)))") |
| HitCache:true |
| Source:PLAN_CACHE |
| TemplateId: 466da528 |
+--------------------------------------------------------------------------------+
9 rows in set (0.01 sec)
使用不带任何选项的explain命令可以查看sql语句的逻辑执行计划,这条有两个LogicalView算子可以下推到dn节点执行,CN节点先做bkajoin,然后做哈希聚合。
如果想看优化器对每一步的估算成本,使用cost选项
MySQL [test]> explain cost select a.dept_no,b.dept_name, sum(a.salary) from emp a, dept b where a.dept_no=b.dept_no and b.dept_no=30 group by a.dept_no, b.dept_name;
+-----------------------------------------------------------------------------------+
| LOGICAL EXECUTIONPLAN |
+-----------------------------------------------------------------------------------+
| HashAgg(group="dept_no,dept_name", sum(a.salary)="SUM(salary)"): rowcount = 1.0, cumulative cost = value = 1.0015027E7, cpu = 27.0, memory = 199.0, io = 3.0, net = 2.0 |
| Project(dept_no="dept_no0", salary="salary", dept_no0="dept_no", dept_name="dept_name"): rowcount = 1.0, cumulative cost = value = 1.0015012E7, cpu = 12.0, memory = 69.0, io = 3.0, net = 2.0 |
| BKAJoin(condition="dept_no = dept_no", type="inner"): rowcount = 1.0, cumulative cost = value = 1.0015011E7, cpu = 11.0, memory = 69.0, io = 3.0, net = 2.0 |
| LogicalView(tables="dept[p11]", sql="SELECT `dept_no`, `dept_name` FROM `dept` AS `dept` WHERE (`dept_no` = ?)"): rowcount = 1.0, cumulative cost = value = 5005002.0, cpu = 2.0, memory = 0.0, io = 1.0, net = 1.0 |
| Gather(concurrent=true): rowcount = 1.0, cumulative cost = value = 5003.0, cpu = 3.0, memory = 0.0, io = 1.0, net = 0.0 |
| LogicalView(tables="emp[p1,p2]", shardCount=2, sql="SELECT `dept_no`, `salary` FROM `emp` AS `emp` WHERE ((`dept_no` = ?) AND (`dept_no` IN (...)))"): rowcount = 1.0, cumulative cost = value = 5002.0, cpu = 2.0, memory = 0.0, io = 1.0, net = 0.0 |
| HitCache:true |
| Source:PLAN_CACHE |
| WorkloadType: TP |
| TemplateId: 466da528 |
+-----------------------------------------------------------------------------------+
10 rows in set (0.00 sec)
上面可以看到每一个算子的估算函数,累加成本,消耗的cpu、内存、IO。使用analyze选项,在看到优化器估算的成本之外,也可以看到每个算子实际执行的资源消耗
MySQL [test]> explain analyze select a.dept_no,b.dept_name, sum(a.salary) from emp a, dept b where a.dept_no=b.dept_no and b.dept_no=30 group by a.dept_no, b.dept_name;
+-----------------------------------------------------------------------------------+
| LOGICAL EXECUTIONPLAN |
+-----------------------------------------------------------------------------------+
| HashAgg(group="dept_no,dept_name", sum(a.salary)="SUM(salary)"): rowcount = 1.0, cumulative cost = value = 1.0015027E7, cpu = 27.0, memory = 199.0, io = 3.0, net = 2.0, actual time = 0.000 + 0.000, actual rowcount = 1, actual memory = 0, instances = 1 |
| Project(dept_no="dept_no0", salary="salary", dept_no0="dept_no", dept_name="dept_name"): rowcount = 1.0, cumulative cost = value = 1.0015012E7, cpu = 12.0, memory = 69.0, io = 3.0, net = 2.0, actual time = 0.003 + 0.001, actual rowcount = 27, actual memory = 0, instances = 1 |
| BKAJoin(condition="dept_no = dept_no", type="inner"): rowcount = 1.0, cumulative cost = value = 1.0015011E7, cpu = 11.0, memory = 69.0, io = 3.0, net = 2.0, actual time = 0.003 + 0.001, actual rowcount = 27, actual memory = 261876, instances = 1 |
| LogicalView(tables="dept[p11]", sql="SELECT `dept_no`, `dept_name` FROM `dept` AS `dept` WHERE (`dept_no` = ?)"): rowcount = 1.0, cumulative cost = value = 5005002.0, cpu = 2.0, memory = 0.0, io = 1.0, net = 1.0, actual time = 0.000 + 0.001, actual rowcount = 1, actual memory = 0, instances = 0 |
| Gather(concurrent=true): rowcount = 1.0, cumulative cost = value = 5003.0, cpu = 3.0, memory = 0.0, io = 1.0, net = 0.0, actual time = 0.000 + 0.000, actual rowcount = 0, actual memory = 0, instances = 0 |
| LogicalView(tables="emp[p1,p2]", shardCount=2, sql="SELECT `dept_no`, `salary` FROM `emp` AS `emp` WHERE ((`dept_no` = ?) AND (`dept_no` IN (...)))"): rowcount = 1.0, cumulative cost = value = 5002.0, cpu = 2.0, memory = 0.0, io = 1.0, net = 0.0, actual time = 0.001 + 0.001, actual rowcount = 27, actual memory = 0, instances = 0 |
| HitCache:true |
| Source:PLAN_CACHE |
| TemplateId: 466da528 |
+----------------------------------------------------------------------------------+
9 rows in set (0.05 sec)
从上面不但可以看到优化器估算的成本,也可以看到每个算子实际执行的时间,行数及内存。
使用physical选项可以看到sql语句执行的片段及片段之间的依赖关系
MySQL [test]> explain physical select a.dept_no,b.dept_name, sum(a.salary) from emp a, dept b where a.dept_no=b.dept_no and b.dept_no=30 group by a.dept_no, b.dept_name;
+-----------------------------------------------------------------------------------+
| PLAN |
+-----------------------------------------------------------------------------------+
| ExecutorMode: TP_LOCAL |
| Fragment 0 dependency: [] parallelism: 1 |
| Project(dept_no="dept_no0", salary="salary", dept_no0="dept_no", dept_name="dept_name") |
| BKAJoin(condition="dept_no = dept_no", type="inner") |
| LogicalView(tables="dept[p11]", sql="SELECT `dept_no`, `dept_name` FROM `dept` AS `dept` WHERE (`dept_no` = ?)") |
| Gather(concurrent=true) |
| LogicalView(tables="emp[p1,p2]", shardCount=2, sql="SELECT `dept_no`, `salary` FROM `emp` AS `emp` WHERE ((`dept_no` = ?) AND (`dept_no` IN (...)))") |
| Fragment 1 dependency: [0] parallelism: 2 |
| HashAgg(group="dept_no,dept_name", sum(a.salary)="SUM(salary)") |
| RemoteSource(sourceFragmentIds=[0], type=RecordType(INTEGER dept_no, BIGINT salary, TINYINT(3) dept_no0, VARCHAR(20) dept_name)) |
+-----------------------------------------------------------------------------------+
10 rows in set (0.05 sec)
语句的执行分为两个片段,后面的哈希聚合片段依赖前面的投影、bkaJoin片段。
4 物理执行计划查看
查看物理执行计划使用execute选项,看到的是语句在dn节点执行部分的MySQL执行计划。
explain execute select a.dept_no,b.dept_name, sum(a.salary) from emp a, dept b where a.dept_no=b.dept_no and b.dept_no=30 group by a.dept_no, b.dept_name\G;
*************************** 1. row ***************************
id:
select_type: SIMPLE
table: dept
partitions: NULL
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 1
ref: const
rows:
filtered: 100
Extra: NULL
*************************** 2. row ***************************
id:
select_type: SIMPLE
table: emp
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: ▒
filtered: 9.999999046325684
Extra: Using where
2 rows in set (0.05 sec)
ERROR: No query specified
4 索引推荐
explain的advisor选项可以为sql语句推荐合适的索引及表的分区形式。
MySQL [test]> explain advisor select a.dept_no,b.dept_name, sum(a.salary) from emp a, dept b where a.dept_no=b.dept_no and b.dept_no=30 group by a.dept_no, b.dept_name\G;
*************************** 1. row ***************************
IMPROVE_VALUE: 99.6%
IMPROVE_CPU: 207.2%
IMPROVE_MEM: 51.3%
IMPROVE_IO: -33.3%
IMPROVE_NET: 100.0%
BEFORE_VALUE: 1.00100228E7
BEFORE_CPU: 22.7
BEFORE_MEM: 140.8
BEFORE_IO: 2
BEFORE_NET: 2
AFTER_VALUE: 5015007.4
AFTER_CPU: 7.3
AFTER_MEM: 93
AFTER_IO: 3
AFTER_NET: 1
ADVISE_INDEX: ALTER TABLE test.emp BROADCAST; ALTER TABLE test.dept BROADCAST;
NEW_PLAN:
Gather(concurrent=true)
LogicalView(tables="emp,dept", shardCount=0, sql="SELECT `emp`.`dept_no`, `dept`.`dept_name`, SUM(`emp`.`salary`) AS `sum(a.salary)` FROM `emp` AS `emp` INNER JOIN `dept` AS `dept` ON (((`dept`.`dept_no` = ?) AND (`emp`.`dept_no` = `dept`.`dept_no`)) AND (`emp`.`dept_no` = ?)) GROUP BY `emp`.`dept_no`, `dept`.`dept_name`")
INFO: BROADCAST
1 row in set (0.18 sec)
ERROR: No query specified
这里推荐是将这两个表都改为广播表。