PolarDB-X的主要用途是水平分表,在水平分表中,有些表只有少量数据,如果创建成普通的单表,在同大表join时,会导致join不能下推到存储节点,只能在计算节点执行,join、sort之类的操作如果在计算节点执行,不但会导致计算节点负载过重,也会导致大量数据在计算和存储节点之间传输,影响集群的性能。如果将此类小表创建成广播表,PolarDB-X就可以将join操作下推到存储节点执行,从而大幅度提高性能。简单测试一下就可以看出来。
1 集群中已有下面的测试表
MySQL [test]> show create table dept;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| TABLE | CREATE TABLE |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| dept | CREATE TABLE `dept` (
`dept_no` tinyint(4) NOT NULL,
`dept_name` varchar(20) DEFAULT NULL,
`loc` varchar(30) DEFAULT NULL,
PRIMARY KEY (`dept_no`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.04 sec)
MySQL [test]> show create table emp;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| TABLE | CREATE TABLE |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| emp | CREATE TABLE `emp` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`dept_no` int(11) DEFAULT NULL,
`salary` bigint(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = InnoDB AUTO_INCREMENT = 100005 DEFAULT CHARSET = utf8
PARTITION BY KEY(`id`)
PARTITIONS 2 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
2 测试表已有数据
MySQL [test]> select count(*) from emp;
+----------+
| count(*) |
+----------+
| 1066 |
+----------+
1 row in set (0.05 sec)
MySQL [test]> select * from emp limit 5;
+------+----------------------+---------+--------+
| id | name | dept_no | salary |
+------+----------------------+---------+--------+
| 6 | Marlena Normanvell | 27 | 55 |
| 2 | Bobina De-Ville | 5 | 35 |
| 8 | Murielle Crosser | 3 | 97 |
| 4 | Danyette Debold | 4 | 30 |
| 10 | Crosby Ather | 27 | 7 |
+------+----------------------+---------+--------+
5 rows in set (0.29 sec)
MySQL [test]> select count(*) from dept;
+----------+
| count(*) |
+----------+
| 15 |
+----------+
1 row in set (0.07 sec)
MySQL [test]> select * from dept limit 5;
+---------+-----------+-----------+
| dept_no | dept_name | loc |
+---------+-----------+-----------+
| 1 | 广告部 | 襄阳市 |
| 5 | 研发部 | 黑河市 |
| 8 | 研发部 | 本溪市 |
| 9 | 总务部 | 温州市 |
| 10 | 广告部 | 长春市 |
+---------+-----------+-----------+
5 rows in set (0.10 sec)
3 dept表为非广播表时的执行计划
explain select a.name,a.salary, b.dept_name from emp a inner join dept b on a.dept_no=b.dept_no where b.dept_no=9;
MySQL [test]> explain select a.name,a.salary, b.dept_name from emp a inner join dept b on a.dept_no=b.dept_no where b.dept_no=9;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL EXECUTIONPLAN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project(name="name", salary="salary", dept_name="dept_name") |
| BKAJoin(condition="dept_no = dept_no", type="inner") |
| LogicalView(tables="dept[p11]", sql="SELECT `dept_no`, `dept_name` FROM `dept` AS `dept` WHERE (`dept_no` = ?)") |
| Gather(concurrent=true) |
| LogicalView(tables="emp[p1,p2]", shardCount=2, sql="SELECT `name`, `dept_no`, `salary` FROM `emp` AS `emp` WHERE ((`dept_no` = ?) AND (`dept_no` IN (...)))") |
| HitCache:true |
| Source:PLAN_CACHE |
| TemplateId: 73dc4e54 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.03 sec)
这个join的表中,dept的数据只有30行,where条件为dept中列,集群在选择执行计划时,没有将join下推到存储节点,而是从存储节点拉取数据,在计算节点进行join。
4 dept为广播表时的执行计划
--将dept表创建为广播表,再看测试计划
MySQL [test]> show create table dept_b;
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| TABLE | CREATE TABLE |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| dept_b | CREATE TABLE `dept_b` (
`dept_no` tinyint(4) NOT NULL,
`dept_name` varchar(20) DEFAULT NULL,
`loc` varchar(30) DEFAULT NULL,
PRIMARY KEY (`dept_no`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8
BROADCAST |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)
MySQL [test]> insert into dept_b select * from dept;
Query OK, 15 rows affected (0.54 sec)
--执行计划中将join操作下推到了每个存储节点
MySQL [test]> explain select a.name,a.salary, b.dept_name from emp a inner join dept_b b on a.dept_no=b.dept_no where b.dept_no=9;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL EXECUTIONPLAN |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Gather(concurrent=true) |
| LogicalView(tables="emp[p1,p2],dept_b", shardCount=2, sql="SELECT `emp`.`name`, `emp`.`salary`, `dept_b`.`dept_name` FROM `emp` AS `emp` INNER JOIN `dept_b` AS `dept_b` ON (((`dept_b`.`dept_no` = ?) AND (`emp`.`dept_no` = `dept_b`.`dept_no`)) AND (`emp`.`dept_no` = ?))") |
| HitCache:false |
| Source:PLAN_CACHE |
| TemplateId: 3ecd1ad1 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.22 sec)
--以a表中的数据作为查询条件做点差
MySQL [test]> explain select a.name,a.salary, b.dept_name from emp a inner join dept_b b on a.dept_no=b.dept_no where a.id=2;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL EXECUTIONPLAN |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Gather(concurrent=true) |
| LogicalView(tables="emp[p1],dept_b", sql="SELECT `emp`.`name`, `emp`.`salary`, `dept_b`.`dept_name` FROM `emp` AS `emp` INNER JOIN `dept_b` AS `dept_b` ON ((`emp`.`dept_no` = `dept_b`.`dept_no`) AND (`emp`.`id` = ?))") |
| HitCache:true |
| Source:PLAN_CACHE |
| TemplateId: d78a561e |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.01 sec)
MySQL [test]> explain select a.name,a.salary, b.dept_name from emp a inner join dept b on a.dept_no=b.dept_no where a.id=2;
+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| LOGICAL EXECUTIONPLAN |
+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project(name="name", salary="salary", dept_name="dept_name") |
| BKAJoin(condition="dept_no = dept_no", type="inner") |
| LogicalView(tables="emp[p1]", sql="SELECT `name`, `dept_no`, `salary` FROM `emp` AS `emp` WHERE (`id` = ?)") |
| Gather(concurrent=true) |
| LogicalView(tables="dept[p1,p2,p3,...p16]", shardCount=16, sql="SELECT `dept_no`, `dept_name` FROM `dept` AS `dept` WHERE (`dept_no` IN (...))") |
| HitCache:false |
| Source:PLAN_CACHE |
| TemplateId: 1253bcbb |
+--------------------------------------------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.07 sec)
在使用广播表时,计算节点直接将join推到了响应的存储节点执行,而在使用单表时,dept作为外表,需要在所有节点上执行,最后在计算节点做join,消耗的资源要远大于使用广播表时。
4 结论
当大表与小表做join时,将小表设置为广播表,在以大表的分区列做查询时可以大幅度提高查询性能。当以小表的分区列也可以join下推到存储节点执行,从而节省计算节点的存储资源。