tablegroup特性是数据库分库分表水平分库分表中的必不可少的技术手段,合理使用tablegroup可以将更多的sql计算下推到存储节点,节省网络带宽,优化sql执行性能。
1 测试表及数据
测试表的定义
MySQL [test]> show create table emp;
+-------+-------------------------------------------------------+
| TABLE | CREATE TABLE
| emp | CREATE TABLE `emp` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`dept_no` int(11) DEFAULT NULL,
`salary` bigint(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = InnoDB AUTO_INCREMENT = 100005 DEFAULT CHARSET = utf8
PARTITION BY KEY(`id`)
PARTITIONS 2 |
+-------+--------------------------------------------------------+
1 row in set (0.04 sec)
MySQL [test]> show create table emp_bonus;
+-----------+------------------------------------+
| TABLE | CREATE TABLE |
+-----------+-- -------------------------------------+
| emp_bonus | CREATE TABLE `emp_bonus` (
`emp_id` bigint(20) NOT NULL,
`bonus` decimal(10, 0) DEFAULT NULL,
PRIMARY KEY (`emp_id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8
PARTITION BY KEY(`emp_id`)
PARTITIONS 2 |
+----------------------------------------------------+
1 row in set (0.03 sec)
表中已有数据,运行一个查询,查找一条数据
MySQL [test]> select a.name, a.salary, b.bonus from emp a, emp_bonus b where a.id=b.emp_id and a.id=20;
+-----------------+--------+-------+
| name | salary | bonus |
+-----------------+--------+-------+
| Padraic Kuhne | 13 | 1503 |
+-----------------+--------+-------+
1 row in set (0.09 sec)
2 查看执行计划(点差)
MySQL [test]> explain select a.name, a.salary, b.bonus from emp a, emp_bonus b where a.id=b.emp_id and a.id=20;
+----------------------------------------------------------+
| LOGICAL EXECUTIONPLAN
+----------------------------------------------------------+
| PhyTableOperation(tables="emp[p1],emp_bonus[p1]", sql="SELECT `a`.`name` AS `name`, `a`.`salary` AS `salary`, `b`.`bonus` AS `bonus` FROM ? AS `a`, ? AS `b` WHERE ((`a`.`id` = `b`.`emp_id`) AND (`a`.`id` = ?))") |
| HitCache:true |
| Source:PLAN_CACHE |
| TemplateId: f321691d |
+-----------------------------------------------------------+
4 rows in set (0.08 sec)
执行的是物理表操作,优化器找到了数据的正确位置,join下推到了正确的分区和节点。看一下数据库中已有的tablegroup定义
MySQL [test]> show tablegroup;
+--------------+----------------+------------------+----------+--------------+------------------+--------------+--------------+------------+-------------+-------------+
| TABLE_SCHEMA | TABLE_GROUP_ID | TABLE_GROUP_NAME | LOCALITY | PRIMARY_ZONE | IS_MANUAL_CREATE | CUR_PART_KEY | MAX_PART_KEY | PART_COUNT | TABLE_COUNT | INDEX_COUNT |
+--------------+----------------+------------------+----------+--------------+------------------+--------------+--------------+------------+-------------+-------------+
| test | 1 | tg1 | NULL | NULL | 0 | KEY(BIGINT) | KEY(BIGINT) | 2 | 2 | 0 |
| test | 2 | tg2 | NULL | NULL | 0 | KEY(TINYINT) | KEY(TINYINT) | 16 | 1 | 0 |
| test | 4 | broadcast_tg | NULL | NULL | 0 | NO_PART_KEY | NO_PART_KEY | 2 | 1 | 0 |
+--------------+----------------+------------------+----------+--------------+------------------+--------------+--------------+------------+-------------+-------------+
3 rows in set (0.14 sec)
数据库中已有3个TABLEGROUP,这三个tablegroup都是在创建表时数据库自动创建的,可见创建表时不指定tablegroup,系统会自动创建或指定一个,创建或指定的表组与所创建的分区方式完全一致。这里emp和emp_bonus的分区方式完全一直,系统为emp_bonus指定了与其分区方式一致的tg1分组。
3 不在同一表组中的表点差时的执行计划
创建一个tablegroup
MySQL [test]> create tablegroup test_tg;
Query OK, 1 row affected (0.17 sec)
table group已经创建
MySQL [test]> show tablegroup;
+--------------+----------------+------------------+----------+--------------+------------------+--------------+--------------+------------+-------------+-------------+
| TABLE_SCHEMA | TABLE_GROUP_ID | TABLE_GROUP_NAME | LOCALITY | PRIMARY_ZONE | IS_MANUAL_CREATE | CUR_PART_KEY | MAX_PART_KEY | PART_COUNT | TABLE_COUNT | INDEX_COUNT |
+--------------+----------------+------------------+----------+--------------+------------------+--------------+--------------+------------+-------------+-------------+
| test | 1 | tg1 | NULL | NULL | 0 | KEY(BIGINT) | KEY(BIGINT) | 2 | 2 | 0 |
| test | 2 | tg2 | NULL | NULL | 0 | KEY(TINYINT) | KEY(TINYINT) | 16 | 1 | 0 |
| test | 4 | broadcast_tg | NULL | NULL | 0 | NO_PART_KEY | NO_PART_KEY | 2 | 1 | 0 |
| test | 5 | test_tg | NULL | NULL | 1 | | | 0 | 0 | 0 |
+--------------+----------------+------------------+----------+--------------+------------------+--------------+--------------+------------+-------------+-------------+
4 rows in set (0.04 sec)
创建测试表加入到刚才的tablegroup
MySQL [test]> CREATE TABLE `emp_bonus_tg2` (
`emp_id` bigint(20) NOT NULL,
`bonus` decimal(10, 0) DEFAULT NULL,
PRIMARY KEY (`emp_id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8 tablegroup='test_tg'
PARTITION BY KEY(`emp_id`)
PARTITIONS 2;
MySQL [test]> insert into emp_bonus_tg2 select * from emp_bonus;
Query OK, 663 rows affected (0.26 sec)
CREATE TABLE `emp_bonus_no_part` (
`emp_id` bigint(20) NOT NULL,
`bonus` decimal(10, 0) DEFAULT NULL,
PRIMARY KEY (`emp_id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;
查看一下执行计划
MySQL [test]> explain select a.name, a.salary, b.bonus from emp a, emp_bonus_no_part b where a.id=b.emp_id and a.id=20;
+-------------------------------------------------------------+
| LOGICAL EXECUTIONPLAN|
+--------------------------------------------------------------+
| PhyTableOperation(tables="emp[p1],emp_bonus_no_part[p3]", sql="SELECT `a`.`name` AS `name`, `a`.`salary` AS `salary`, `b`.`bonus` AS `bonus` FROM ? AS `a`, ? AS `b` WHERE ((`a`.`id` = `b`.`emp_id`) AND (`a`.`id` = ?))") |
| HitCache:false |
| Source:PLAN_CACHE |
| TemplateId: d557410f |
+-------------------------------------------------------------+
4 rows in set (0.11 sec)
MySQL [test]> explain select a.name, a.salary, b.bonus from emp a, emp_bonus_tg2 b where a.id=b.emp_id and a.id=20;
+-------------------------------------------------------------+
| LOGICAL EXECUTIONPLAN
+-------------------------------------------------------------+
| PhyTableOperation(tables="emp[p1],emp_bonus_tg2[p1]", sql="SELECT `a`.`name` AS `name`, `a`.`salary` AS `salary`, `b`.`bonus` AS `bonus` FROM ? AS `a`, ? AS `b` WHERE ((`a`.`id` = `b`.`emp_id`) AND (`a`.`id` = ?))") |
| HitCache:true |
| Source:PLAN_CACHE |
| TemplateId: 1e8fbbfd |
+-------------------------------------------------------------+
4 rows in set (0.00 sec)
MySQL [test]> explain select a.name, a.salary, b.bonus from emp a, emp_bonus b where a.id=b.emp_id and a.id=20;
+-------------------------------------------------------------+
| LOGICAL EXECUTIONPLAN
+-------------------------------------------------------------+
| PhyTableOperation(tables="emp[p1],emp_bonus[p1]", sql="SELECT `a`.`name` AS `name`, `a`.`salary` AS `salary`, `b`.`bonus` AS `bonus` FROM ? AS `a`, ? AS `b` WHERE ((`a`.`id` = `b`.`emp_id`) AND (`a`.`id` = ?))") |
| HitCache:true |
| Source:PLAN_CACHE |
| TemplateId: f321691d |
+-------------------------------------------------------------+
4 rows in set (0.01 sec)
这三条语句优化器都正确的定位数据所在的分区,将join操作下推到正确的节点。
看一下这三个表的拓扑
MySQL [test]> show topology from emp_bonus_tg2;
+----+-------------------+--------------------------+----------------+
| ID | GROUP_NAME | TABLE_NAME | PARTITION_NAME |
+----+-------------------+--------------------------+----------------+
| 0 | TEST_P00001_GROUP | emp_bonus_tg2_00KT_00001 | p2 |
| 1 | TEST_P00000_GROUP | emp_bonus_tg2_00KT_00000 | p1 |
+----+-------------------+--------------------------+----------------+
2 rows in set (0.00 sec)
MySQL [test]> show topology from emp_bonus;
+----+-------------------+----------------------+----------------+
| ID | GROUP_NAME | TABLE_NAME | PARTITION_NAME |
+----+-------------------+----------------------+----------------+
| 0 | TEST_P00001_GROUP | emp_bonus_OLKx_00001 | p2 |
| 1 | TEST_P00000_GROUP | emp_bonus_OLKx_00000 | p1 |
+----+-------------------+----------------------+----------------+
2 rows in set (0.02 sec)
MySQL [test]> show topology from emp_bonus_no_part;
+----+-------------------+------------------------------+----------------+
| ID | GROUP_NAME | TABLE_NAME | PARTITION_NAME |
+----+-------------------+------------------------------+----------------+
| 0 | TEST_P00001_GROUP | emp_bonus_no_part_mqCN_00001 | p2 |
| 1 | TEST_P00001_GROUP | emp_bonus_no_part_mqCN_00003 | p4 |
| 2 | TEST_P00001_GROUP | emp_bonus_no_part_mqCN_00005 | p6 |
| 3 | TEST_P00001_GROUP | emp_bonus_no_part_mqCN_00007 | p8 |
| 4 | TEST_P00001_GROUP | emp_bonus_no_part_mqCN_00009 | p10 |
| 5 | TEST_P00001_GROUP | emp_bonus_no_part_mqCN_00011 | p12 |
| 6 | TEST_P00001_GROUP | emp_bonus_no_part_mqCN_00013 | p14 |
| 7 | TEST_P00001_GROUP | emp_bonus_no_part_mqCN_00015 | p16 |
| 8 | TEST_P00000_GROUP | emp_bonus_no_part_mqCN_00000 | p1 |
| 9 | TEST_P00000_GROUP | emp_bonus_no_part_mqCN_00002 | p3 |
| 10 | TEST_P00000_GROUP | emp_bonus_no_part_mqCN_00004 | p5 |
| 11 | TEST_P00000_GROUP | emp_bonus_no_part_mqCN_00006 | p7 |
| 12 | TEST_P00000_GROUP | emp_bonus_no_part_mqCN_00008 | p9 |
| 13 | TEST_P00000_GROUP | emp_bonus_no_part_mqCN_00010 | p11 |
| 14 | TEST_P00000_GROUP | emp_bonus_no_part_mqCN_00012 | p13 |
| 15 | TEST_P00000_GROUP | emp_bonus_no_part_mqCN_00014 | p15 |
+----+-------------------+------------------------------+----------------+
16 rows in set (0.00 sec)
这里可以看到,如果建表时不输出分区策略,系统将执行默认的分区策略,默认以主键作为分区键,分区个数为16个。如果不想让表分区,需要加single分区策略,如下面演示的这样。
MySQL [test]> CREATE TABLE `emp_bonus_single` (
-> `emp_id` bigint(20) NOT NULL,
-> `bonus` decimal(10, 0) DEFAULT NULL,
-> PRIMARY KEY (`emp_id`)
-> ) ENGINE = InnoDB DEFAULT CHARSET = utf8 single;
Query OK, 0 rows affected (1.73 sec)
MySQL [test]> insert into emp_bonus_single select * from emp_bonus;
Query OK, 663 rows affected (0.13 sec)
MySQL [test]> show topology from emp_bonus_single;
+----+------- +----------------+----------+
| ID | GROUP_NAME| TABLE_NAME |PARTITION_NAME |
+----+------
)
-------------+-----------------------+----------------+
| 0 | TEST_P00000_GROUP | emp_bonus_single_jTOU | p1 |
+----+-------------------+-----------------------+----------------+
1 row in set (0.01 sec
可以看到,这张表只创建了一个分区,看一下和这张表关联查询时的执行计划
MySQL [test]> explain select a.name, a.salary, b.bonus from emp a, emp_bonus_single b where a.id=b.emp_id and a.id in (20, 25);
+-----------------------------------------------------------+
| LOGICAL EXECUTIONPLAN |
+-----------------------------------------------------------+
| PhyTableOperation(tables="emp[p1],emp_bonus_single[p1]", sql="SELECT `a`.`name` AS `name`, `a`.`salary` AS `salary`, `b`.`bonus` AS `bonus` FROM ? AS `a`, ? AS `b` WHERE ((`a`.`id` = `b`.`emp_id`) AND (`a`.`id` = ?))") |
| HitCache:false |
| Source:PLAN_CACHE |
| TemplateId: e11a2572 |
+-----------------------------------------------------------+
4 rows in set (0.06 sec)
优化器仍然找到了数据所在的分区,将join下推到了数据所在的存储节点。
4 查询数据跨分区时的执行计划
找一条在分区2上的数据
MySQL [test]> select * from emp partition(p2) limit 10;
+------+----------------------+---------+--------+
| id | name | dept_no | salary |
+------+----------------------+---------+--------+
| 6 | Marlena Normanvell | 27 | 55 |
| 8 | Murielle Crosser | 3 | 97 |
| 10 | Crosby Ather | 27 | 7 |
| 14 | Morgun Yanin | 10 | 22 |
| 18 | Nanete Josland | 5 | 76 |
| 24 | Foss Brantzen | 16 | 100 |
| 26 | Gaspard Morritt | 13 | 98 |
| 30 | Ernst O'Cahey' | 19 | 35 |
| 32 | Amandy Micklem | 23 | 60 |
| 38 | Peder Ugoletti | 6 | 28 |
+------+----------------------+---------+--------+
10 rows in set (0.14 sec)
调整一下语句,查询两个分区上的数据,看一下执行计划
MySQL [test]> explain select a.name, a.salary, b.bonus from emp a, emp_bonus_single b where a.id=b.emp_id and a.id in (20, 6);
+-----------------------------------------------------------+
| LOGICAL EXECUTIONPLAN |
+-----------------------------------------------------------+
| Project(name="name", salary="salary", bonus="bonus") |
| BKAJoin(condition="emp_id = id", type="inner") |
| Gather(concurrent=true) |
| LogicalView(tables="emp[p1,p2]", shardCount=2, sql="SELECT `id`, `name`, `salary` FROM `emp` AS `emp` WHERE (`id` IN(?, ?))") |
| LogicalView(tables="emp_bonus_single[p1]", sql="SELECT `emp_id`, `bonus` FROM `emp_bonus_single` AS `emp_bonus_single` WHERE ((`emp_id` IN(?, ?)) AND (`emp_id` IN (...)))") |
| HitCache:true |
| Source:PLAN_CACHE |
| TemplateId: 78490106 |
+-----------------------------------------------------------+
8 rows in set (0.06 sec)
MySQL [test]> explain select a.name, a.salary, b.bonus from emp a, emp_bonus b where a.id=b.emp_id and a.id in (20, 6);
+-----------------------------------------------------------+
| LOGICAL EXECUTIONPLAN |
+-----------------------------------------------------------+
| Gather(concurrent=true) |
| LogicalView(tables="emp[p1,p2],emp_bonus[p1,p2]", shardCount=2, sql="SELECT `emp`.`name`, `emp`.`salary`, `emp_bonus`.`bonus` FROM `emp` AS `emp` INNER JOIN `emp_bonus` AS `emp_bonus` ON ((`emp`.`id` = `emp_bonus`.`emp_id`) AND (`emp`.`id` IN(?, ?)))") |
| HitCache:false |
| Source:PLAN_CACHE |
| TemplateId: 405b9871 |
+----------------------------------------------------------+
5 rows in set (0.07 sec)
MySQL [test]> explain select a.name, a.salary, b.bonus from emp a, emp_bonus_no_part b where a.id=b.emp_id and a.id in (20, 6);
+-----------------------------------------------------------+
| LOGICAL EXECUTIONPLAN |
+-----------------------------------------------------------+
| Project(name="name", salary="salary", bonus="bonus") |
| BKAJoin(condition="emp_id = id", type="inner") |
| Gather(concurrent=true) |
| LogicalView(tables="emp[p1,p2]", shardCount=2, sql="SELECT `id`, `name`, `salary` FROM `emp` AS `emp` WHERE (`id` IN(?, ?))") |
| Gather(concurrent=true) |
| LogicalView(tables="emp_bonus_no_part[p3,p9]", shardCount=2, sql="SELECT `emp_id`, `bonus` FROM `emp_bonus_no_part` AS `emp_bonus_no_part` WHERE ((`emp_id` IN(?, ?)) AND (`emp_id` IN (...)))") |
| HitCache:false |
| Source:PLAN_CACHE |
| TemplateId: 443832e3 |
+-----------------------------------------------------------+
9 rows in set (0.06 sec)
MySQL [test]> explain select a.name, a.salary, b.bonus from emp a, emp_bonus_tg2 b where a.id=b.emp_id and a.id in (20, 6);
+-----------------------------------------------------------+
| LOGICAL EXECUTIONPLAN |
+-----------------------------------------------------------+
| Project(name="name", salary="salary", bonus="bonus") |
| BKAJoin(condition="emp_id = id", type="inner") |
| Gather(concurrent=true) |
| LogicalView(tables="emp[p1,p2]", shardCount=2, sql="SELECT `id`, `name`, `salary` FROM `emp` AS `emp` WHERE (`id` IN(?, ?))") |
| Gather(concurrent=true) |
| LogicalView(tables="emp_bonus_tg2[p1,p2]", shardCount=2, sql="SELECT `emp_id`, `bonus` FROM `emp_bonus_tg2` AS `emp_bonus_tg2` WHERE ((`emp_id` IN(?, ?)) AND (`emp_id` IN (...)))") |
| HitCache:false |
| Source:PLAN_CACHE |
| TemplateId: a9b84351
+-----------------------------------------------------------+
9 rows in set (0.06 sec)
可以看到,只有同一个表组的表join时join操作可以下推到存储节点执行,其它情况下都需要在计算节点做join操作。
5 分析及结论
在两表join时,如果是根据分区键查询,在数据可以定位到一个节点的情况下,优化器能够找到数据所在的分区,将数据下推到正确的存储节点执行。如果查询的数据涉及到多个分区,则需要join的表在同一个表组,才能将join操作下推到存储节点执行。
另外,PolarDB-X的tablegroup功能的涉及比较任性化,默认情况下就可以将分区方式一致的表设置到统一表组,减轻了开发和运维的负担。