Greenplum 大宽表 OR 分层关系 - 大量nestloop，补齐其他字段的性能损耗-阿里云开发者社区

Greenplum 大宽表 OR 分层关系 - 大量nestloop，补齐其他字段的性能损耗

2019-04-27 1454

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云数据库 Tair（兼容Redis），内存型 2GB

Redis 开源版，标准版 2GB

RDS MySQL Serverless 基础系列，0.5-2RCU 50GB

简介： 标签PostgreSQL , Greenplum , 宽表 , 关系 , 循环 , 性能背景GPDB中，使用关系存储，还是使用大宽表呢？关系存储，在查询其他表的内容时，需要JOIN补齐。JOIN可能需要重分布数据，维度表可以解决大量数据重分布的问题。

背景

GPDB中，使用关系存储，还是使用大宽表呢？

关系存储，在查询其他表的内容时，需要JOIN补齐。JOIN可能需要重分布数据，维度表可以解决大量数据重分布的问题。

大宽表，则可能存在数据冗余的问题，但是不需要JOIN。

业务可以在两者之间权衡。

下面的例子是补齐字段的例子，虽然JOIN补齐内容的表只有一条记录，但是使用了嵌套循环，性能相差了几十倍。

postgres=# create table a (id int, info text, c1 int);    
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.    
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.    
CREATE TABLE    
  
postgres=# create table b(c1 int, c2 int, c3 int, c4 int);    
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'c1' as the Greenplum Database data distribution key for this table.    
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.    
CREATE TABLE    
  
postgres=# insert into a select id, 'test', 1 from generate_series(1,10000000) t(id);    
INSERT 0 10000000    
postgres=# insert into b values (1,1,1,1);    
INSERT 0 1    
    
postgres=# explain analyze select count(*) from (select a.id,a.info,a.c1,b.c1,b.c2 from a cross join b where b.c2=1) t;    
                                                                           QUERY PLAN                                                                               
----------------------------------------------------------------------------------------------------------------------------------------------------------------    
 Aggregate  (cost=6738189.77..6738189.78 rows=1 width=8)    
   Rows out:  1 rows with 0.003 ms to first row, 0.007 ms to end, start offset by 26 ms.    
   ->  Gather Motion 33:1  (slice2; segments: 33)  (cost=6738189.41..6738189.76 rows=1 width=8)    
         Rows out:  33 rows at destination with 5.391 ms to first row, 1481 ms to end, start offset by 26 ms.    
         ->  Aggregate  (cost=6738189.41..6738189.42 rows=1 width=8)    
               Rows out:  Avg 1.0 rows x 33 workers.  Max 1 rows (seg0) with 0.005 ms to first row, 17 ms to end, start offset by 28 ms.    
               ->  Nested Loop  (cost=1.39..6713190.40 rows=303019 width=0)    
                     Rows out:  Avg 303030.3 rows x 33 workers.  Max 303531 rows (seg23) with 0.006 ms to first row, 30 ms to end, start offset by 28 ms.    
                     ->  Seq Scan on a  (cost=0.00..113451.03 rows=303019 width=0)    
                           Rows out:  Avg 303030.3 rows x 33 workers.  Max 303531 rows (seg23) with 0.004 ms to first row, 18 ms to end, start offset by 28 ms.    
                     ->  Materialize  (cost=1.39..1.72 rows=1 width=0)    
                           Rows out:  0 rows (seg0) with 1227 ms to end, start offset by 1507 ms.    
                           ->  Broadcast Motion 33:33  (slice1; segments: 33)  (cost=0.00..1.35 rows=1 width=0)    
                                 Rows out:  Avg 1.0 rows x 33 workers at destination.  Max 1 rows (seg0) with 24 ms to end of 2 scans, start offset by 53 ms.    
                                 ->  Seq Scan on b  (cost=0.00..1.01 rows=1 width=0)    
                                       Filter: c2 = 1    
                                       Rows out:  1 rows (seg27) with 0.125 ms to first row, 0.126 ms to end.    
 Slice statistics:    
   (slice0)    Executor memory: 437K bytes.    
   (slice1)    Executor memory: 525K bytes avg x 33 workers, 540K bytes max (seg27).    
   (slice2)    Executor memory: 385K bytes avg x 33 workers, 385K bytes max (seg0).    
 Statement statistics:    
   Memory used: 1280000K bytes    
 Optimizer status: legacy query optimizer    
 Total runtime: 1510.517 ms    
(25 rows)

如果业务上先从单表查出内容，直接在结果中补齐，而不是JOIN，那么只需要75毫秒。

postgres=# explain analyze select count(*) from (select a.id,a.info,a.c1,1,1 from a where a.c1=1) t;    
                                                                         QUERY PLAN                                                                              
-------------------------------------------------------------------------------------------------------------------------------------------------------------    
 Aggregate  (cost=163449.41..163449.42 rows=1 width=8)    
   Rows out:  1 rows with 0.001 ms to first row, 0.008 ms to end, start offset by 1.746 ms.    
   ->  Gather Motion 33:1  (slice1; segments: 33)  (cost=163449.05..163449.40 rows=1 width=8)    
         Rows out:  33 rows at destination with 2.365 ms to first row, 72 ms to end, start offset by 1.747 ms.    
         ->  Aggregate  (cost=163449.05..163449.06 rows=1 width=8)    
               Rows out:  Avg 1.0 rows x 33 workers.  Max 1 rows (seg0) with 0.002 ms to first row, 12 ms to end, start offset by 1.505 ms.    
               ->  Seq Scan on a  (cost=0.00..138450.04 rows=303019 width=0)    
                     Filter: c1 = 1    
                     Rows out:  Avg 303030.3 rows x 33 workers.  Max 303531 rows (seg23) with 5.279 ms to first row, 24 ms to end, start offset by 1.700 ms.    
 Slice statistics:    
   (slice0)    Executor memory: 373K bytes.    
   (slice1)    Executor memory: 302K bytes avg x 33 workers, 302K bytes max (seg0).    
 Statement statistics:    
   Memory used: 1280000K bytes    
 Optimizer status: legacy query optimizer    
 Total runtime: 75.788 ms    
(16 rows)

Greenplum 大宽表 OR 分层关系 - 大量nestloop，补齐其他字段的性能损耗

标签

背景

数据库

热门文章

最新文章

相关课程

相关电子书

相关实验场景