postgresql 分区与优化

本文涉及的产品
云原生数据库 PolarDB PostgreSQL 版,标准版 2核4GB 50GB
云原生数据库 PolarDB MySQL 版,通用型 2核4GB 50GB
简介:

--对于分区表constraint_exclusion 这个参数需要配置为partition或on
postgres=# show constraint_exclusion ;
 constraint_exclusion 
----------------------
 partition

 
 
 --创建父子表, 用于存储分区数据
create table t(id int primary key);
create table t1(like t including all) inherits(t);
create table t2(like t including all) inherits(t);
create table t3(like t including all) inherits(t);
create table t4(like t including all) inherits(t);
--PostgreSQL的子表和子表之间的约束是没有任何关系的, 所以也可以有重叠, 即非全局约束.
 alter table t1 add constraint ck_t1_1 check(id<0);
 alter table t2 add constraint ck_t2_1 check(id>=0 and id<100);
 alter table t3 add constraint ck_t3_1 check(id>=100 and id<200);
 alter table t4 add constraint ck_t4_1 check(id>=200);
 
 --分区字段传入常量, 执行时扫描的是父表和约束对应的子表 :
postgres=#  explain select * from t where id=10;
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.00..8.17 rows=2 width=4)
   ->  Seq Scan on t  (cost=0.00..0.00 rows=1 width=4)
         Filter: (id = 10)
   ->  Index Only Scan using t2_pkey on t2  (cost=0.15..8.17 rows=1 width=4)
         Index Cond: (id = 10)
(5 rows)

--分区字段传入常量, 执行时扫描的是父表和约束对应的子表;
postgres=#  prepare p_test as select * from t where id=$1;
PREPARE
postgres=# explain execute p_test(1);
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.00..8.17 rows=2 width=4)
   ->  Seq Scan on t  (cost=0.00..0.00 rows=1 width=4)
         Filter: (id = 1)
   ->  Index Only Scan using t2_pkey on t2  (cost=0.15..8.17 rows=1 width=4)
         Index Cond: (id = 1)
(5 rows)

--子句查询, 执行时扫描的是父表和所有子表, 注意这里使用的子查询是子表的查询, 理论上应该是扫描父表和该子表
postgres=#  explain select * from t where id=(select id from t1 limit 1);
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.01..32.70 rows=5 width=4)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..0.01 rows=1 width=4)
           ->  Seq Scan on t1 t1_1  (cost=0.00..34.00 rows=2400 width=4)
   ->  Seq Scan on t  (cost=0.00..0.00 rows=1 width=4)
         Filter: (id = $0)
   ->  Index Only Scan using t1_pkey on t1  (cost=0.15..8.17 rows=1 width=4)
         Index Cond: (id = $0)
   ->  Index Only Scan using t2_pkey on t2  (cost=0.15..8.17 rows=1 width=4)
         Index Cond: (id = $0)
   ->  Index Only Scan using t3_pkey on t3  (cost=0.15..8.17 rows=1 width=4)
         Index Cond: (id = $0)
   ->  Index Only Scan using t4_pkey on t4  (cost=0.15..8.17 rows=1 width=4)
         Index Cond: (id = $0)
(14 rows)

--综上可知在对分区表进行查询时最好使用字面常量,而不要使用子查询之类复杂的sql


--如果子表上约束删除,则pg不得不把删除约束的子表也加入到查询中(即使子表可以忽略)
alter table t4 drop constraint ck_t4_1;
postgres=#  explain select * from t where id=10;   
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.00..16.34 rows=3 width=4)
   ->  Seq Scan on t  (cost=0.00..0.00 rows=1 width=4)
         Filter: (id = 10)
   ->  Index Only Scan using t2_pkey on t2  (cost=0.15..8.17 rows=1 width=4)
         Index Cond: (id = 10)
   ->  Index Only Scan using t4_pkey on t4  (cost=0.15..8.17 rows=1 width=4)
         Index Cond: (id = 10)
(7 rows)



--如果constraint_exclusion设置为off,pg不得不进行全表扫描
postgres=# set constraint_exclusion=off;
SET
postgres=#  explain select * from t where id=10;
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.00..32.69 rows=5 width=4)
   ->  Seq Scan on t  (cost=0.00..0.00 rows=1 width=4)
         Filter: (id = 10)
   ->  Index Only Scan using t1_pkey on t1  (cost=0.15..8.17 rows=1 width=4)
         Index Cond: (id = 10)
   ->  Index Only Scan using t2_pkey on t2  (cost=0.15..8.17 rows=1 width=4)
         Index Cond: (id = 10)
   ->  Index Only Scan using t3_pkey on t3  (cost=0.15..8.17 rows=1 width=4)
         Index Cond: (id = 10)
   ->  Index Only Scan using t4_pkey on t4  (cost=0.15..8.17 rows=1 width=4)
         Index Cond: (id = 10)
(11 rows)



--分区表上一般针对分区建立相对应的分区索引
--建在父表的索引为全局索引,但如果你表没有数据要查询子表时,则分区表要进行全表扫描

--父表建立的全局索引
postgres=# \d+ p
                                       Table "public.p"
  Column   |              Type              | Modifiers | Storage | Stats target | Description 
-----------+--------------------------------+-----------+---------+--------------+-------------
 city_id   | integer                        | not null  | plain   |              | 
 logtime   | timestamp(0) without time zone | not null  | plain   |              | 
 peaktemp  | integer                        |           | plain   |              | 
 unitsales | integer                        |           | plain   |              | 
Indexes:
    "idx_city_id" btree (city_id)
    "idx_p_logtime" btree (logtime)
Triggers:
    delete_p_trigger BEFORE DELETE ON p FOR EACH ROW EXECUTE PROCEDURE p_delete_trigger()
    insert_p_trigger BEFORE INSERT ON p FOR EACH ROW EXECUTE PROCEDURE p_insert_trigger()
Child tables: p_201201,
              p_201202,
              p_201203,
              p_201204,
              p_201205,
              p_201206,
              p_201207,
              p_201208,
              p_201209,
              p_201210,
              p_201211,
              p_201212,
              p_default
Has OIDs: no


--分区没有索引,不能使用父表索引
postgres=# explain select * from p_201202 where city_id=2 and logtime=timestamp '2012-02-02 12:59:59';
                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Seq Scan on p_201202  (cost=0.00..214.01 rows=2 width=20)
   Filter: ((city_id = 2) AND (logtime = '2012-02-02 12:59:59'::timestamp without time zone))
(2 rows)


--建立分区索引,可以使用分区索引
postgres=# CREATE INDEX idx_p_201202_city_id ON p_201202 (city_id);
CREATE INDEX
postgres=# explain select * from p_201202 where city_id=2 and logtime=timestamp '2012-02-02 12:59:59';
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Index Scan using idx_p_201202_city_id on p_201202  (cost=0.29..8.33 rows=2 width=20)
   Index Cond: (city_id = 2)
   Filter: (logtime = '2012-02-02 12:59:59'::timestamp without time zone)



--也可以指定只查询父表的数据

postgres=# select * from only p;
 city_id | logtime | peaktemp | unitsales 
---------+---------+----------+-----------
(0 rows)

--如果一个分区表,父子表之间不再有继承关系,则查询父表时不再过滤到子表
postgres=# alter table t3 no inherit t; 
ALTER TABLE
postgres=# explain select count(*) from t;
                            QUERY PLAN                            
------------------------------------------------------------------
 Aggregate  (cost=73.50..73.51 rows=1 width=0)
   ->  Append  (cost=0.00..62.80 rows=4281 width=0)
         ->  Seq Scan on t  (cost=0.00..0.00 rows=1 width=0)
         ->  Seq Scan on t1  (cost=0.00..31.40 rows=2140 width=0)
         ->  Seq Scan on t2  (cost=0.00..31.40 rows=2140 width=0)
(5 rows)

--再次添加继承,查询父表可以过滤到子表
postgres=# alter table t3 inherit t;      
ALTER TABLE
postgres=# explain select count(*) from t;
                            QUERY PLAN                            
------------------------------------------------------------------
 Aggregate  (cost=110.25..110.26 rows=1 width=0)
   ->  Append  (cost=0.00..94.20 rows=6421 width=0)
         ->  Seq Scan on t  (cost=0.00..0.00 rows=1 width=0)
         ->  Seq Scan on t1  (cost=0.00..31.40 rows=2140 width=0)
         ->  Seq Scan on t2  (cost=0.00..31.40 rows=2140 width=0)
         ->  Seq Scan on t3  (cost=0.00..31.40 rows=2140 width=0)
(6 rows)


--以下为p表测试数据代码
CREATE TABLE p (
    city_id         int not null,
    logtime         timestamp(0) not null,
    peaktemp        int,
    unitsales       int
);


CREATE INDEX idx_p_logtime ON p (logtime);


CREATE TABLE p_201201 (LIKE p INCLUDING all) INHERITS (p);
CREATE TABLE p_201202 (LIKE p INCLUDING all) INHERITS (p);
CREATE TABLE p_201203 (LIKE p INCLUDING all) INHERITS (p);
CREATE TABLE p_201204 (LIKE p INCLUDING all) INHERITS (p);
CREATE TABLE p_201205 (LIKE p INCLUDING all) INHERITS (p);
CREATE TABLE p_201206 (LIKE p INCLUDING all) INHERITS (p);
CREATE TABLE p_201207 (LIKE p INCLUDING all) INHERITS (p);
CREATE TABLE p_201208 (LIKE p INCLUDING all) INHERITS (p);
CREATE TABLE p_201209 (LIKE p INCLUDING all) INHERITS (p);
CREATE TABLE p_201210 (LIKE p INCLUDING all) INHERITS (p);
CREATE TABLE p_201211 (LIKE p INCLUDING all) INHERITS (p);
CREATE TABLE p_201212 (LIKE p INCLUDING all) INHERITS (p);
CREATE TABLE p_default (LIKE p INCLUDING all) INHERITS (p);





CREATE OR REPLACE FUNCTION p_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF    ( NEW.logtime >= DATE '2012-01-01' AND NEW.logtime < DATE '2012-02-01' ) THEN
        INSERT INTO p_201201 VALUES (NEW.*);
    ELSIF ( NEW.logtime >= DATE '2012-02-01' AND NEW.logtime < DATE '2012-03-01' ) THEN
        INSERT INTO p_201202 VALUES (NEW.*);
    ELSIF ( NEW.logtime >= DATE '2012-03-01' AND NEW.logtime < DATE '2012-04-01' ) THEN
        INSERT INTO p_201203 VALUES (NEW.*);
    ELSIF ( NEW.logtime >= DATE '2012-04-01' AND NEW.logtime < DATE '2012-05-01' ) THEN
        INSERT INTO p_201204 VALUES (NEW.*);
    ELSIF ( NEW.logtime >= DATE '2012-05-01' AND NEW.logtime < DATE '2012-06-01' ) THEN
        INSERT INTO p_201205 VALUES (NEW.*);
    ELSIF ( NEW.logtime >= DATE '2012-06-01' AND NEW.logtime < DATE '2012-07-01' ) THEN
        INSERT INTO p_201206 VALUES (NEW.*);
    ELSIF ( NEW.logtime >= DATE '2012-07-01' AND NEW.logtime < DATE '2012-08-01' ) THEN
        INSERT INTO p_201207 VALUES (NEW.*);
    ELSIF ( NEW.logtime >= DATE '2012-08-01' AND NEW.logtime < DATE '2012-09-01' ) THEN
        INSERT INTO p_201208 VALUES (NEW.*);
    ELSIF ( NEW.logtime >= DATE '2012-09-01' AND NEW.logtime < DATE '2012-10-01' ) THEN
        INSERT INTO p_201209 VALUES (NEW.*);
    ELSIF ( NEW.logtime >= DATE '2012-10-01' AND NEW.logtime < DATE '2012-11-01' ) THEN
        INSERT INTO p_201210 VALUES (NEW.*);
    ELSIF ( NEW.logtime >= DATE '2012-11-01' AND NEW.logtime < DATE '2012-12-01' ) THEN
        INSERT INTO p_201211 VALUES (NEW.*);
    ELSIF ( NEW.logtime >= DATE '2012-12-01' AND NEW.logtime < DATE '2013-01-01' ) THEN
        INSERT INTO p_201212 VALUES (NEW.*);
    ELSIF ( NEW.logtime >= DATE '2013-01-01' OR NEW.logtime < DATE '2012-01-01' ) THEN
        INSERT INTO p_default VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range.  Fix the p_insert_trigger() function!';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;



CREATE OR REPLACE FUNCTION p_delete_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF    ( OLD.logtime >= DATE '2012-01-01' AND OLD.logtime < DATE '2012-02-01' ) THEN
        DELETE FROM p_201201 WHERE logtime=OLD.logtime;
    ELSIF ( OLD.logtime >= DATE '2012-02-01' AND OLD.logtime < DATE '2012-03-01' ) THEN
        DELETE FROM p_201202 WHERE logtime=OLD.logtime;
    ELSIF ( OLD.logtime >= DATE '2012-03-01' AND OLD.logtime < DATE '2012-04-01' ) THEN
        DELETE FROM p_201203 WHERE logtime=OLD.logtime;
    ELSIF ( OLD.logtime >= DATE '2012-04-01' AND OLD.logtime < DATE '2012-05-01' ) THEN
        DELETE FROM p_201204 WHERE logtime=OLD.logtime;
    ELSIF ( OLD.logtime >= DATE '2012-05-01' AND OLD.logtime < DATE '2012-06-01' ) THEN
        DELETE FROM p_201205 WHERE logtime=OLD.logtime;
    ELSIF ( OLD.logtime >= DATE '2012-06-01' AND OLD.logtime < DATE '2012-07-01' ) THEN
        DELETE FROM p_201206 WHERE logtime=OLD.logtime;
    ELSIF ( OLD.logtime >= DATE '2012-07-01' AND OLD.logtime < DATE '2012-08-01' ) THEN
        DELETE FROM p_201207 WHERE logtime=OLD.logtime;
    ELSIF ( OLD.logtime >= DATE '2012-08-01' AND OLD.logtime < DATE '2012-09-01' ) THEN
        DELETE FROM p_201208 WHERE logtime=OLD.logtime;
    ELSIF ( OLD.logtime >= DATE '2012-09-01' AND OLD.logtime < DATE '2012-10-01' ) THEN
        DELETE FROM p_201209 WHERE logtime=OLD.logtime;
    ELSIF ( OLD.logtime >= DATE '2012-10-01' AND OLD.logtime < DATE '2012-11-01' ) THEN
        DELETE FROM p_201210 WHERE logtime=OLD.logtime;
    ELSIF ( OLD.logtime >= DATE '2012-11-01' AND OLD.logtime < DATE '2012-12-01' ) THEN
        DELETE FROM p_201211 WHERE logtime=OLD.logtime;
    ELSIF ( OLD.logtime >= DATE '2012-12-01' AND OLD.logtime < DATE '2013-01-01' ) THEN
        DELETE FROM p_201212 WHERE logtime=OLD.logtime;
    ELSIF ( OLD.logtime >= DATE '2013-01-01' OR OLD.logtime < DATE '2012-01-01' ) THEN
        DELETE FROM p_default WHERE logtime=OLD.logtime;
    ELSE
        RAISE EXCEPTION 'Date out of range.  Fix the p_insert_trigger() function!';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;



CREATE TRIGGER insert_p_trigger
    BEFORE INSERT ON p
    FOR EACH ROW EXECUTE PROCEDURE p_insert_trigger();

CREATE TRIGGER delete_p_trigger
    BEFORE DELETE ON p
    FOR EACH ROW EXECUTE PROCEDURE p_delete_trigger();
	
	
	
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (1, timestamp '2012-01-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (2, timestamp '2012-02-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (3, timestamp '2012-03-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (4, timestamp '2012-04-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (5, timestamp '2012-05-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (6, timestamp '2012-06-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (7, timestamp '2012-07-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (8, timestamp '2012-08-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (9, timestamp '2012-09-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (10, timestamp '2012-10-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (11, timestamp '2012-11-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (12, timestamp '2012-12-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (13, timestamp '2013-01-02 12:59:59', 20, 10);
INSERT INTO p (city_id, logtime, peaktemp, unitsales) VALUES (14, timestamp '2011-12-02 12:59:59', 20, 10);


INSERT INTO p (city_id, logtime, peaktemp, unitsales) select m, timestamp '2012-02-02 12:59:59', 20, 10 from generate_series(1,10000) m;


explain select * from p_201202 where city_id=2 and logtime=timestamp '2012-02-02 12:59:59';




相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
存储 安全 关系型数据库
PostgreSQL物化视图增量更新扩展 -- pg_ivm
PostgreSQL不支持物化视图增量更新,需要定期执行REFRESH MATERIALIZED VIEW命令刷新物化视图。Incremental View Maintenance (IVM)是一种使物化视图保持最新的方法,其中只计算增量更改并将其应用于视图,而不是REFRESH MATERIALIZED VIEW那样从头开始重新计算内容。当只更改视图的一小部分时,IVM可以比重新计算更高效地更新物化视图。
|
4月前
|
监控 关系型数据库 数据库
PostgreSQL的索引优化策略?
【8月更文挑战第26天】PostgreSQL的索引优化策略?
100 1
|
4月前
|
缓存 关系型数据库 数据库
PostgreSQL性能
【8月更文挑战第26天】PostgreSQL性能
72 1
|
3月前
|
缓存 关系型数据库 数据库
PostgreSQL的性能
PostgreSQL的性能
155 2
|
4月前
|
缓存 关系型数据库 数据库
PostgreSQL 查询性能
【8月更文挑战第5天】PostgreSQL 查询性能
82 8
|
关系型数据库 数据管理 Go
《PostgreSQL数据分区:原理与实战》
《PostgreSQL数据分区:原理与实战》
202 0
|
存储 SQL 运维
PolarDB MySQL大表实践-分区表篇
背景:分区表到底是什么?分区作为传统企业级数据库的特性,早已经在很多大数据和数仓场景中得到广泛应用。基于维基百科的解释,分区是将逻辑数据库或其组成元素如表、表空间等划分为不同的独立部分。数据库分区通常是出于可管理性、性能或可用性的原因,或者是为了负载平衡。它在分布式数据库管理系统中很流行,其中每个分区可能分布在多个节点上,节点上的用户在分区上执行本地事务。这提高了具有涉及某些数据视图的常规事务的站
814 0
PolarDB MySQL大表实践-分区表篇
|
存储 SQL 监控
16PostgreSQL 本地分区表的用法和优化|学习笔记
快速学习16PostgreSQL 本地分区表的用法和优化
889 0
16PostgreSQL 本地分区表的用法和优化|学习笔记
|
存储 关系型数据库 数据库
如何在PostgreSQL中更新大表
如何在PostgreSQL中更新大表
|
存储 设计模式 关系型数据库
MySQL分区表最佳实践
分区是一种表的设计模式,通俗地讲表分区是将一大表,根据条件分割成若干个小表。但是对于应用程序来讲,分区的表和没有分区的表是一样的。换句话来讲,分区对于应用是透明的,只是数据库对于数据的重新整理。本篇文章给大家带来的内容是关于MySQL中分区表的介绍及使用场景,有需要的朋友可以参考一下,希望对你有所帮助。
632 0