开发者学堂课程【PostgreSQL云上开发实践:阿里云PostgreSQL_开发实践_3】学习笔记,与课程紧密联系,让用户快速学习知识。
课程地址:https://developer.aliyun.com/learning/course/528/detail/7106
阿里云PostgreSQL_开发实践_3
目录:
一. 数据同步
二. 数据订阅
三. 跨库访问
四. 外部表
五. FDW外部表–数据融合
六. 定时任务
七.执行计划
九.JOIN优化
十.递归查询
十一. AOI优化
十二. 空间数据使用建议
一. 数据同步
1.DTS
2.DATAX
3.rds_dbsync
4.PG10 订阅功能
二. 数据订阅
集群级订阅
- https://github.com/digoal/blog/blob/master/201707/
表级订阅
- https://github.com/digoal/blog/blob/master/201702/20170227_01.md
多通道订阅
- https://github.com/digoal/blog/blob/master/201706/20170624_01.md
DDL 订阅
1、主库:创建存储 DDL 的目标表
create table ddl_record(
id serial8 primary key, -- 主键
content text, -- DDL内容
tbl_name name, -- 表名
sub boolean, -- 是否需要发布此表, true 表示发布, false 表示不发布
pub_name name[], -- 发布到哪些publication, 使用数组表示
crt_time timestamp -- 时间
);
2、主库:创建测试表
create table t_test (
id serial primary key,
info text,
crt_time timestamp
);
3、主库:创建封装 DDL 的 UDF
create or replace function exec_ddl(v_content text, v_tbl_name name default null, v_sub boolean default null, v_pub_name name[] default null) returns void as $$
declare
pub_name name;
begin
-- 检查DDL与tbl_name中对象名是否一致。初略检查。
if v_content !~* v_tbl_name then
raise exception 'you must specify correct table name with $1 and $2';
end if;
-- 执行DDL
execute v_content;
-- 插入订阅表
insert into ddl_record(content, tbl_name, sub, pub_name, crt_time) values (v_content, v_tbl_name, v_sub, v_pub_name, clock_timestamp());
-- 如果包含create table,并选择了发布,那么发布此表
if v_sub and lower(v_content) ~ 'create +table' then
foreach pub_name in array v_pub_name loop
execute format ('ALTER PUBLICATION %s add table %s', pub_name, v_tbl_name);
end loop;
end if;
end;
$$ language plpgsql strict;
4、主库:创建发布
CREATE PUBLICATION pub1;
ALTER PUBLICATION pub1 ADD TABLE ddl_record, t_test;
5、备库:创建初次订阅
初次订阅,需要订阅 ddl_RECORD和t_test
首先要创建 TABLE
create table ddl_record(
id serial8 primary key, -- 主键
content text, -- DDL内容
tbl_name name, -- 表名
sub boolean, -- 是否需要订阅此表
pub_name name[], -- 发布到哪些publication
crt_time timestamp -- 时间
);
create table t_test (
id serial primary key,
info text,
crt_time timestamp
);
订阅,指定发布端的连接地址,注意如果需要密码,也请输入。
create subscription sub1 connection 'hostaddr=127.0.0.1 port=1999 user=postgres dbname=postgres' publication pub1;
- https://github.com/digoal/blog/blob/master/201712/20171204_04.md
三. 跨库访问
dblink
- create extension dblink;
- PPAS支持PostgreSQL, Oracle 两种DBLINK
l https://github.com/digoal/blog/blob/master/201801/20180111.md
l
CREATE DATABASE LINK chicago CONNECT TO admin IDENTIFIED BY'mypassword'USING oci '//127.0.0.1/acctg';
l
CREATE DATABASE LINK boston CONNECT TO admin IDENTIFIED BY'mypassword' USING libpq 'host=127.0.0.1 dbname=sales';
四. 外部表
l 基于 dblink 的视图
-PPAS 支持 PostgreSQL, Oracle 两种 DBLINK
- https://www.postgresql.org/docs/10/static/dblink.html
l postgres_fdw 外部表
- https://www.postgresql.org/docs/10/static/postgres-fcw.html
l oracle_fdw 外部表
- https://pgxn.org/dist/oracle_fdw/
五. FDW 外部表–数据融合
l 用于分级存储、数据库互通
l OSS 外部表
-分级存储:
RDS PG OSS 外部表文档:
https://help.alivyun.com/knowledge_detail/43352.html-
HDB PG oss 外部表文档:
https://help.alivumn.com/document_detail/35457.html
其他外部表
- https://wiki.postgresql.org/wiki/Fdw-数据库互通。
- file
- oracle
-mysql
- sqlserver
- hadoop.....
五. 分区表
l PG内置分区表语法
-https://github.com/digoal/blog/blob/master/201612/20161215_01.md
-https://github.com/digoal/blog/blob/master/201802/20180205_02.md
l PG传统分区表
- https://github.com/digoal/blog/blob/master/201711/201711
l pg_pathman分区表
-https://github.com/digoal/blog/blob/master/201710/20171015_01.md
- https://github.com/digoal/blog/blob/master/201610/20161024 01.md
六. 定时任务
l Data Studio
l Crontab
- https://github.com/digoal/blog/blob/master/201305/20130531_02.md
l pgagent
- https://github.com/digoal/blog/blob/master/201305/20130531_01.md
七.执行计划
l
postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from a where id=1;
l
QUERY PLAN
l
Aggregate (cost=2.85..2.86 rows=1 width=8)(actual time=0.543..0.543 rows=1 loops=1)
l
Output: count(*)
l
Buffers: shared read=4
l
lndex Only Scan using a_pkey on public.a (cost=0.43.2.85 rows=1 width=0)(actual time=0.532..0.533 rows=1loops=1)
l
Output: id
l
lndex Cond: (a.id = 1)
Heap Fetches: 1
Buffers: shared read=4
l
Planning time: 0.914 ms
l
Execution time: 0.591 ms
l
(10 rows)
CBO成本因子
l
#seq_page_cost = 1.0
l
#
measured on an arbitrary scale
l
#random_page_cost = 4.0
l
# same scale as above
l
#cpu_tuple_cost = 0.01
l
#
same scale as above
l
#cpu_index_tuple_cost = 0.005
l
# same scale as above
l
#cpu_operator_cost = 0.0025
l
# same scale as above
l
#parallel_tuple_cost = 0.1
l
# same scale as above
l
#parallel_setup_cost = 1000.0
l
# same scale as above
l
#min_parallel_table_scan_size = 8M
l
#min_parallel_index_scan_size = 512kB
l
#effective_cache_size = 4GB
八. JOIN 优化
l 多表 JOIN 时,JOIN 顺序直接决定了最终成本
–类似经典的商旅问题(走完所有的点),实际更复杂
-TSP (traveling salesman problem)
优化器遗传算法设置
l 解决多表 JOIN 优化器穷举带来的性能问题
l
GEQo (geqo_threshod)
- traveling salesman problem (TSP)
-D. Whitley's Genitor algorithm
- https://www.postgresql.org/docs/10/static/geqo.html
l
All Pairs shortest Path,Johnson's Algorithm.
l
All Pairs Shortest Path,Floyd-Warshall Algorithm
l
Shortest Path A*
l
Bi-directional Dijkstra Shortest Path
l
Bi-directional A* Shortest Path
l
Shortest Path Dijkstra
l
Driving Distance
l
K-Shortest Path,Multiple Alternative Paths
l
K-Dijkstra, One to Many Shortest Path
l
Traveling Sales Person
l
Turn Restriction Shortest Path (TRSP)
九. 强制固定JOIN顺序
l 1 disables collapsing of explicit JOIN clauses
l 控制子查询提升
-# from_collapse_limit = 8
l 控制显示INNER JOIN顺序
- # join_collapse_limit = 8
递归查询案例-图式搜索
十、递归查询
l UNION 去重复(去重复时 NULL 视为等同)
l 图中所有输出都涉及 UNION[ALL] 的操作,包含以往返回的记录和当前返回的记录
空间查询优化
l GIST 索引面收敛查询优化
- https://github.com/digoal/blog/blob/master/201711/20171122_03.md
十一. AOI 优化
l GiST 空间索引结构为 bound box,对于不规则多边形,会引入一些边界放大问题
-CPU放大
-IO放大
l 优化方法
-空间SPLIT
l https://github.com/digoal/blog/blob/master/201710/20171oo4 oi.md
空间查询优化
l GIST索引面收敛查询优化
- https://github.com/digoal/blog/blob/master/201711/20171122_03.md
例子:
1、创建测试表
create table tbl (id int, pos geometry);
2、写入1000万个空间点
insert into tbl select id,
st_setsrid(
st_makepoint(
round((random()*(135.085831-73.406586)+73.406586)::numeric,6),
round((random()*(53.880950-3.408477)+3.408477)::numeric,6)
),
4326
) from generate_series(1,10000000) t(id);
3、创建空间索引
create index idx_tbl_pos on tbl using gist(pos);
4、使用多个polygon构造成一个multi polygon
select st_union(
array[
st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(75.15 29.53,77 29,77.6 29.5, 75.15 29.53)')), 4326),
5、使用 multi polygon 搜索
explain (analyze,verbose,timing,costs,buffers) select * from tbl where st_contains(
st_union(
array[
Buffers: shared hit=822
Planning time: 0.268 ms
Execution time: 5062.947 ms
(12 rows)
6、使用多个 polgon 搜索,使用 union all 合并结果
explain (analyze,verbose,timing,costs,buffers)
select * from tbl where st_contains(st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(75.15 29.53,77 29,77.6 29.5, 75.15 29.53)')), 4326), pos)
7、写 UDF,简化写多个 UNION ALL
create or replace function q_mp(VARIADIC arr geometry[]) returns setof record as $$
declare
sql text := '';
var geometry;
begin
-菜鸟、高德、HELLOBIKE、新零售、空间透视分析…
十二、 空间数据使用建议
l 球面距离计算
-STDistanceSpheroid(geom,ST_SetSRID(ST_Point(102,24),4326
),'SPHEROID["WGS84",6378137,298.257223563]')
- vspheroid := 'SPHEROID["WGS84",6378137,298.257223563]';
--wGS84 球体参数定义
- vcurrentpoint := ST_SetSRID(ST_Point(ix,iy),4326);
·设置 SRID
l KNN 查询优化
https://github.com/digoal/blog/blob/master/201308/20130806
l 时间、空间、对象多维搜索
-位图扫描
gist 索引
btree_gist 插件
l 分区