阿里云PostgreSQL_案例介绍3

开发者学堂课程【PostgreSQL云上开发实践:阿里云PostgreSQL_案例介绍3】学习笔记，与课程紧密联系，让用户快速学习知识。

课程地址：https://developer.aliyun.com/learning/course/528/detail/7102

内容介绍：

一、Case9（订阅、单元化、容灾、多写）

二、Case10（跨域、跨库、sharding）

三、Case11（流式处理-阅后即焚）

四、Case12（秒杀）

五、Case13共享xxx（多表 JOIN 实时分析）

一、Case9（订阅、单元化、容灾、多写）

· 线上 RDS PG

· 逻辑订阅、或物理订阅到用户端 PG

案例-架构设计、代码、实操手册:

· 订阅功能(单元化)

https://github.com/digoal/blog/blob/master/201707/20170711_01.md

二、Case10（跨域、跨库、sharding）

· 传统企业数据库上云，突破单库容量限制

——多库组集群、相互可访问、可写、可同步。

——功能点: FDW (外部表、远程表)、DBLINK 、匿名、逻辑订阅。

案例-架构设计、代码、实操手册：

· FDW

-总览:https://wiki.postgresql.org/wiki/Fdw

-文件

fdw:https://www.postgresql.org/docs/10/static/file-fdw.html

-PGfdw:https://www.postgresql.org/docs/10/static/postgres-fdw.html -MySQLfdw:https://github.com/EnterpriseDB/mysgl_fdw

-OracleFDW:https://pgxn.org/dist/oracle_fdw/

-SQL Server FDW:https://pgxn.org/dist/tds_fdw/

· DBLINK

https://www.postgresql.org/docs/10/static/dblink.html

· 订阅功能(单元化)

https://github.com/digoal/blog/blob/master/201702/20170227_01.md https://github.com/digoal/blog/blob/master/201707/20170711_01.md

三、Case11（流式处理-阅后即焚）

流式处理-阅后即焚-消费	10亿，消费395.2万行/s	56	3952	14毫秒
物联网-阅后即焚-读写并测	写入：193万行/s,消费：418万行/s	56
物理网-阅后即焚-JSON+函数流计算-读写并测	写入：180万行/s,消费：145.8万行/s	56

· 流式处理，高并发写入，快速消费处理。

· 处理后的数据被删除。

· 要求:

——数据快速写入式、事务

——数据写入后必须持久化

——快速消费被写入的记录（例如订阅，或者用于业务上的流式计算，计算结果保留）

——消费和计算必须在一个事务完成

流计算和数据库有关系吗？——有

为什么需要流计算？

· 实时分析需求

-大查询的时效性

· 过滤有效数据

-实时数据清洗

· 预警需求

-实时数据预警，电子围栏、物联网异常指标、监控系统

PostgreSQL 流计算原理：

方法1、pipelineDB (批处理，低延迟，大吞吐，100+万行/s)

方法2、rule、trigger (实时处理，实时，小吞吐，单步写30+万行/s，批量写100+万行/s)

方法3、insert on conflict (实时处理，实时，小吞吐，单步写30+万行/s，批量写100+万行/s)

方法4、阅后即焚(批处理，低延迟，大吞吐，100+万行/s)

案例1-流式预警：

· 根据规则发现数据异常，并通知应用程序

· 传统手段

- 异步查询、实效性较差、重复劳动较多

· 流计算手段

- 实时或异步、异步消息通道

select xxx from xxx

where 指标字段>=阈值

and xxxx

order by xxx;

问题:

1. 索引巨大、

2. 存在大量无用功查询。

create index idx on table（字段）where 阈值>xxx;

只有异常数据才建索引。Partial index

Syntax:

NOTIFY channel [ , payload ]

Syntax:

LISTEN channel

· create table tbl(sid int,content jsonb);

· create or replace function monitor（jisonb)returns boolean as $$

· declare

· begin

——if xxx then return true; --条件1

——else if xxx then return true; --件2,

——else if .....; --条件N

——else return false;

——end if;

· end;

· $$ language plpgsql strict;

· create or replace rule r1 as on insert to tbl do {also | instead} select pg_notify(‘channel_name',(NEW.content):：text) where monitor(NEW.content);

——可落明细、可不落明细、可按需落明细，实时压缩(例如5分钟落一个明细点，其他只落告警点)。

· client:

——listen 'channel_name';

案例2-流式统计(avg,count,min,max,sum)

https://github.com/digoal/blog/blob/master/201711/20171123_02.md

分区+批量写入336万行/s:

insert into tbl(sid,v1,crttime)values(:sid，:v1，now())

on conflict (sid) do update set

v1=excluded.v1,

crt_time=excluded.crt_time,

cnt=tbl.cnt+1,

sum_v=case tb1.cnt when 1 then tbl.v1+excluded.v1 else tb1.sum_v+excluded.v1end ,

min_v=least(tbl.min_v,excluded.v1), max_v=greatest(tbl.max_v,excluded.v1);

多维流式计算

1、定义明细表

create table tb1(c1 int not null, c2 int not null,c3 int not null,c4 int not null,c5 int not null);

2、定义每个维度的目标统计表

Create table cv1_tb1(c1 int primary key,cnt int8 default 1);

Create table cv2_tb1(c2 int,c3 int,c5 int,sum_v float8 default 0,cnt int8 default 1, primary key (c2,c3));

其他维度

3、定义维度表的 insert on conflict SQL

· insert into cv1 tb1(c1) values(NEW.c1)on conflict(c1) do update set cnt=cv1 tb1.cnt+1;

· insert into cv2tb1(c2c3c5)values(NEW.c2,NEW.c3,NEW.c5)on conflict(c2,c3) do update set cnt=cv2_tb1.cnt+1

sum_v=case cv2 tb1.cnt when 1 then cv2_ tb1.c5+excluded.c5 else cv2_tb1.sum_v+excluded.c5 end;

4、定义明细表 trigger 或 rule，顺序调用 insertonconflict 写入多个维度表

· create rule r1 as on insert to tb1 do {instead | also} insert into cv1_tb1(c1) values(NEW.c1) on conflict(c1) do update set cnt=cv1_tb1.cnt+1;

· create rule r2 as on insert to tb1 do {instead | also} insert into cv2_tb1(c2,c3,c5) values(NEW.c2,NEW.c3,NEW.c5) on conflict(c2,c3)do update set cnt=cv2_tb1.cnt+1,sum_v=case cv2_tb1.cnt when 1 then cv2_tb1.c5+excluded.c5 else cv2_tb1.sum_v+excluded.c5 end;

案例3-流式概率计算

http://docs.pipelinedb.com/probabilistic.html

- bloom filter: distinct概率判定,select distinct (...) 快速判断

- count-min sketch: per element's freq 概率 select

col,count(*) group by col 快速统计+判定

- filtered-space saving top-k: element's bucket & freq 概率 - HLL: COUNT(DISTINCT ...)

- T-Digest:Rank base statistic.percentile_cont 水位值.

案例3-流式实时估算

https://github.com/digoal/blog/blob/master/201711/20171123_02.md

· HLL插件

· create extension hll;

· create table tbl(

- grpid int, userid int,dt date,

- cnt int8 default 1,

- hll_userid hll default hll_empty(),--估算字段

- primary key(grpid,dt)

- );

·insert into tbl (grpid,userid,dt) values()on conflict (grpid,dt)

· do update set

· cnt=tblcnt+1

hll_userid=case tbl.cnt when 1 then hll add(hll_add(tbl.hll_userid,hll_hash_integer tb1.userid))hll_hash_integer(excluded.userid)else

hll_add(tb1.hll _userid, hll_hash_integer (excluded.userid)) end;

滑窗分析- RDS PG与HDB PG都适用

· 估值滑窗(最近7天UV)

-SELECT date,#hll_union_agg(users)OVER seven_days

FROM daily_uniques WINDOW seven_days AS(ORDER BY date ASC ROWS 6 PRECEDING);

·统计滑窗(最近7天精确UV，SUM，AVG。。。)

-SELECT date,count(distinct users)OVER seven_days,

sum(x）OVER seven_days,avg（x）OVER seven_days FROM daily_uniques WINDOW seven_days AS(ORDER BY date ASC ROWS 6 PRECEDING);

案例4-流式阅后即焚:

· 流式处理，高并发写入，快速消费处理。

· 处理后的数据被删除。

· 要求:

——数据快速写入

——数据写入后必须持久化

——快速消费被写入的记录（例如订阅，或者用于业务上的流式计算，计算结果保留）

——消费和计算必须在一个事务完成

https://github.com/digoal/blog/blob/master/201711/20171107_32.md

· 流式处理应用

· create table tbl1(id int, info jsonb);

· insert into tbl ....

· 异步处理

· UDF 写法，UDF 内实现阅后即焚

· CTE 写法，单个 SQL 实现阅后即焚

—with t1 as (delete from tb1 where ctid = any (array(select ctid from tb1 limit10)) returning*)

—select pg_notify('channel_name', values) from t1;

—-- deal with t1's values;

案例5-实时经营分析1

经营分析系统、决策系统

https://github.com/digoal/blog/blob/master/201711/20171126_01.md

案例-架构设计、代码、实操手册

https://github.com/digoal/blog/blob/master/201711/20171111_01.md https://github.com/digoal/blog/blob/master/201711/20171107_28.md https://github.com/digoal/blog/blob/master/201711/20171107_32.md https://github.com/digoal/blog/blob/master/201711/20171107_33.md https://github.com/digoal/blog/blob/master/201608/20160827_01.md https://github.com/digoal/blog/blob/master/201711/20171123_02.md https://github.com/digoal/ blog/blob/master/201711 20171126_01.md

四、Case12（秒杀）

· 超轻锁( advisoryLOCK )解决高并发锁竞争问题

- 手段:在 CPU 运算发现行锁之前就知道是不是有冲突，大大缩短 CPU 计算资源，等待资源

ADLock 代替行锁-秒杀

· 高并发扣减库存

· 高并发争抢锁

—update tb1 set x=x where id=? and pg_try_advisory_xact _lock(id)returning*;

1.连接 redis 判断是否还有库存

2.有，去 PG 扣减( ADLock )。没有则直接返回。

3.扣减成功，去 redis 更新库存

案例-架构设计、代码、实操手册

https://github.com/digoal/blog/blob/master/201711/20171107_31.md

https://github.com/digoal/blog/blob/master/201611/20161117_01.md

https://github.com/digoal/blog/blob/master/201509/20150914_01.md

五、Case13共享xxx（多表JOIN实时分析）

· Itress 树类型、消除 JOIN

——（空间换世界、物化视图）

Itree 树类型+R-T索引

· Itree

-https://www.postgresgl.org/docs/10/static/ltree.html

-SELECT path FROM test WHERE path~’*.Astronomy.*1’；

-SELECT path FROM test WHERE path~’*.!pictures@.*.Astronomy.*’;

-SELECT path FROM test WHERE path @'Astro*%&!pictures@';

-SELECT path FROM test WHERE path @'Astro*&!pictures@';

案例-架构设计、代码、实操手册

https://github.com/digoal/blog/blob/master/201709/20170923_01.md

场景映射与特性匹配

· 非结构化数据类型选择 -json,hstore,xml 类型

· 全文检索需求 -采用 tsvector 类型

· 模糊查询

-含前缀时，使用 b-tree 索引

-含后缀时，使用 reverse(col)b-tree 表达式索引

· 前后模糊 -采用 pg_trgm 插件，gin 索引。

· 相似搜索 -采用 pg_trgm 插件，gin 索引。

· 短文特征向量，海明相似 -采用 mlar 插件，gin 索引。

· 多字段任意搜索

-采用 gin 复合索引。

-采用多个单列索引。

· 多值类型搜索

- 采用数组类型。

-采用 gin 索引。

阿里云PostgreSQL_案例介绍3 | 学习笔记