【重新发现PostgreSQL之美】- 23 彭祖的长寿秘诀-阿里云开发者社区

开发者社区> 传说中的德哥> 正文

【重新发现PostgreSQL之美】- 23 彭祖的长寿秘诀

简介: 大家好,这里是重新发现PostgreSQL之美 - 23 彭祖的长寿秘诀
+关注继续查看

背景


场景:

  • 论坛、短视频、社交、电商等推荐业务.
  • 在推荐的过程中要过滤已读列表.

挑战:

  • 已读列表巨大, 普通数据库没有多值列类型, 每个已读ID需要存储1条.
  • 使用not in过滤, 性能极差.

PG 解决方案:

  • 1、多值列, 一个用户已读只需要存储1条
  • 2、datasketch 近似类型, 几KB空间可以存储上亿唯一值.
  • 3、压缩存储多值列 roaringbitmap类型, 效率和空间的平衡.
  • 4、使用partial index, 拆散过滤列表, 降低每次请求的无效数据.

例子


本文不谈推荐算法, 有兴趣的读者可以参考末尾文档或我的github, 为了观察方便, 本文尽量简化过程, 就用一个排序维度进行过滤.

传统方案

1、已读列表, 存储已读视频ID

uid int,  -- UID
vid int   -- 已读VID
);
create index idx_u_readlist on u_readlist (uid);

2、视频ID表, 存储推荐分数

vid int primary key,  -- 视频ID
score float4,   -- 推荐分数
ts timestamp
);
create index idx_vscore on vscore (score desc);

3、插入1000万条视频

4、插入1000个用户, 每个用户已读5万条视频.

insert into u_readlist select generate_series(1,1000),vid from a;

5、传统数据库没有多值列数据类型, 需要5000万条记录存储已读列表.

6、为1号用户推荐视频并过滤已读, 取出100个vid.

where vid not in (select vid from u_readlist where uid=1)
order by score desc
limit 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Limit  (cost=48446.32..48451.80 rows=100 width=8) (actual time=103.616..103.704 rows=100 loops=1)
->  Index Scan using idx_vscore on vscore  (cost=48446.32..322781.27 rows=4999930 width=8) (actual time=103.615..103.693 rows=100 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 50000
SubPlan 1
->  Bitmap Heap Scan on u_readlist  (cost=434.17..48321.27 rows=49846 width=4) (actual time=13.819..53.265 rows=50000 loops=1)
Recheck Cond: (uid = 1)
Heap Blocks: exact=50000
->  Bitmap Index Scan on idx_u_readlist  (cost=0.00..421.71 rows=49846 width=0) (actual time=4.610..4.610 rows=50000 loops=1)
Index Cond: (uid = 1)
Planning Time: 0.086 ms
Execution Time: 104.279 ms
(12 rows)

PG 解决方案1

1、PG可以使用数组存储多个已读VID,记录数节省5万倍.

uid int,  -- UID
vid int[]   -- 已读VID
);
create index idx_p_u_readlist on p_u_readlist (uid);

2、5000万条压缩到1000条

3、空间约仅占1/10

4、为1号用户推荐视频并过滤已读, 取出100个vid.

where vid not in (select unnest(vid) from p_u_readlist where uid=1)
order by score desc
limit 100;
Limit  (cost=3.01..8.50 rows=100 width=8) (actual time=56.343..56.438 rows=100 loops=1)
->  Index Scan using idx_vscore on vscore  (cost=3.01..274337.96 rows=4999930 width=8) (actual time=56.342..56.427 rows=100 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 50000
SubPlan 1
->  ProjectSet  (cost=0.28..2.55 rows=10 width=4) (actual time=0.113..3.904 rows=50000 loops=1)
->  Index Scan using idx_p_u_readlist on p_u_readlist  (cost=0.28..2.49 rows=1 width=18) (actual time=0.018..0.022 rows=1 loops=1)
Index Cond: (uid = 1)
Planning Time: 0.112 ms
Execution Time: 56.814 ms
(10 rows)

PG 解决方案2

在VID表存储已读UID, 而不是在UID中存储已读VID

vid int primary key,  -- 视频ID
score float4,   -- 推荐分数
uid int[],   -- 已读UID
ts timestamp
);
create index idx_p_vscore on p_vscore (score desc);

插入1000万条视频, score排名前5万个视频全部被1000个用户已读.

update p_vscore set
uid=array(select generate_series(1,1000))
where vid in (
select vid from p_vscore order by score desc limit 50000
);
public | p_vscore      | table             | postgres | unlogged    | heap          | 687 MB  |

为1号用户推荐视频并过滤已读, 取出100个vid

where uid is null or not uid @> array[1]
order by score desc
limit 100;
Limit  (cost=0.43..3.08 rows=100 width=8) (actual time=384.599..384.696 rows=100 loops=1)
->  Index Scan using idx_p_vscore on p_vscore  (cost=0.43..264791.46 rows=9999712 width=8) (actual time=384.598..384.685 rows=100 loops=1)
Filter: ((uid IS NULL) OR (NOT (uid @> '{1}'::integer[])))
Rows Removed by Filter: 50000
Planning Time: 0.062 ms
Execution Time: 384.713 ms
(6 rows)

和之前视频讲的一样, 为什么这种方式更慢了? 具有olap属性, 如下:
《重新发现PostgreSQL之美- 21 探访宇航员的食物》

PG 解决方案3

1、使用HLL、roaringbitmap存储过滤列表, 可以节省列表的存储空间, 有些用户可能已读VID达到上百万, 需要大量的存储空间.

create unlogged table p1_u_readlist (
uid int,  -- UID
vid hll   -- 已读VID
);
create index idx_p1_u_readlist on p1_u_readlist (uid);

5000万条压缩到1000条

public | p1_u_readlist | table             | postgres | unlogged    | heap          | 1376 kB |

为1号用户推荐视频并过滤已读, 取出100个vid.

where not exists (select 1 from p1_u_readlist where uid=1 and p1_u_readlist.vid = hll_add(p1_u_readlist.vid,hll_hash_integer(vscore.vid)))
order by score desc
limit 100;
Limit  (cost=0.71..5.23 rows=100 width=8) (actual time=602.038..632.683 rows=100 loops=1)
->  Nested Loop Anti Join  (cost=0.71..449335.43 rows=9949861 width=8) (actual time=602.037..632.672 rows=100 loops=1)
Join Filter: (p1_u_readlist.vid = hll_add(p1_u_readlist.vid, hll_hash_integer(vscore.vid, 0)))
Rows Removed by Join Filter: 100
->  Index Scan using idx_vscore on vscore  (cost=0.43..249335.74 rows=9999860 width=8) (actual time=0.010..37.183 rows=52695 loops=1)
->  Materialize  (cost=0.28..2.50 rows=1 width=1287) (actual time=0.000..0.000 rows=1 loops=52695)
->  Index Scan using idx_p1_u_readlist on p1_u_readlist  (cost=0.28..2.49 rows=1 width=1287) (actual time=0.019..0.021 rows=1 loops=1)
Index Cond: (uid = 1)
Planning Time: 0.146 ms
Execution Time: 632.714 ms
(10 rows)

和之前视频讲的一样, 为什么这种方式更慢了? 具有olap属性, 如下:
《重新发现PostgreSQL之美- 21 探访宇航员的食物》

PG 解决方案4

在VID表存储已读UID, 而不是在UID中存储已读VID

使用HLL、roaringbitmap存储过滤列表

在VID表存储已读UID, 而不是在UID中存储已读VID

vid int primary key,  -- 视频ID
score float4,   -- 推荐分数
uid hll,   -- 已读UID
ts timestamp
);
create index idx_p1_vscore on p1_vscore (score desc);

插入1000万条视频, score排名前5万个视频全部被1000个用户已读.

update p1_vscore set
uid= (select hll_add_agg(hll_hash_integer(i)) from generate_series(1,1000) i)
where vid in (
select vid from p1_vscore order by score desc limit 50000
);
public | p_vscore      | table             | postgres | unlogged    | heap          | 687 MB  |
public | p1_vscore | table | postgres | unlogged    | heap          | 488 MB |

为1号用户推荐视频并过滤已读, 取出100个vid.

where uid is null or uid <> hll_add(uid,hll_hash_integer(1))
order by score desc
limit 100;
Limit  (cost=0.43..3.38 rows=100 width=8) (actual time=565.654..565.773 rows=100 loops=1)
->  Index Scan using idx_p1_vscore on p1_vscore  (cost=0.43..339509.68 rows=11541416 width=8) (actual time=565.653..565.763 rows=100 loops=1)
Filter: ((uid IS NULL) OR (uid <> hll_add(uid, '-8604791237420463362'::hll_hashval)))
Rows Removed by Filter: 50000
Planning Time: 0.090 ms
Execution Time: 565.795 ms
(6 rows)

PG 解决方案5

采用partial index, 注意看清楚, 这个方法直接将过滤已读从5万缩小到1000

uid int,  -- UID
hashid int,  -- partial index hashid
vid int[]   -- 已读VID
);
create index idx_pp_u_readlist on pp_u_readlist (uid,hashid);

假设hash 50片, 5000万条压缩到50000条,

public | pp_u_readlist | table | postgres | unlogged    | heap          | 229 MB |

视频表创建partial index

declare
begin
for i in 0..49 loop
execute format( 'create index p_vscore_p%s on vscore (score desc) where abs(mod(hashint4(vid),50))=%s', i,i);
end loop;
end;
$$;
List of relations
Schema |     Name      | Type  |  Owner   |  Table   | Persistence | Access method |  Size   | Description
--------+---------------+-------+----------+----------+-------------+---------------+---------+-------------
public | p_vscore_p0   | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p1   | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p10  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p11  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p12  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |
public | p_vscore_p13  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p14  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |
public | p_vscore_p15  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p16  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p17  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p18  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p19  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p2   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p20  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p21  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p22  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p23  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |
public | p_vscore_p24  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p25  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p26  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p27  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p28  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p29  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p3   | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p30  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |
public | p_vscore_p31  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p32  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p33  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p34  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p35  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |
public | p_vscore_p36  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p37  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p38  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p39  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p4   | index | postgres | vscore   | unlogged    | btree         | 4432 kB |
public | p_vscore_p40  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |
public | p_vscore_p41  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |
public | p_vscore_p42  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p43  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p44  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p45  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p46  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p47  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |
public | p_vscore_p48  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p49  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p5   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p6   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p7   | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p8   | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p9   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

为1号用户推荐视频并过滤已读, 取出100个vid.
每次获取时业务随机或采用round robin负载均衡方式从0-49个分片中获取100个.

where vid not in (select unnest(vid) from pp_u_readlist where uid=1 and hashid=0)
and abs(mod(hashint4(vid),50))=0
order by score desc
limit 100;
Limit  (cost=3.01..157.54 rows=100 width=8) (actual time=1.223..1.331 rows=100 loops=1)
->  Index Scan using p_vscore_p0 on vscore  (cost=3.01..38635.65 rows=25000 width=8) (actual time=1.222..1.317 rows=100 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 973
SubPlan 1
->  ProjectSet  (cost=0.29..2.57 rows=10 width=4) (actual time=0.021..0.115 rows=973 loops=1)
->  Index Scan using idx_pp_u_readlist on pp_u_readlist  (cost=0.29..2.51 rows=1 width=18) (actual time=0.007..0.008 rows=1 loops=1)
Index Cond: ((uid = 1) AND (hashid = 0))
Planning Time: 0.515 ms
Execution Time: 1.366 ms
(10 rows)

其他方案略

datasketch
roaringbitmap

总结

推荐查询性能提升:

已读列表存储空间节约:

原理

1、多值列, 一个用户已读只需要存储1条
2、datasketch 近似类型, 几KB空间可以存储上亿唯一值.
3、压缩存储多值列roaringbitmap类型, 效率和空间的平衡.
4、使用partial index, 拆散过滤列表, 降低每次请求的无效数据.

参考

《PostgreSQL 大量IO扫描、计算浪费的优化- 推荐模块, 过滤已推荐. (热点用户、已推荐列表超大)》

《推荐系统, 已阅读过滤, 大量CPU和IO浪费的优化思路2 - partial index - hash 分片,降低过滤量》

《PostgreSQL 推荐系统优化总计- 空间、时间、标量等混合多模查询场景, 大量已读过滤导致CPU IO剧增(类挖矿概率下降优化)》

《近似查询处理(Approximate Query Processing) - DataSketches - 压缩、distinct、分位、rank、高
频柱状图(count distinct, quantiles, most-frequent items, joins, matrix computations, and graph analysis) 等实时大数据近似分析》

《PostgreSQL pg_roaringbitmap - 用户画像、标签、高效检索》

除了本文的应用, hll, roaringbitmap还经常被用在实时分析系统, 画像系统中, 支持高速的PV UV统计, 滑窗分析, 人群圈选等.

 

版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。

相关文章
一文快速搞定Redis_数据类型及JavaApi操作
大家好,我是**ChinaManor**,直译过来就是中国码农的意思,我希望自己能成为国家复兴道路的铺路人,大数据领域的耕耘者,平凡但不甘于平庸的人。
9 0
计算机基础1 | 学习笔记
快速学习计算机基础1。
8 0
计算机基础2 | 学习笔记
快速学习计算机基础2。
11 0
Vue 仿钉钉流程图(流程节点绘制 vue+Ant【如果用其他UI库需要替换几个组件】 附 demo)
# [这里是git地址](https://gitee.com/xiaoyaoluntian/imitating-dingding-flow-chart/tree/comdemo/)
6 0
数据类型-数值和字符串 | 学习笔记
快速学习数据类型-数值和字符串。
5 0
PG+MySQL第9课-实时精准营销
通常业务场景会涉及基于标签条件圈选目标客户、基于用户特征值扩选相似人群、群体用户画像分析这些技术,本文将围绕这三个场景去介绍在实施精准营销里面的PG数据库的使用
8 0
冬季实战营第一期学习报告
通过五天学习与实操,对ECS云服务器入门、快速搭建LAMP环境、部署MySQL数据库、回顾搭建Docker环境和Spring Boot以及使用PolarDB和ECS搭建门户网站操作,对本期学习与实操的认识。
9 0
Redis高可用架构演进
Redis是目前使用最广泛的缓存程序之一,也被应用于多种场景,例如数据缓存、分布式锁等,Redis官方提供了多种部署架构,以满足不同应用场景下对于高可用和扩展性的要求。
8 0
冬季实战营第一期:从零到一上手玩转云服务器实验报告
第一期主要进行了六次实验,分别是《动手实操ECS云服务器》、《动手实操快速搭建LAMP环境》、《使用ECS服务器部署MySQL数据库》、《通过workbench远程登录ECS,快速搭建Docker环境》、《从零搭建Spring Boot的Hello World》以及《使用PolarDB和ECS搭建门户网站》。首先远程登陆ECS实例,搭建LAMP环境,其中LAMP分别代表Linux、Apache、MySQL和PHP。然后配置及远程访问MySQL。冯晓帅老师在直播上带大家通过workbench登录ECS并快速搭建Docker环境,运行Spring Boot,最后安装WordPress并搭建博客。
9 0
MySQL高可用架构演进
MySQL是数据库领域当之无愧的霸主之一,其在各行各业被广泛应用,随着广泛使用,对于MySQL本身的高可用性的要求就是不可避免的话题,而MySQL的高可用方案也随着MySQL功能的完善经历了多次升级,本文将对MySQL的各种高可用架构进行分析,以此来了解架构的演进。
9 0
172
文章
0
问答
来源圈子
更多
德哥:阿里巴巴高级产品专家, 阿里巴巴钻石布道师, 42项数据库专利, 目前任职于数据库架构组, 负责数据库开源,同时也是PostgreSQL 中国社区发起人之一, 负责PostgreSQL数据库在中国的技术落地与推广、人才培养,开设有开源社、德说、重新发现PG之美等专栏。
+ 订阅
文章排行榜
最热
最新
相关电子书
更多
《2021云上架构与运维峰会演讲合集》
立即下载
《零基础CSS入门教程》
立即下载
《零基础HTML入门教程》
立即下载