【重新发现PostgreSQL之美】- 23 彭祖的长寿秘诀-阿里云开发者社区

背景

场景:

论坛、短视频、社交、电商等推荐业务.
在推荐的过程中要过滤已读列表.

挑战:

已读列表巨大, 普通数据库没有多值列类型, 每个已读ID需要存储1条.
使用not in过滤, 性能极差.

PG 解决方案:

1、多值列, 一个用户已读只需要存储1条
2、datasketch 近似类型, 几KB空间可以存储上亿唯一值.
3、压缩存储多值列 roaringbitmap类型, 效率和空间的平衡.
4、使用partial index, 拆散过滤列表, 降低每次请求的无效数据.

例子

本文不谈推荐算法, 有兴趣的读者可以参考末尾文档或我的github, 为了观察方便, 本文尽量简化过程, 就用一个排序维度进行过滤.

传统方案

1、已读列表, 存储已读视频ID

uid int,  -- UID

vid int   -- 已读VID

);

create index idx_u_readlist on u_readlist (uid);

2、视频ID表, 存储推荐分数

vid int primary key,  -- 视频ID

score float4,   -- 推荐分数

ts timestamp

);

create index idx_vscore on vscore (score desc);

3、插入1000万条视频

4、插入1000个用户, 每个用户已读5万条视频.

insert into u_readlist select generate_series(1,1000),vid from a;

5、传统数据库没有多值列数据类型, 需要5000万条记录存储已读列表.

6、为1号用户推荐视频并过滤已读, 取出100个vid.

where vid not in (select vid from u_readlist where uid=1)

order by score desc

limit 100;

QUERY PLAN

-------------------------------------------------------------------------------------------------

Limit  (cost=48446.32..48451.80 rows=100 width=8) (actual time=103.616..103.704 rows=100 loops=1)

->  Index Scan using idx_vscore on vscore  (cost=48446.32..322781.27 rows=4999930 width=8) (actual time=103.615..103.693 rows=100 loops=1)

Filter: (NOT (hashed SubPlan 1))

Rows Removed by Filter: 50000

SubPlan 1

->  Bitmap Heap Scan on u_readlist  (cost=434.17..48321.27 rows=49846 width=4) (actual time=13.819..53.265 rows=50000 loops=1)

Recheck Cond: (uid = 1)

Heap Blocks: exact=50000

->  Bitmap Index Scan on idx_u_readlist  (cost=0.00..421.71 rows=49846 width=0) (actual time=4.610..4.610 rows=50000 loops=1)

Index Cond: (uid = 1)

Planning Time: 0.086 ms

Execution Time: 104.279 ms

(12 rows)

PG 解决方案1

1、PG可以使用数组存储多个已读VID,记录数节省5万倍.

uid int,  -- UID

vid int[]   -- 已读VID

);

create index idx_p_u_readlist on p_u_readlist (uid);

2、5000万条压缩到1000条

3、空间约仅占1/10

4、为1号用户推荐视频并过滤已读, 取出100个vid.

where vid not in (select unnest(vid) from p_u_readlist where uid=1)

order by score desc

limit 100;

Limit  (cost=3.01..8.50 rows=100 width=8) (actual time=56.343..56.438 rows=100 loops=1)

->  Index Scan using idx_vscore on vscore  (cost=3.01..274337.96 rows=4999930 width=8) (actual time=56.342..56.427 rows=100 loops=1)

Filter: (NOT (hashed SubPlan 1))

Rows Removed by Filter: 50000

SubPlan 1

->  ProjectSet  (cost=0.28..2.55 rows=10 width=4) (actual time=0.113..3.904 rows=50000 loops=1)

->  Index Scan using idx_p_u_readlist on p_u_readlist  (cost=0.28..2.49 rows=1 width=18) (actual time=0.018..0.022 rows=1 loops=1)

Index Cond: (uid = 1)

Planning Time: 0.112 ms

Execution Time: 56.814 ms

(10 rows)

PG 解决方案2

在VID表存储已读UID, 而不是在UID中存储已读VID

vid int primary key,  -- 视频ID

score float4,   -- 推荐分数

uid int[],   -- 已读UID

ts timestamp

);

create index idx_p_vscore on p_vscore (score desc);

插入1000万条视频, score排名前5万个视频全部被1000个用户已读.

update p_vscore set

uid=array(select generate_series(1,1000))

where vid in (

select vid from p_vscore order by score desc limit 50000

);

public | p_vscore      | table             | postgres | unlogged    | heap          | 687 MB  |

为1号用户推荐视频并过滤已读, 取出100个vid

where uid is null or not uid @> array[1]

order by score desc

limit 100;

Limit  (cost=0.43..3.08 rows=100 width=8) (actual time=384.599..384.696 rows=100 loops=1)

->  Index Scan using idx_p_vscore on p_vscore  (cost=0.43..264791.46 rows=9999712 width=8) (actual time=384.598..384.685 rows=100 loops=1)

Filter: ((uid IS NULL) OR (NOT (uid @> '{1}'::integer[])))

Rows Removed by Filter: 50000

Planning Time: 0.062 ms

Execution Time: 384.713 ms

(6 rows)

和之前视频讲的一样, 为什么这种方式更慢了? 具有olap属性, 如下:
《重新发现PostgreSQL之美- 21 探访宇航员的食物》

PG 解决方案3

1、使用HLL、roaringbitmap存储过滤列表, 可以节省列表的存储空间, 有些用户可能已读VID达到上百万, 需要大量的存储空间.

create unlogged table p1_u_readlist (

uid int,  -- UID

vid hll   -- 已读VID

);

create index idx_p1_u_readlist on p1_u_readlist (uid);

5000万条压缩到1000条

public | p1_u_readlist | table             | postgres | unlogged    | heap          | 1376 kB |

为1号用户推荐视频并过滤已读, 取出100个vid.

where not exists (select 1 from p1_u_readlist where uid=1 and p1_u_readlist.vid = hll_add(p1_u_readlist.vid,hll_hash_integer(vscore.vid)))

order by score desc

limit 100;

Limit  (cost=0.71..5.23 rows=100 width=8) (actual time=602.038..632.683 rows=100 loops=1)

->  Nested Loop Anti Join  (cost=0.71..449335.43 rows=9949861 width=8) (actual time=602.037..632.672 rows=100 loops=1)

Join Filter: (p1_u_readlist.vid = hll_add(p1_u_readlist.vid, hll_hash_integer(vscore.vid, 0)))

Rows Removed by Join Filter: 100

->  Index Scan using idx_vscore on vscore  (cost=0.43..249335.74 rows=9999860 width=8) (actual time=0.010..37.183 rows=52695 loops=1)

->  Materialize  (cost=0.28..2.50 rows=1 width=1287) (actual time=0.000..0.000 rows=1 loops=52695)

->  Index Scan using idx_p1_u_readlist on p1_u_readlist  (cost=0.28..2.49 rows=1 width=1287) (actual time=0.019..0.021 rows=1 loops=1)

Index Cond: (uid = 1)

Planning Time: 0.146 ms

Execution Time: 632.714 ms

(10 rows)

和之前视频讲的一样, 为什么这种方式更慢了? 具有olap属性, 如下:
《重新发现PostgreSQL之美- 21 探访宇航员的食物》

PG 解决方案4

在VID表存储已读UID, 而不是在UID中存储已读VID

使用HLL、roaringbitmap存储过滤列表

在VID表存储已读UID, 而不是在UID中存储已读VID

vid int primary key,  -- 视频ID

score float4,   -- 推荐分数

uid hll,   -- 已读UID

ts timestamp

);

create index idx_p1_vscore on p1_vscore (score desc);

插入1000万条视频, score排名前5万个视频全部被1000个用户已读.

update p1_vscore set

uid= (select hll_add_agg(hll_hash_integer(i)) from generate_series(1,1000) i)

where vid in (

select vid from p1_vscore order by score desc limit 50000

);

public | p_vscore      | table             | postgres | unlogged    | heap          | 687 MB  |

public | p1_vscore | table | postgres | unlogged    | heap          | 488 MB |

为1号用户推荐视频并过滤已读, 取出100个vid.

where uid is null or uid <> hll_add(uid,hll_hash_integer(1))

order by score desc

limit 100;

Limit  (cost=0.43..3.38 rows=100 width=8) (actual time=565.654..565.773 rows=100 loops=1)

->  Index Scan using idx_p1_vscore on p1_vscore  (cost=0.43..339509.68 rows=11541416 width=8) (actual time=565.653..565.763 rows=100 loops=1)

Filter: ((uid IS NULL) OR (uid <> hll_add(uid, '-8604791237420463362'::hll_hashval)))

Rows Removed by Filter: 50000

Planning Time: 0.090 ms

Execution Time: 565.795 ms

(6 rows)

PG 解决方案5

采用partial index, 注意看清楚, 这个方法直接将过滤已读从5万缩小到1000

uid int,  -- UID

hashid int,  -- partial index hashid

vid int[]   -- 已读VID

);

create index idx_pp_u_readlist on pp_u_readlist (uid,hashid);

假设hash 50片, 5000万条压缩到50000条,

public | pp_u_readlist | table | postgres | unlogged    | heap          | 229 MB |

视频表创建partial index

declare

begin

for i in 0..49 loop

execute format( 'create index p_vscore_p%s on vscore (score desc) where abs(mod(hashint4(vid),50))=%s', i,i);

end loop;

end;

$$;

List of relations

Schema |     Name      | Type  |  Owner   |  Table   | Persistence | Access method |  Size   | Description

--------+---------------+-------+----------+----------+-------------+---------------+---------+-------------

public | p_vscore_p0   | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p1   | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p10  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p11  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p12  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |

public | p_vscore_p13  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p14  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |

public | p_vscore_p15  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p16  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p17  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p18  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p19  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p2   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p20  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p21  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p22  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p23  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |

public | p_vscore_p24  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p25  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p26  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p27  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p28  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p29  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p3   | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p30  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |

public | p_vscore_p31  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p32  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p33  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p34  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p35  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |

public | p_vscore_p36  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p37  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p38  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p39  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p4   | index | postgres | vscore   | unlogged    | btree         | 4432 kB |

public | p_vscore_p40  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |

public | p_vscore_p41  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |

public | p_vscore_p42  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p43  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p44  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p45  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p46  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p47  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |

public | p_vscore_p48  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p49  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p5   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p6   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p7   | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p8   | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p9   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

为1号用户推荐视频并过滤已读, 取出100个vid.
每次获取时业务随机或采用round robin负载均衡方式从0-49个分片中获取100个.

where vid not in (select unnest(vid) from pp_u_readlist where uid=1 and hashid=0)

and abs(mod(hashint4(vid),50))=0

order by score desc

limit 100;

Limit  (cost=3.01..157.54 rows=100 width=8) (actual time=1.223..1.331 rows=100 loops=1)

->  Index Scan using p_vscore_p0 on vscore  (cost=3.01..38635.65 rows=25000 width=8) (actual time=1.222..1.317 rows=100 loops=1)

Filter: (NOT (hashed SubPlan 1))

Rows Removed by Filter: 973

SubPlan 1

->  ProjectSet  (cost=0.29..2.57 rows=10 width=4) (actual time=0.021..0.115 rows=973 loops=1)

->  Index Scan using idx_pp_u_readlist on pp_u_readlist  (cost=0.29..2.51 rows=1 width=18) (actual time=0.007..0.008 rows=1 loops=1)

Index Cond: ((uid = 1) AND (hashid = 0))

Planning Time: 0.515 ms

Execution Time: 1.366 ms

(10 rows)

其他方案略

datasketch
roaringbitmap

总结

推荐查询性能提升:

已读列表存储空间节约:

原理

1、多值列, 一个用户已读只需要存储1条
2、datasketch 近似类型, 几KB空间可以存储上亿唯一值.
3、压缩存储多值列roaringbitmap类型, 效率和空间的平衡.
4、使用partial index, 拆散过滤列表, 降低每次请求的无效数据.

参考

《PostgreSQL 大量IO扫描、计算浪费的优化- 推荐模块, 过滤已推荐. (热点用户、已推荐列表超大)》

《近似查询处理(Approximate Query Processing) - DataSketches - 压缩、distinct、分位、rank、高
频柱状图(count distinct, quantiles, most-frequent items, joins, matrix computations, and graph analysis) 等实时大数据近似分析》

《PostgreSQL pg_roaringbitmap - 用户画像、标签、高效检索》

除了本文的应用, hll, roaringbitmap还经常被用在实时分析系统, 画像系统中, 支持高速的PV UV统计, 滑窗分析, 人群圈选等.

【重新发现PostgreSQL之美】- 23 彭祖的长寿秘诀

背景

例子

传统方案

PG 解决方案1

PG 解决方案2

PG 解决方案3

PG 解决方案4

PG 解决方案5

其他方案略

总结

原理

参考

传说中的德哥

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

【重新发现PostgreSQL之美】- 23 彭祖的长寿秘诀

背景

例子

传统方案

PG 解决方案1

PG 解决方案2

PG 解决方案3

PG 解决方案4

PG 解决方案5

其他方案略

总结

原理

参考

传说中的德哥

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像