PostgreSQL 标签系统 bit 位运算查询性能-阿里云开发者社区

PostgreSQL 标签系统 bit 位运算查询性能

2016-05-15 6046

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云数据库 RDS SQL Server，基础系列 2核4GB

RDS MySQL Serverless 基础系列，0.5-2RCU 50GB

云原生数据库 PolarDB 分布式版，标准版 2核8GB

简介： 在标签系统中，通常会有多个属性，每个属性使用一个标签标示，最简单的标签是用0和1来表示，代表true和false。我们可以把所有的标签转换成比特位，例如系统中一共有200个标签，5000万用户。那么我们可以通过标签的位运算来圈定特定的人群。这样就会涉及BIT位的运算。那么我们来看看PostgreS.

在标签系统中，通常会有多个属性，每个属性使用一个标签标示，最简单的标签是用0和1来表示，代表true和false。
我们可以把所有的标签转换成比特位，例如系统中一共有200个标签，5000万用户。
那么我们可以通过标签的位运算来圈定特定的人群。
这样就会涉及BIT位的运算。
那么我们来看看PostgreSQL位运算的性能如何？
PostgreSQL 9.5

postgres=# create table t_bit2 (id bit(200));
CREATE TABLE
Time: 1.018 ms
postgres=# insert into t_bit2 select B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010' from generate_series(1,50000000);
INSERT 0 50000000
Time: 47203.497 ms
postgres=# select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
  count   
----------
 50000000
(1 row)

Time: 14216.286 ms
postgres=# \dt+ t_bit2
                     List of relations
 Schema |  Name  | Type  |  Owner   |  Size   | Description 
--------+--------+-------+----------+---------+-------------
 public | t_bit2 | table | postgres | 2873 MB | 
(1 row)

PostgreSQL 9.6支持并行查询

postgres=#  create table t_bit2 (id bit(200));
CREATE TABLE
Time: 0.933 ms
postgres=# insert into t_bit2 select B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010' from generate_series(1,50000000);
INSERT 0 50000000
Time: 51485.962 ms
postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
                                                                                                                                                                                                                                        QUERY
 PLAN                                                                                                                                                                                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=471554.70..471554.71 rows=1 width=8) (actual time=9667.464..9667.465 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=368140 dirtied=145199
   ->  Gather  (cost=471554.07..471554.68 rows=6 width=8) (actual time=9667.433..9667.454 rows=7 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 6
         Workers Launched: 6
         Buffers: shared hit=368140 dirtied=145199
         ->  Partial Aggregate  (cost=470554.07..470554.08 rows=1 width=8) (actual time=9663.423..9663.424 rows=1 loops=7)
               Output: PARTIAL count(*)
               Buffers: shared hit=367648 dirtied=145199
               Worker 0: actual time=9662.545..9662.546 rows=1 loops=1
                 Buffers: shared hit=49944 dirtied=19645
               Worker 1: actual time=9661.922..9661.922 rows=1 loops=1
                 Buffers: shared hit=49405 dirtied=19198
               Worker 2: actual time=9662.924..9662.925 rows=1 loops=1
                 Buffers: shared hit=49968 dirtied=19641
               Worker 3: actual time=9662.483..9662.484 rows=1 loops=1
                 Buffers: shared hit=49301 dirtied=19403
               Worker 4: actual time=9663.341..9663.342 rows=1 loops=1
                 Buffers: shared hit=49825 dirtied=19814
               Worker 5: actual time=9663.605..9663.605 rows=1 loops=1
                 Buffers: shared hit=49791 dirtied=19586
               ->  Parallel Seq Scan on public.t_bit2  (cost=0.00..470468.39 rows=34274 width=0) (actual time=0.039..5724.642 rows=7142857 loops=7)
                     Output: id
                     Filter: (bitand(t_bit2.id, B'1010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101
0101010101010'::"bit") = B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010'::"bit")
                     Buffers: shared hit=367648 dirtied=145199
                     Worker 0: actual time=0.038..5676.776 rows=6792384 loops=1
                       Buffers: shared hit=49944 dirtied=19645
                     Worker 1: actual time=0.046..5675.846 rows=6719080 loops=1
                       Buffers: shared hit=49405 dirtied=19198
                     Worker 2: actual time=0.040..5678.657 rows=6795648 loops=1
                       Buffers: shared hit=49968 dirtied=19641
                     Worker 3: actual time=0.037..5678.587 rows=6704936 loops=1
                       Buffers: shared hit=49301 dirtied=19403
                     Worker 4: actual time=0.039..5667.813 rows=6776072 loops=1
                       Buffers: shared hit=49825 dirtied=19814
                     Worker 5: actual time=0.051..5677.367 rows=6771576 loops=1
                       Buffers: shared hit=49791 dirtied=19586
 Planning time: 0.100 ms
 Execution time: 9772.925 ms
(41 rows)
Time: 9773.874 ms
postgres=# select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
  count   
----------
 50000000
(1 row)
Time: 2326.541 ms

PostgreSQL 9.6的性能提升：

PostgreSQL 标签系统 bit 位运算查询性能

关系型数据库

热门文章

最新文章

相关产品

相关课程

相关电子书

相关实验场景

推荐镜像

PostgreSQL 标签系统 bit 位运算 查询性能

关系型数据库

热门文章

最新文章

相关产品

相关课程

相关电子书

相关实验场景

推荐镜像

PostgreSQL 标签系统 bit 位运算查询性能