PostgreSQL 1000亿数据量 正则匹配 速度与激情

本文涉及的产品
云数据库 RDS SQL Server,基础系列 2核4GB
云原生数据库 PolarDB 分布式版,标准版 2核8GB
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
简介: 承接上一篇https://yq.aliyun.com/articles/7444 测试环境为 8台主机(16 core/主机)的PostgreSQL集群,一共240个数据节点,单表数据量1008亿。性能图表 : 如果要获得更快的响应速度,可以通过增加主机和节点数(或者通过增加CPU和节点数),缩

承接上一篇
https://yq.aliyun.com/articles/7444

测试环境为 8台主机(16c/host)的 PostgreSQL集群,一共240个数据节点,测试数据量1008亿。
性能图表 :
_
如果要获得更快的响应速度,可以通过增加主机和节点数(或者通过增加CPU和节点数),缩短recheck的处理时间。

数据生成方法:

#!/bin/bash  
#      截取通过random()计算得到的MD5 128bit hex的前48bit, 转成字符串,得到[0-9]和[a-f]组成的12个随机字符串。  
  
psql digoal digoal -c "create table t_regexp_100billion distributed randomly"  
  
for ((i=1;i<=1008;i++))  
do  
  psql digoal digoal -c "copy (select substring(md5(random()::text),1,12) from generate_series(1,100000000)) to stdout" | psql digoal digoal -c "copy t_regexp_100billion from stdin"  
done  
  
psql digoal digoal -c "set maintenance_work_mem='4GB'; create index idx_t_regexp_100billion_1 on t_regexp_100billion(info)"  
psql digoal digoal -c "set maintenance_work_mem='4GB'; create index idx_t_regexp_100billion_2 on t_regexp_100billion(reverse(info))"  
psql digoal digoal -c "set maintenance_work_mem='4GB'; create index idx_t_regexp_100billion_gin on t_regexp_100billion using gin (info gin_trgm_ops)"  

数据概貌

digoal=> select count(*) from t_regexp_100billion ;  
    count       
--------------  
 100800000000  
(1 row)  
Time: 228721.386 ms  

表大小

digoal=> \dt+ t_regexp_100billion   
                           List of relations  
 Schema |        Name         | Type  | Owner  |  Size   | Description   
--------+---------------------+-------+--------+---------+-------------  
 public | t_regexp_100billion | table | digoal | 4158 GB |   
(1 row)  

索引大小

idx_t_regexp_100billion_1     2961 GB  
idx_t_regexp_100billion_1     2961 GB  
idx_t_regexp_100billion_gin   2300 GB  

测试数据展示:

digoal=> select * from t_regexp_100billion offset 1000000 limit 10;  
     info       
--------------  
 bca0fb45367e  
 3051ca8a9a38  
 fadc91a3a4de  
 710b9c60417e  
 279dd9832cc3  
 f4743fe2e83b  
 9ce9e42d4039  
 65e64742fd3f  
 db3d0e0edc52  
 7cfb00bb38ec  
(10 rows)  

重复度取样, 计算random() md5得到的字符串,可以确保非常低的重复度:

digoal=> select count(distinct info) from (select * from t_regexp_100billion offset 1299422811 limit 1000000) t;  
 count    
--------  
 999750  
(1 row)  

统计信息展示:

digoal=> alter table t_regexp_100billion alter column info set statistics 10000;  
ALTER TABLE  
digoal=> analyze t_regexp_100billion ;  
ANALYZE  
  
schemaname             | public  
tablename              | t_regexp_100billion  
attname                | info  
inherited              | f  
null_frac              | 0  
avg_width              | 13  
n_distinct             | -0.836834             # 采样统计信息,约83.6834%的唯一值  
most_common_vals       | (pg_catalog.text){7f68d12d2205,00083380706d,00154b6d79e8,...    
most_common_freqs      | {1e-06,6.66667e-07,6.66667e-07,6.66667e-07,.....        单个最高频值的占比为1e-06, 也就是说1000亿记录中出现10万次。  
histogram_bounds       | (pg_catalog.text){0000008123b7,00066c71c9bb,000d672de234,...  
correlation            | 0.000237291  
most_common_elems      |   
most_common_elem_freqs |   
elem_count_histogram   |   

7f68d12d2205 实际的出现次数,可能是采样时7f68d12d2205被采样到的块较多,所以数据库认为它的占比较多:

digoal=> select count(*) from t_regexp_100billion where info='7f68d12d2205';  
-[ RECORD 1 ]  
count | 54  
  
digoal=> select ctid from t_regexp_100billion where info='7f68d12d2205' order by 1;  
     ctid        
---------------  
 (15343,114)  
 (62134,39)  
 (96808,112)  
 (116492,176)  
 (194615,143)  
 (328074,116)  
 (364037,115)  
 (375240,158)  
 (376187,152)  
 (602144,81)  
 (664026,6)  
 (689501,136)  
 (695345,130)  
 (697374,126)  
 (714719,148)  
 (743169,20)  
 (802326,139)  
 (833830,41)  
 (839417,185)  
 (892417,78)  
 (892493,149)  
 (907979,52)  
 (967078,163)  
 (990313,159)  
 (1007998,27)  
 (1106961,57)  
 (1142731,165)  
 (1148427,67)  
 (1156654,156)  
 (1205854,137)  
 (1243429,68)  
 (1277287,165)  
 (1328836,98)  
 (1331727,150)  
 (1337534,3)  
 (1360947,104)  
 (1438970,97)  
 (1476941,22)  
 (1482022,82)  
 (1486307,69)  
 (1548445,155)  
 (1557209,82)  
 (1564980,158)  
 (1646685,76)  
 (1663018,99)  
 (1678604,77)  
 (1755845,177)  
 (1981937,153)  
 (1984723,98)  
 (2071955,59)  
 (2093147,149)  
 (2199794,102)  
 (2204957,44)  
 (2234820,142)  
(54 rows)  

性能测试:
前缀匹配查询速度:

digoal=> select ctid,tableoid,info from t_regexp_100billion where info ~ '^80ebcdd47';  
     ctid      | tableoid |     info       
---------------+----------+--------------  
 (124741,60)   |    16677 | 80ebcdd47006  
 (896121,64)   |    16659 | 80ebcdd47006  
 (1124495,97)  |    16659 | 80ebcdd47006  
 (1126474,141) |    16659 | 80ebcdd47006  
 (1059471,62)  |    16659 | 80ebcdd47006  
 (1296562,115) |    16659 | 80ebcdd47006  
 (1190941,122) |    16659 | 80ebcdd47006  
 (680853,129)  |    16659 | 80ebcdd47006  
 (1010667,15)  |    16659 | 80ebcdd47006  
 (1386348,25)  |    16659 | 80ebcdd47006  
 (1522827,90)  |    16659 | 80ebcdd47006  
 (2204071,129) |    16659 | 80ebcdd47006  
 (1570431,114) |    16659 | 80ebcdd47006  
 (888185,38)   |    16659 | 80ebcdd47006  
 (605886,160)  |    16659 | 80ebcdd47006  
 (1306061,123) |    16659 | 80ebcdd47006  
 (757157,47)   |    16659 | 80ebcdd47006  
 (1166290,83)  |    16659 | 80ebcdd47006  
 (419730,1)    |    16659 | 80ebcdd47006  
 (1833853,131) |    16659 | 80ebcdd47006  
 (964866,120)  |    16659 | 80ebcdd47006  
 (904961,175)  |    16659 | 80ebcdd47006  
 (984373,32)   |    16659 | 80ebcdd47006  
 (891018,145)  |    16659 | 80ebcdd47006  
 (1520483,121) |    16659 | 80ebcdd47006  
 (571001,124)  |    16659 | 80ebcdd47006  
 (802093,55)   |    16659 | 80ebcdd47006  
 (6831,172)    |    16659 | 80ebcdd47006  
 (1169137,84)  |    16659 | 80ebcdd47006  
 (77398,164)   |    16659 | 80ebcdd47006  
 (24132,98)    |    16659 | 80ebcdd47006  
 (564322,152)  |    16659 | 80ebcdd47006  
 (357087,172)  |    16659 | 80ebcdd47006  
 (1823628,60)  |    16659 | 80ebcdd47006  
 (2153609,52)  |    16659 | 80ebcdd47006  
 (816401,140)  |    16659 | 80ebcdd47006  
 (542383,53)   |    16662 | 80ebcdd47006  
 (1340971,64)  |    16662 | 80ebcdd47006  
 (1239166,108) |    16662 | 80ebcdd47006  
 (2033648,39)  |    16662 | 80ebcdd47006  
 (1890808,93)  |    16662 | 80ebcdd47006  
 (1213124,4)   |    16662 | 80ebcdd47006  
 (1025184,106) |    16662 | 80ebcdd47006  
 (620238,131)  |    16662 | 80ebcdd47006  
 (583064,74)   |    16662 | 80ebcdd47006  
 (1454680,42)  |    16671 | 80ebcdd47006  
 (417385,74)   |    16671 | 80ebcdd47006  
 (323669,61)   |    16671 | 80ebcdd47006  
 (1759181,138) |    16671 | 80ebcdd47006  
 (2112157,146) |    16671 | 80ebcdd47006  
 (431326,92)   |    16671 | 80ebcdd47006  
 (2097356,110) |    16671 | 80ebcdd47006  
(52 rows)  
Time: 3226.393 ms  
  
digoal=> explain (analyze,verbose,buffers,costs,timing) select ctid,tableoid,info from t_regexp_100billion where info ~ '^80ebcdd47';  
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0) (actual time=3085.502..3112.273 rows=52 loops=1)  
   Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info  
   Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2  
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h  
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d  
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d  
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d  
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data  
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,   
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,  
 h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,   
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_  
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9  
   Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (info ~ '^80ebcdd47'::text)  
 Planning time: 0.061 ms  
 Execution time: 3112.296 ms  
(6 rows)  
Time: 3139.928 ms  

后缀匹配查询速度

digoal=> select ctid,tableoid,info from t_regexp_100billion where reverse(info) ~ '^f42d12089b';  
     ctid      | tableoid |     info       
---------------+----------+--------------  
 (124741,26)   |    16677 | f3b98021d24f  
 (1696888,151) |    16659 | f3b98021d24f  
 (1278911,101) |    16659 | f3b98021d24f  
 (1427480,157) |    16659 | f3b98021d24f  
 (449192,30)   |    16659 | f3b98021d24f  
 (1833887,81)  |    16659 | f3b98021d24f  
 (229525,72)   |    16659 | f3b98021d24f  
 (1353789,17)  |    16659 | f3b98021d24f  
 (1875911,148) |    16659 | f3b98021d24f  
 (1847078,35)  |    16659 | f3b98021d24f  
 (316780,156)  |    16659 | f3b98021d24f  
 (1265453,120) |    16659 | f3b98021d24f  
 (100075,60)   |    16659 | f3b98021d24f  
 (1924176,2)   |    16659 | f3b98021d24f  
 (279583,2)    |    16659 | f3b98021d24f  
 (1631226,23)  |    16659 | f3b98021d24f  
 (1906666,50)  |    16659 | f3b98021d24f  
 (1640803,116) |    16659 | f3b98021d24f  
 (629651,46)   |    16659 | f3b98021d24f  
 (134982,13)   |    16659 | f3b98021d24f  
 (380660,123)  |    16659 | f3b98021d24f  
 (2158193,31)  |    16659 | f3b98021d24f  
 (324901,64)   |    16659 | f3b98021d24f  
 (1243973,160) |    16659 | f3b98021d24f  
 (540958,139)  |    16659 | f3b98021d24f  
 (441475,99)   |    16659 | f3b98021d24f  
 (1207114,121) |    16659 | f3b98021d24f  
 (574598,21)   |    16659 | f3b98021d24f  
 (1253283,185) |    16659 | f3b98021d24f  
 (1396717,142) |    16659 | f3b98021d24f  
 (149738,9)    |    16659 | f3b98021d24f  
 (764749,26)   |    16659 | f3b98021d24f  
 (1211899,5)   |    16659 | f3b98021d24f  
 (1626746,65)  |    16659 | f3b98021d24f  
 (1342895,124) |    16659 | f3b98021d24f  
 (733794,136)  |    16659 | f3b98021d24f  
 (417796,2)    |    16659 | f3b98021d24f  
 (555520,163)  |    16659 | f3b98021d24f  
 (232038,105)  |    16659 | f3b98021d24f  
 (355107,127)  |    16659 | f3b98021d24f  
 (352143,175)  |    16662 | f3b98021d24f  
 (1856293,69)  |    16662 | f3b98021d24f  
 (1405106,105) |    16662 | f3b98021d24f  
 (47689,79)    |    16662 | f3b98021d24f  
 (679310,7)    |    16671 | f3b98021d24f  
 (1076234,164) |    16671 | f3b98021d24f  
(46 rows)  
Time: 3140.835 ms  
  
  
digoal=> explain (verbose,costs,timing,buffers,analyze) select ctid,tableoid,info from t_regexp_100billion where reverse(info) ~ '^f42d12089b';  
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0) (actual time=3085.738..3112.216 rows=46 loops=1)  
   Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info  
   Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2  
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h  
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d  
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d  
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d  
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data  
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,   
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,  
 h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,   
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_  
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9  
   Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (reverse(info) ~ '^f42d12089b'::text)  
 Planning time: 0.063 ms  
 Execution time: 3112.236 ms  
(6 rows)  
  
Time: 3139.890 ms  

前后模糊查询速度:

digoal=> select ctid,tableoid,info from t_regexp_100billion where info ~ 'e7add04871';  
     ctid      | tableoid |     info       
---------------+----------+--------------  
 (124741,45)   |    16677 | be7add048713  
 (49315,69)    |    16659 | be7add048713  
 (1770876,21)  |    16659 | be7add048713  
 (199079,143)  |    16659 | be7add048713  
 (151110,141)  |    16659 | be7add048713  
 (1597384,137) |    16659 | be7add048713  
 (1693453,25)  |    16659 | be7add048713  
 (101576,132)  |    16659 | be7add048713  
 (1110249,50)  |    16659 | be7add048713  
 (792326,68)   |    16659 | be7add048713  
 (1676705,68)  |    16659 | be7add048713  
 (1269148,101) |    16659 | be7add048713  
 (1027442,113) |    16659 | be7add048713  
 (1078144,100) |    16659 | be7add048713  
 (584038,141)  |    16659 | be7add048713  
 (1245454,80)  |    16659 | be7add048713  
 (1551184,102) |    16659 | be7add048713  
 (1326266,17)  |    16659 | be7add048713  
 (432025,101)  |    16659 | be7add048713  
 (300650,152)  |    16659 | be7add048713  
 (1322140,15)  |    16662 | be7add048713  
 (1424768,25)  |    16662 | be7add048713  
 (391150,31)   |    16662 | be7add048713  
 (254014,170)  |    16662 | be7add048713  
 (1758616,85)  |    16662 | be7add048713  
 (1720990,105) |    16662 | be7add048713  
 (345908,68)   |    16662 | be7add048713  
 (1592333,102) |    16662 | be7add048713  
 (1843902,130) |    16671 | be7add048713  
 (898136,121)  |    16671 | be7add048713  
 (1469985,138) |    16671 | be7add048713  
 (1287666,51)  |    16671 | be7add048713  
(32 rows)  
  
Time: 4970.662 ms  
  
digoal=> explain (analyze,verbose,timing,costs,buffers) select ctid,tableoid,info from t_regexp_100billion where info ~ 'e7add04871';  
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0) (actual time=4712.916..4897.512 rows=32 loops=1)  
   Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info  
   Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2  
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h  
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d  
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d  
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d  
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data  
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,   
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,  
 h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,   
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_  
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9  
   Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (info ~ 'e7add04871'::text)  
 Planning time: 0.063 ms  
 Execution time: 4897.532 ms  
(6 rows)  
  
Time: 4925.741 ms  

正则匹配查询速度

digoal=> select ctid,tableoid,info from t_regexp_100billion where info ~ '.3918.209f';  
     ctid      | tableoid |     info       
---------------+----------+--------------  
 (124741,29)   |    16677 | 0b39188209f2  
 (1443707,79)  |    16659 | 0b39188209f2  
 (596962,50)   |    16659 | 0b39188209f2  
 (1763787,145) |    16659 | 0b39188209f2  
 (2192691,24)  |    16659 | 0b39188209f2  
 (425121,26)   |    16659 | 0b39188209f2  
 (2157735,117) |    16659 | 0b39188209f2  
 (826685,32)   |    16659 | 0b39188209f2  
 (507417,51)   |    16659 | 0b39188209f2  
 (1168854,22)  |    16659 | 0b39188209f2  
 (178112,96)   |    16659 | 0b39188209f2  
 (1609343,84)  |    16659 | 0b39188209f2  
 (1883190,161) |    16659 | 0b39188209f2  
 (1879921,82)  |    16659 | 0b39188209f2  
 (187722,148)  |    16659 | 0b39188209f2  
 (411680,31)   |    16659 | 0b39188209f2  
 (1103474,74)  |    16659 | 0b39188209f2  
 (1756318,139) |    16659 | 0b39188209f2  
 (760475,112)  |    16659 | 0b39188209f2  
 (656098,47)   |    16659 | 0b39188209f2  
 (2015224,31)  |    16659 | 0b39188209f2  
 (401158,64)   |    16659 | 0b39188209f2  
 (1001315,155) |    16659 | 0b39188209f2  
 (527643,24)   |    16659 | 0b39188209f2  
 (51198,95)    |    16659 | 0b39188209f2  
 (1709591,26)  |    16659 | 0b39188209f2  
 (1235618,22)  |    16659 | 0b39188209f2  
 (542813,107)  |    16659 | 0b39188209f2  
 (331468,156)  |    16659 | 0b39188209f2  
 (940954,68)   |    16662 | 0b39188209f2  
 (1295686,169) |    16662 | 0b39188209f2  
 (825955,109)  |    16668 | 0b39188209f2  
 (2025210,165) |    16671 | 0b39188209f2  
 (1639115,139) |    16671 | 0b39188209f2  
 (422678,79)   |    16671 | 0b39188209f2  
 (379949,175)  |    16671 | 0b39188209f2  
 (455206,96)   |    16671 | 0b39188209f2  
 (1745081,184) |    16671 | 0b39188209f2  
(38 rows)  
Time: 3580.536 ms  
  
digoal=> explain (verbose,analyze,timing,costs,buffers) select ctid,tableoid,info from t_regexp_100billion where info ~ '.3918.209f';  
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0) (actual time=3407.156..3621.601 rows=38 loops=1)  
   Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info  
   Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2  
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h  
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d  
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d  
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d  
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data  
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,   
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,  
 h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,   
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_  
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9  
   Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (info ~ '.3918.209f'::text)  
 Planning time: 0.072 ms  
 Execution time: 3621.626 ms  
(6 rows)  
Time: 3650.045 ms  
  
digoal=> select ctid,tableoid,info from t_regexp_100billion where info ~ 'ab2..d[1|f]3c8';  
     ctid      | tableoid |     info       
---------------+----------+--------------  
 (899065,160)  |    16659 | 4eab207df3c8  
 (2100060,157) |    16659 | a2ab2fbdf3c8  
 (162213,12)   |    16659 | a2ab2fbdf3c8  
 (637030,50)   |    16659 | 4eab207df3c8  
 (1325830,35)  |    16659 | a2ab2fbdf3c8  
 (197454,129)  |    16659 | 4eab207df3c8  
 (2000258,158) |    16659 | a2ab2fbdf3c8  
 (765698,19)   |    16659 | a2ab2fbdf3c8  
 (935743,59)   |    16659 | 4eab207df3c8  
 (2203339,96)  |    16659 | a2ab2fbdf3c8  
 (701234,118)  |    16659 | a2ab2fbdf3c8  
 (971717,57)   |    16659 | a2ab2fbdf3c8  
 (1164498,54)  |    16659 | 4eab207df3c8  
 (393227,147)  |    16659 | 4eab207df3c8  
 (1439445,94)  |    16659 | a2ab2fbdf3c8  
 (1549135,146) |    16659 | 4eab207df3c8  
 (1551991,36)  |    16659 | 4eab207df3c8  
 (2206488,3)   |    16659 | a2ab2fbdf3c8  
 (481614,118)  |    16659 | 4eab207df3c8  
 (1809085,7)   |    16659 | a2ab2fbdf3c8  
 (173214,139)  |    16659 | 4eab207df3c8  
 (1021816,28)  |    16659 | 4eab207df3c8  
 (829846,43)   |    16659 | a2ab2fbdf3c8  
 (1899020,79)  |    16659 | 4eab207df3c8  
 (6241,163)    |    16659 | 4eab207df3c8  
 (1205920,5)   |    16659 | a2ab2fbdf3c8  
 (412014,52)   |    16659 | 4eab207df3c8  
 (1122051,14)  |    16659 | 4eab207df3c8  
 (284493,87)   |    16659 | 4eab207df3c8  
 (374322,83)   |    16659 | a2ab2fbdf3c8  
 (189124,19)   |    16659 | 4eab207df3c8  
 (747428,175)  |    16659 | a2ab2fbdf3c8  
 (795035,152)  |    16659 | a2ab2fbdf3c8  
 (1949396,25)  |    16659 | a2ab2fbdf3c8  
 (154445,167)  |    16659 | a2ab2fbdf3c8  
 (859513,82)   |    16659 | a2ab2fbdf3c8  
 (31337,41)    |    16659 | a2ab2fbdf3c8  
 (1393343,136) |    16659 | 4eab207df3c8  
 (63555,82)    |    16659 | a2ab2fbdf3c8  
 (608980,177)  |    16659 | 4eab207df3c8  
 (250484,31)   |    16659 | a2ab2fbdf3c8  
 (1696502,87)  |    16659 | 4eab207df3c8  
 (2021326,68)  |    16659 | a2ab2fbdf3c8  
 (397967,70)   |    16659 | a2ab2fbdf3c8  
 (2083071,101) |    16659 | a2ab2fbdf3c8  
 (98554,23)    |    16659 | a2ab2fbdf3c8  
 (1247891,182) |    16659 | 4eab207df3c8  
 (1533143,51)  |    16659 | a2ab2fbdf3c8  
 (1280652,28)  |    16659 | a2ab2fbdf3c8  
 (1337921,119) |    16659 | a2ab2fbdf3c8  
 (446914,180)  |    16659 | a2ab2fbdf3c8  
 (1810263,161) |    16659 | a2ab2fbdf3c8  
 (350272,51)   |    16659 | 4eab207df3c8  
 (909148,37)   |    16659 | 4eab207df3c8  
 (197153,108)  |    16659 | a2ab2fbdf3c8  
 (207423,96)   |    16659 | 4eab207df3c8  
 (1097934,22)  |    16659 | a2ab2fbdf3c8  
 (12605,49)    |    16659 | 4eab207df3c8  
 (65244,28)    |    16659 | 4eab207df3c8  
 (10274,30)    |    16659 | a2ab2fbdf3c8  
 (1547771,91)  |    16659 | 4eab207df3c8  
 (55044,64)    |    16659 | 4eab207df3c8  
 (1286116,136) |    16659 | 4eab207df3c8  
 (797831,10)   |    16659 | a2ab2fbdf3c8  
 (450949,98)   |    16659 | a2ab2fbdf3c8  
 (563308,46)   |    16659 | 4eab207df3c8  
 (1815443,179) |    16659 | a2ab2fbdf3c8  
 (279403,105)  |    16659 | 4eab207df3c8  
 (1953284,11)  |    16659 | 4eab207df3c8  
 (2068896,15)  |    16659 | 4eab207df3c8  
 (1230212,18)  |    16659 | 4eab207df3c8  
 (1513277,18)  |    16659 | 4eab207df3c8  
 (1675223,30)  |    16659 | a2ab2fbdf3c8  
 (966609,80)   |    16662 | a2ab2fbdf3c8  
 (118085,180)  |    16662 | a2ab2fbdf3c8  
 (1557051,116) |    16662 | a2ab2fbdf3c8  
 (1848877,62)  |    16662 | 4eab207df3c8  
 (2224775,3)   |    16662 | 4eab207df3c8  
 (1196571,72)  |    16662 | 4eab207df3c8  
 (1799448,154) |    16662 | 4eab207df3c8  
 (2246230,68)  |    16662 | a2ab2fbdf3c8  
 (984529,120)  |    16662 | a2ab2fbdf3c8  
 (1361482,97)  |    16662 | 4eab207df3c8  
 (1935512,51)  |    16662 | a2ab2fbdf3c8  
 (816119,95)   |    16662 | a2ab2fbdf3c8  
 (770381,45)   |    16662 | 4eab207df3c8  
 (1943960,146) |    16662 | a2ab2fbdf3c8  
 (346006,160)  |    16671 | a2ab2fbdf3c8  
 (1873262,96)  |    16671 | 4eab207df3c8  
 (1219041,118) |    16671 | a2ab2fbdf3c8  
 (418076,24)   |    16671 | a2ab2fbdf3c8  
 (724463,28)   |    16671 | a2ab2fbdf3c8  
 (1471492,164) |    16671 | a2ab2fbdf3c8  
 (975490,122)  |    16671 | a2ab2fbdf3c8  
 (1885629,34)  |    16671 | 4eab207df3c8  
(95 rows)  
Time: 4718.459 ms  
  
digoal=> explain (verbose,timing,costs,buffers,analyze) select ctid,tableoid,info from t_regexp_100billion where info ~ 'ab2..d[1|f]3c8';  
 Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0) (actual time=4386.010..4648.614 rows=95 loops=1)  
   Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info  
   Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2  
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h  
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d  
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d  
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d  
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data  
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,   
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,  
 h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,   
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_  
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9  
   Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (info ~ 'ab2..d[1|f]3c8'::text)  
 Planning time: 0.058 ms  
 Execution time: 4648.638 ms  
(6 rows)  
Time: 4676.919 ms  
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
Web App开发 关系型数据库 数据库
用PostgreSQL 做实时高效 搜索引擎 - 全文检索、模糊查询、正则查询、相似查询、ADHOC查询
用PostgreSQL 做实时高效 搜索引擎 - 全文检索、模糊查询、正则查询、相似查询、ADHOC查询作者digoal 日期2017-12-05 标签PostgreSQL , 搜索引擎 , GIN , ranking , high light , 全文检索 , 模糊查询 , 正则查询 , 相似查询 , ADHOC查询 背景字符串搜索是非常常见的业务需求,它包括: 1、前缀+模糊查询。
11626 1
|
7月前
|
SQL 关系型数据库 PostgreSQL
|
关系型数据库 数据库 PostgreSQL
PostgreSQL 百亿数据 秒级响应 正则及模糊查询
正则匹配和模糊匹配通常是搜索引擎的特长,但是如果你使用的是 PostgreSQL 数据库照样能实现,并且性能不赖,加上分布式方案 (譬如 plproxy, pg_shard, fdw shard, pg-xc, pg-xl, greenplum),处理百亿以上数据量的正则匹配和模糊匹配效果杠杠的,.
28982 0
|
关系型数据库 PostgreSQL
postgresql 正则暗坑
--注意pg的正则使用不正会有隐患 postgres=# select * from t; id | name ----+------ 1 | bill 1 | bill ...
875 0
|
关系型数据库 分布式数据库 PolarDB
《阿里云产品手册2022-2023 版》——PolarDB for PostgreSQL
《阿里云产品手册2022-2023 版》——PolarDB for PostgreSQL
374 0
|
存储 缓存 关系型数据库

相关产品

  • 云原生数据库 PolarDB