PostgreSQL 拥有很强的文本搜索能力,除了支持全文检索,还支持模糊查询和正则查询。PostgreSQL 不仅内置了一般数据库都没有的 pg_trgm 插件,还内置了表达式索引和 GIN 索引的功能,为加速各类需求的查询提供了有力条件。
对于正则查询,PostgreSQL 可以通过 pg_trgm 插件来加速查询。模糊查询包括前模糊(有前缀的模糊)、后模糊(有后缀的模糊)和前后模糊(无前后缀的模糊),针对不同的模糊查询需求,PostgreSQL 有不同的优化方法:
注意:索引和查询条件的 collate 必须一致才能使用索引。
test=# drop table test; DROP TABLE test=# create table test(id int, info text); CREATE TABLE test=# insert into test select generate_series(1,1000000),md5(random()::text); INSERT 0 1000000 test=# create index idx on test(info text_pattern_ops); CREATE INDEX test=# explain (analyze,verbose,timing,costs,buffers) select * from test where info like 'abcd%' collate "zh_CN"; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Index Scan using idx on public.test (cost=0.42..16.76 rows=100 width=37) (actual time=0.038..0.059 rows=12 loops=1) Output: id, info Index Cond: ((test.info ~>=~ 'abcd'::text) AND (test.info ~<~ 'abce'::text)) Filter: (test.info ~~ 'abcd%'::text COLLATE "zh_CN") Buffers: shared hit=12 read=3 Planning time: 0.253 ms Execution time: 0.081 ms (7 rows) test=# explain (analyze,verbose,timing,costs,buffers) select * from test where info like 'abcd%' collate "C"; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Index Scan using idx on public.test (cost=0.42..16.76 rows=100 width=37) (actual time=0.027..0.050 rows=12 loops=1) Output: id, info Index Cond: ((test.info ~>=~ 'abcd'::text) AND (test.info ~<~ 'abce'::text)) Filter: (test.info ~~ 'abcd%'::text COLLATE "C") Buffers: shared hit=15 Planning time: 0.141 ms Execution time: 0.072 ms (7 rows)另外,使用类型对应的 pattern ops 时,索引搜索不仅支持 LIKE 的写法,还支持规则表达式的写法,如下所示: test=# explain (analyze,verbose,timing,costs,buffers) select * from test where info ~ '^abcd'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Index Scan using idx on public.test (cost=0.42..16.76 rows=100 width=37) (actual time=0.031..0.061 rows=12 loops=1) Output: id, info Index Cond: ((test.info ~>=~ 'abcd'::text) AND (test.info ~<~ 'abce'::text)) Filter: (test.info ~ '^abcd'::text) Buffers: shared hit=15 Planning time: 0.213 ms Execution time: 0.083 ms (7 rows)
注意:索引和查询条件的 collate 必须一致才能使用索引。
test=# create index idx1 on test(reverse(info) collate "C"); CREATE INDEX test=# select * from test limit 1; id | info ----+---------------------------------- 1 | b3275976cdd437a033d4329775a52514 (1 row) test=# explain (analyze,verbose,timing,costs,buffers) select * from test where reverse(info) like '4152%' collate "C"; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Index Scan using idx1 on public.test (cost=0.42..4009.43 rows=5000 width=37) (actual time=0.061..0.097 rows=18 loops=1) Output: id, info Index Cond: ((reverse(test.info) >= '4152'::text) AND (reverse(test.info) < '4153'::text)) Filter: (reverse(test.info) ~~ '4152%'::text COLLATE "C") Buffers: shared hit=18 read=3 Planning time: 0.128 ms Execution time: 0.122 ms (7 rows) test=# select * from test where reverse(info) like '4152%' collate "C"; id | info --------+---------------------------------- 847904 | abe2ecd90393b5275df8e34a39702514 414702 | 97f66d26545329321164042657d02514 191232 | 7820972c6220c2b01d46c11ebb532514 752742 | 93232ac39c6632e2540df44627c42514 217302 | 39e518893a1a7b1e691619bd1fc42514 1 | b3275976cdd437a033d4329775a52514 615718 | 4948f94c484c13dc6c4fae8a3db52514 308815 | fc2918ceff7c7a4dafd2e04031062514 149521 | 546d963842ea5ca593e622c810262514 811093 | 4b6eca2eb6b665af67b2813e91a62514 209000 | 1dfd0d4e326715c1739f031cca992514 937616 | 8827fd81f5b673fb5afecbe3e11b2514 419553 | bd6e01ce360af16137e8b6abc8ab2514 998324 | 7dff51c19dc5e5d9979163e7d14c2514 771518 | 8a54e30003a48539fff0aedc73ac2514 691566 | f90368348e3b6bf983fcbe10db2d2514 652274 | 8bf4a97b5f122a5540a21fa85ead2514 233437 | 739ed715fc203d47e37e79b5bcbe2514 (18 rows)test=# create index idx1 on test(reverse(info) text_pattern_ops); CREATE INDEX test=# explain (analyze,verbose,timing,costs,buffers) select * from test where reverse(info) like '4152%'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Index Scan using idx1 on public.test (cost=0.42..4009.43 rows=5000 width=37) (actual time=0.026..0.049 rows=12 loops=1) Output: id, info Index Cond: ((reverse(test.info) ~>=~ '4152'::text) AND (reverse(test.info) ~<~ '4153'::text)) Filter: (reverse(test.info) ~~ '4152%'::text) Buffers: shared hit=15 Planning time: 0.102 ms Execution time: 0.072 ms (7 rows) test=# explain (analyze,verbose,timing,costs,buffers) select * from test where reverse(info) ~ '^4152'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Index Scan using idx1 on public.test (cost=0.42..4009.43 rows=5000 width=37) (actual time=0.031..0.063 rows=12 loops=1) Output: id, info Index Cond: ((reverse(test.info) ~>=~ '4152'::text) AND (reverse(test.info) ~<~ '4153'::text)) Filter: (reverse(test.info) ~ '^4152'::text) Buffers: shared hit=15 Planning time: 0.148 ms Execution time: 0.087 ms (7 rows)
注意:索引和查询条件的 collate 必须一致才能使用索引。
版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。