标签
PostgreSQL , bloom filter , bloom filter index
背景
凡是支持HASH函数,以及相等operator的类型,都可以使用bloom filter index .
扩展方法见本文。
原文
https://obartunov.livejournal.com/201027.html
Bloom index by default works for int4 and text, but other types with hash function and equality operator could be supported.
Just use opclass interface, for example, for type bigint
create extension bloom;
postgres=# select * from pg_opclass where opcname='bigint_ops';
opcmethod | opcname | opcnamespace | opcowner | opcfamily | opcintype | opcdefault | opckeytype
-----------+---------+--------------+----------+-----------+-----------+------------+------------
(0 rows)
创建bigint的bloom filter index支持。
CREATE OPERATOR CLASS bigint_ops
DEFAULT FOR TYPE bigint USING bloom AS
OPERATOR 1 = (bigint, bigint),
FUNCTION 1 hashint8(bigint);
postgres=# select * from pg_opclass where opcname='bigint_ops';
opcmethod | opcname | opcnamespace | opcowner | opcfamily | opcintype | opcdefault | opckeytype
-----------+------------+--------------+----------+-----------+-----------+------------+------------
136065 | bigint_ops | 2200 | 10 | 136074 | 20 | t | 0
(1 row)
Now, you can build bloom index for bigint data type.
Data types, which could be supported by bloom index.
查询可以支持bloom filter的类型,HASH函数
凡是支持HASH函数,以及相等operator操作符的类型,都可以使用bloom filter index .
postgres=# select oid,* from pg_am;
oid | amname | amhandler | amtype
--------+--------+-------------+--------
403 | btree | bthandler | i
405 | hash | hashhandler | i
783 | gist | gisthandler | i
2742 | gin | ginhandler | i
4000 | spgist | spghandler | i
3580 | brin | brinhandler | i
18204 | rum | rumhandler | i
136050 | bloom | blhandler | i
(8 rows)
查询可以支持bloom filter的类型,HASH函数
SELECT oc.opcintype::regtype, p.amproc FROM pg_opclass oc
JOIN pg_amproc p ON p.amprocfamily = oc.opcfamily
WHERE oc.opcmethod = 405 -- hash am
AND oc.opcdefault -- 默认proc for this am
-- https://www.postgresql.org/docs/devel/static/xindex.html Strategies number
AND p.amprocnum = 1
AND p.amproclefttype = oc.opcintype
AND p.amprocrighttype = oc.opcintype;
opcintype | amproc
-----------------------------+----------------
character | hashbpchar
"char" | hashchar
date | hashint4
anyarray | hash_array
real | hashfloat4
double precision | hashfloat8
inet | hashinet
smallint | hashint2
integer | hashint4
bigint | hashint8
interval | interval_hash
macaddr | hashmacaddr
name | hashname
oid | hashoid
oidvector | hashoidvector
text | hashtext
time without time zone | time_hash
numeric | hash_numeric
timestamp with time zone | timestamp_hash
time with time zone | timetz_hash
timestamp without time zone | timestamp_hash
boolean | hashchar
bytea | hashvarlena
xid | hashint4
cid | hashint4
abstime | hashint4
reltime | hashint4
aclitem | hash_aclitem
uuid | uuid_hash
pg_lsn | pg_lsn_hash
macaddr8 | hashmacaddr8
anyenum | hashenum
anyrange | hash_range
jsonb | jsonb_hash
(34 rows)
创建索引,例子
postgres=# create table test(id int, c1 int8, c2 int8, c3 int8);
CREATE TABLE
postgres=# create index idx_test_1 on test using bloom (c1,c2,c3);
CREATE INDEX
postgres=# set enable_seqscan=off;
SET
postgres=# explain select * from test where c1=1::int8 and c2=1::int8 and c3=1::int8;
QUERY PLAN
----------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=15.73..15.75 rows=1 width=28)
Recheck Cond: ((c1 = '1'::bigint) AND (c2 = '1'::bigint) AND (c3 = '1'::bigint))
-> Bitmap Index Scan on idx_test_1 (cost=0.00..15.73 rows=1 width=0)
Index Cond: ((c1 = '1'::bigint) AND (c2 = '1'::bigint) AND (c3 = '1'::bigint))
(4 rows)
postgres=# set enable_seqscan =on;
SET
postgres=# explain select * from test where c1=1::int8 and c2=1::int8 and c3=1::int8;
QUERY PLAN
------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..9.85 rows=1 width=28)
Filter: ((c1 = '1'::bigint) AND (c2 = '1'::bigint) AND (c3 = '1'::bigint))
(2 rows)