标签
PostgreSQL , 多维 , 图像 , cube , imgsmlr , 压缩 , 分段 , 异步并行 , 欧式距离
背景
向量相似搜索越来越流行,PostgreSQL imgsmlr,cube插件分别应用于向量类型的存储,以及向量距离排序输出搜索。
imgsmlr插件的signature类型为64字节的数据类型,存储16个FLOAT4的浮点数向量(实际上是图像特征值向量)。可用于向量搜索。
cube,存储的是float8浮点数向量,最多存储100个维度。可应用于更高维度的搜索。
下面是一些例子。
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 3 - citus 8机128shard (4亿图像)》
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 2 - 单机分区表 (dblink 异步调用并行) (4亿图像)》
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 1 - 单机单表 (4亿图像)》
《PostgreSQL 相似搜索插件介绍大汇总 (cube,rum,pg_trgm,smlar,imgsmlr,pg_similarity) (rum,gin,gist)》
《PostgreSQL 多维空间几何对象 相交、包含 高效率检索实践 - cube》
《PostgreSQL 相似人群圈选,人群扩选,向量相似 使用实践 - cube》
由于CUBE使用的是FLOAT8存储,所以空间消耗比IMGSMLR插件翻倍。另一方面,如果向量维度特别多,即使使用GIST索引,搜索效率也会下降(特别是数据非常凌乱时)。
如何优化?
1、你可以压缩向量,比如压缩到16个FLOAT4,那么就可以使用signature存储。(图像就是这么做的,实际上imgsmlr插件中还有一个pattern类型,有16KB大小,我们根据signature向量相似排序后,可以再根据pattern精确排序)。
2、你可以把多个向量,拆分成若干组(例如每16个一组),每个组里面取TOP N(top n可以任意指定,比如每个组取100条),然后再计算所有维度的欧式距离。从而得到精确排序。
优化例子
使用分组的方法。
1、创建插件
create extension imgsmlr;
create extension cube;
2、创建测试函数,生成若干个元素的向量。
create or replace function gen_rand_sig (
int, -- 向量值元素取值范围
int -- 向量个数
) returns text as $$
select string_agg((random()*$1)::int::text,',') from generate_series(1,$2);
$$ language sql strict;
查询结果举例
postgres=# select gen_rand_sig(10000,16);
gen_rand_sig
--------------------------------------------------------------------------------
7850,5374,2172,304,8818,2998,2770,6007,5231,7403,5174,3458,8077,6957,9327,5752
(1 row)
3、创建测试表,一个80维的向量,拆分成5组,每组16个FLOAT4。
create table test_sig (
id int primary key, -- 主键
c1 text, -- 第一组,16个float4,逗号分隔
c2 text,
c3 text,
c4 text,
c5 text -- 第五组
);
插入100万行数据
insert into test_sig select id,
gen_rand_sig(10000,16),
gen_rand_sig(10000,16),
gen_rand_sig(10000,16),
gen_rand_sig(10000,16),
gen_rand_sig(10000,16)
from generate_series(1,1000000) t(id); -- 100万记录
示例数据
postgres=# \x
Expanded display is on.
postgres=# select * from test_sig limit 2;
-[ RECORD 1 ]-----------------------------------------------------------------------
id | 1
c1 | 9239,1446,880,6788,7451,3268,8408,2867,6087,4559,2866,7627,5542,8344,4197,3393
c2 | 3717,6369,3697,2535,9367,6466,8542,4598,3869,3716,8056,1947,673,7384,7699,9912
c3 | 8830,8579,6699,6281,1847,5107,9147,7934,9666,2014,5561,5209,357,9758,8601,4075
c4 | 6127,2298,6610,5494,8764,5152,93,2634,8868,8149,4580,9542,5532,2279,9454,4362
c5 | 858,6153,643,2705,1260,9790,639,927,1804,6200,6135,2161,5958,4737,6236,2085
-[ RECORD 2 ]-----------------------------------------------------------------------
id | 2
c1 | 7035,2846,7579,5799,7998,7671,8433,6866,5820,3013,6408,1352,5292,5862,5714,6150
c2 | 2015,6357,8855,3275,6147,9494,4201,7951,5694,337,112,1651,5073,6347,3736,2108
c3 | 9193,1315,7907,7191,8986,6340,4057,4806,9353,465,6159,4646,6327,1873,796,8342
c4 | 8230,9651,1616,4377,9145,5818,2328,4839,6154,2440,6490,1228,8787,226,3336,7980
c5 | 1541,1243,5171,527,7583,9229,5333,6936,9694,1491,1581,6021,3364,2377,4362,1594
4、创建索引
创建索引(使用imgsmlr signature类型)
create index idx_test_sig_1 on test_sig using gist ((('('||c1||')')::signature));
create index idx_test_sig_2 on test_sig using gist ((('('||c2||')')::signature));
create index idx_test_sig_3 on test_sig using gist ((('('||c3||')')::signature));
create index idx_test_sig_4 on test_sig using gist ((('('||c4||')')::signature));
create index idx_test_sig_5 on test_sig using gist ((('('||c5||')')::signature));
创建索引(使用cube类型)(可选,除非你没有imgsmlr插件,否则建议使用imgsmlr,因为只需要64字节)
/*
create index idx_test_sig_1 on test_sig using gist ((('('||c1||')')::cube));
create index idx_test_sig_2 on test_sig using gist ((('('||c2||')')::cube));
create index idx_test_sig_3 on test_sig using gist ((('('||c3||')')::cube));
create index idx_test_sig_4 on test_sig using gist ((('('||c4||')')::cube));
create index idx_test_sig_5 on test_sig using gist ((('('||c5||')')::cube));
*/
创建80个维度的大索引(用于对比性能)
create index idx_test_sig_6 on test_sig using gist ((('('||c1||','||c2||','||c3||','||c4||','||c5||')')::cube));
5、生成5组测试向量,用于如下测试
postgres=# select gen_rand_sig(10000,16);
gen_rand_sig
-------------------------------------------------------------------------------
4010,8018,7566,9150,5278,7782,5188,7960,98,5960,9605,6981,6882,9398,2356,8905
(1 row)
postgres=# select gen_rand_sig(10000,16);
gen_rand_sig
------------------------------------------------------------------------------
4811,2484,345,2856,5851,4647,5846,246,1029,5395,5519,1372,966,2449,1414,4976
(1 row)
postgres=# select gen_rand_sig(10000,16);
gen_rand_sig
--------------------------------------------------------------------------------
467,8980,4127,5745,6762,9314,3705,6860,5274,3311,3842,2156,2708,6197,1061,7519
(1 row)
postgres=# select gen_rand_sig(10000,16);
gen_rand_sig
-------------------------------------------------------------------------------
8682,1406,375,4533,6053,6221,4778,7083,1616,297,8454,2581,2746,9869,7558,3213
(1 row)
postgres=# select gen_rand_sig(10000,16);
gen_rand_sig
--------------------------------------------------------------------------------
8849,1684,8958,5611,998,2663,2471,6273,5974,6312,8429,8682,2510,9490,6201,1191
(1 row)
6、查询语句如下,每一组返回欧式距离最小的100条,然后再计算80个维度,最后选出1条欧式距离最小的。
如果使用的是cube类型,写法如下
with
a as (select id from test_sig order by (('('||c1||')')::cube) <-> cube '(4010,8018,7566,9150,5278,7782,5188,7960,98,5960,9605,6981,6882,9398,2356,8905)' limit 100),
b as (select id from test_sig order by (('('||c2||')')::cube) <-> cube '(4811,2484,345,2856,5851,4647,5846,246,1029,5395,5519,1372,966,2449,1414,4976)' limit 100),
c as (select id from test_sig order by (('('||c3||')')::cube) <-> cube '(467,8980,4127,5745,6762,9314,3705,6860,5274,3311,3842,2156,2708,6197,1061,7519)' limit 100),
d as (select id from test_sig order by (('('||c4||')')::cube) <-> cube '(8682,1406,375,4533,6053,6221,4778,7083,1616,297,8454,2581,2746,9869,7558,3213)' limit 100),
e as (select id from test_sig order by (('('||c5||')')::cube) <-> cube '(8849,1684,8958,5611,998,2663,2471,6273,5974,6312,8429,8682,2510,9490,6201,1191)' limit 100)
select id, (('('||c1||','||c2||','||c3||','||c4||','||c5||')')::cube) sig from test_sig where id = any (array(
select id from a
union all
select id from b
union all
select id from c
union all
select id from d
union all
select id from e
)) order by (('('||c1||','||c2||','||c3||','||c4||','||c5||')')::cube) <->
cube '(4010,8018,7566,9150,5278,7782,5188,7960,98,5960,9605,6981,6882,9398,2356,8905,4811,2484,345,2856,5851,4647,5846,246,1029,5395,5519,1372,966,2449,1414,4976,467,8980,4127,5745,6762,9314,3705,6860,5274,3311,3842,2156,2708,6197,1061,7519,8682,1406,375,4533,6053,6221,4778,7083,1616,297,8454,2581,2746,9869,7558,3213,8849,1684,8958,5611,998,2663,2471,6273,5974,6312,8429,8682,2510,9490,6201,1191)'
limit 1;
如果使用的是imgsmlr signature类型,写法如下
with
a as (select id from test_sig order by (('('||c1||')')::signature) <-> signature '(4010,8018,7566,9150,5278,7782,5188,7960,98,5960,9605,6981,6882,9398,2356,8905)' limit 100),
b as (select id from test_sig order by (('('||c2||')')::signature) <-> signature '(4811,2484,345,2856,5851,4647,5846,246,1029,5395,5519,1372,966,2449,1414,4976)' limit 100),
c as (select id from test_sig order by (('('||c3||')')::signature) <-> signature '(467,8980,4127,5745,6762,9314,3705,6860,5274,3311,3842,2156,2708,6197,1061,7519)' limit 100),
d as (select id from test_sig order by (('('||c4||')')::signature) <-> signature '(8682,1406,375,4533,6053,6221,4778,7083,1616,297,8454,2581,2746,9869,7558,3213)' limit 100),
e as (select id from test_sig order by (('('||c5||')')::signature) <-> signature '(8849,1684,8958,5611,998,2663,2471,6273,5974,6312,8429,8682,2510,9490,6201,1191)' limit 100)
select id, (('('||c1||','||c2||','||c3||','||c4||','||c5||')')::cube) sig from test_sig where id = any (array(
select id from a
union all
select id from b
union all
select id from c
union all
select id from d
union all
select id from e
)) order by (('('||c1||','||c2||','||c3||','||c4||','||c5||')')::cube) <->
cube '(4010,8018,7566,9150,5278,7782,5188,7960,98,5960,9605,6981,6882,9398,2356,8905,4811,2484,345,2856,5851,4647,5846,246,1029,5395,5519,1372,966,2449,1414,4976,467,8980,4127,5745,6762,9314,3705,6860,5274,3311,3842,2156,2708,6197,1061,7519,8682,1406,375,4533,6053,6221,4778,7083,1616,297,8454,2581,2746,9869,7558,3213,8849,1684,8958,5611,998,2663,2471,6273,5974,6312,8429,8682,2510,9490,6201,1191)'
limit 1;
以上CTE语法,每一条SQL都走索引,但是串行执行。为了解决串行问题,使用dblink异步调用。
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=22.12..22.18 rows=1 width=44)
CTE a
-> Limit (cost=0.36..3.53 rows=100 width=8)
-> Index Scan using idx_test_sig_1 on test_sig test_sig_1 (cost=0.36..31722.96 rows=999994 width=8)
Order By: (((('('::text || c1) || ')'::text))::signature <-> '(4010.000000, 8018.000000, 7566.000000, 9150.000000, 5278.000000, 7782.000000, 5188.000000, 7960.000000, 98.000000, 5960.000000, 9605.000000, 6981.000000, 68
82.000000, 9398.000000, 2356.000000, 8905.000000)'::signature)
CTE b
-> Limit (cost=0.36..3.53 rows=100 width=8)
-> Index Scan using idx_test_sig_2 on test_sig test_sig_2 (cost=0.36..31754.59 rows=999994 width=8)
Order By: (((('('::text || c2) || ')'::text))::signature <-> '(4811.000000, 2484.000000, 345.000000, 2856.000000, 5851.000000, 4647.000000, 5846.000000, 246.000000, 1029.000000, 5395.000000, 5519.000000, 1372.000000, 96
6.000000, 2449.000000, 1414.000000, 4976.000000)'::signature)
CTE c
-> Limit (cost=0.36..3.52 rows=100 width=8)
-> Index Scan using idx_test_sig_3 on test_sig test_sig_3 (cost=0.36..31603.68 rows=999994 width=8)
Order By: (((('('::text || c3) || ')'::text))::signature <-> '(467.000000, 8980.000000, 4127.000000, 5745.000000, 6762.000000, 9314.000000, 3705.000000, 6860.000000, 5274.000000, 3311.000000, 3842.000000, 2156.000000, 2
708.000000, 6197.000000, 1061.000000, 7519.000000)'::signature)
CTE d
-> Limit (cost=0.36..3.53 rows=100 width=8)
-> Index Scan using idx_test_sig_4 on test_sig test_sig_4 (cost=0.36..31746.30 rows=999994 width=8)
Order By: (((('('::text || c4) || ')'::text))::signature <-> '(8682.000000, 1406.000000, 375.000000, 4533.000000, 6053.000000, 6221.000000, 4778.000000, 7083.000000, 1616.000000, 297.000000, 8454.000000, 2581.000000, 27
46.000000, 9869.000000, 7558.000000, 3213.000000)'::signature)
CTE e
-> Limit (cost=0.36..3.53 rows=100 width=8)
-> Index Scan using idx_test_sig_5 on test_sig test_sig_5 (cost=0.36..31757.63 rows=999994 width=8)
Order By: (((('('::text || c5) || ')'::text))::signature <-> '(8849.000000, 1684.000000, 8958.000000, 5611.000000, 998.000000, 2663.000000, 2471.000000, 6273.000000, 5974.000000, 6312.000000, 8429.000000, 8682.000000, 2
510.000000, 9490.000000, 6201.000000, 1191.000000)'::signature)
InitPlan 6 (returns $5)
-> Append (cost=0.00..0.24 rows=500 width=4)
-> CTE Scan on a (cost=0.00..0.04 rows=100 width=4)
-> CTE Scan on b (cost=0.00..0.04 rows=100 width=4)
-> CTE Scan on c (cost=0.00..0.04 rows=100 width=4)
-> CTE Scan on d (cost=0.00..0.04 rows=100 width=4)
-> CTE Scan on e (cost=0.00..0.04 rows=100 width=4)
-> Result (cost=4.25..4.82 rows=10 width=44)
-> Sort (cost=4.25..4.27 rows=10 width=402)
Sort Key: ((((((((((((('('::text || test_sig.c1) || ','::text) || test_sig.c2) || ','::text) || test_sig.c3) || ','::text) || test_sig.c4) || ','::text) || test_sig.c5) || ')'::text))::cube <-> '(4010, 8018, 7566, 9150, 5
278, 7782, 5188, 7960, 98, 5960, 9605, 6981, 6882, 9398, 2356, 8905, 4811, 2484, 345, 2856, 5851, 4647, 5846, 246, 1029, 5395, 5519, 1372, 966, 2449, 1414, 4976, 467, 8980, 4127, 5745, 6762, 9314, 3705, 6860, 5274, 3311, 3842, 2156, 270
8, 6197, 1061, 7519, 8682, 1406, 375, 4533, 6053, 6221, 4778, 7083, 1616, 297, 8454, 2581, 2746, 9869, 7558, 3213, 8849, 1684, 8958, 5611, 998, 2663, 2471, 6273, 5974, 6312, 8429, 8682, 2510, 9490, 6201, 1191)'::cube))
-> Index Scan using test_sig_pkey on test_sig (cost=0.37..4.21 rows=10 width=402)
Index Cond: (id = ANY ($5))
(33 rows)
Time: 1.549 ms
CTE 分5组查询性能
-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | 747179
sig | (2078, 3409, 9574, 9870, 3416, 1144, 494, 7580, 582, 4150, 4434, 5115, 4206, 8239, 2058, 9233, 6658, 1164, 1592, 4742, 5276, 2063, 3176, 2736, 2019, 9209, 5868, 2924, 1246, 3781, 8324, 3324, 7191, 7898, 3194, 606, 9042, 3688, 8187, 9625, 7838, 2621, 4740, 2044, 860, 6798, 1277, 7519, 7962, 2869, 2261, 3238, 4932, 5436, 5974, 6951, 4645, 1841, 9875, 5891, 5623, 8199, 9215, 2813, 6097, 2409, 3420, 5140, 6097, 1607, 4764, 3935, 4228, 9504, 5979, 5088, 6302, 7256, 2607, 4265)
Time: 1634.734 ms (00:01.635)
80维直接查询 - 索引性能
--
80维直接查询 - 全表扫描性能
set enable_indexscan=off;
set enable_bitmapscan=off;
select * from test_sig order by (('('||c1||','||c2||','||c3||','||c4||','||c5||')')::cube) <->
cube '(4010,8018,7566,9150,5278,7782,5188,7960,98,5960,9605,6981,6882,9398,2356,8905,4811,2484,345,2856,5851,4647,5846,246,1029,5395,5519,1372,966,2449,1414,4976,467,8980,4127,5745,6762,9314,3705,6860,5274,3311,3842,2156,2708,6197,1061,7519,8682,1406,375,4533,6053,6221,4778,7083,1616,297,8454,2581,2746,9869,7558,3213,8849,1684,8958,5611,998,2663,2471,6273,5974,6312,8429,8682,2510,9490,6201,1191)'
limit 1;
-[ RECORD 1 ]-----------------------------------------------------------------------
id | 747179
c1 | 2078,3409,9574,9870,3416,1144,494,7580,582,4150,4434,5115,4206,8239,2058,9233
c2 | 6658,1164,1592,4742,5276,2063,3176,2736,2019,9209,5868,2924,1246,3781,8324,3324
c3 | 7191,7898,3194,606,9042,3688,8187,9625,7838,2621,4740,2044,860,6798,1277,7519
c4 | 7962,2869,2261,3238,4932,5436,5974,6951,4645,1841,9875,5891,5623,8199,9215,2813
c5 | 6097,2409,3420,5140,6097,1607,4764,3935,4228,9504,5979,5088,6302,7256,2607,4265
Time: 14548.096 ms (00:14.548)
5组异步并行调用加速
1、创建dblink插件
create extension if not exists dblink;
2、创建一个建立连接函数,不报错
create or replace function conn(
name, -- dblink名字
text -- 连接串,URL
) returns void as $$
declare
begin
perform dblink_connect($1, $2);
return;
exception when others then
return;
end;
$$ language plpgsql strict;
3、创建并行查询函数
如果使用的是imgsmlr signature类型,那么函数如下:
create or replace function parallel_img_search(
vc1 text, -- 第1组向量
vc2 text, -- 第2组向量
vc3 text, -- 第3组向量
vc4 text, -- 第4组向量
vc5 text, -- 第5组向量
lmt1 int, -- 每一个维度返回TOP N条 欧式距离最小的记录
lmt2 int, -- 最终返回N条 所有维度上 欧式距离最小的记录
conn text default format('hostaddr=%s port=%s user=%s dbname=%s application_name=', '127.0.0.1', current_setting('port'), current_user, current_database()) -- dblink连接
)
returns setof record as
$$
declare
app_prefix text := 'abc';
sql text[];
vsql text;
i int :=1 ;
ids int[];
ids1 int[] := '{}'::int[];
ts1 timestamp;
begin
sql := array[
format(
$_$
select id from test_sig order by (('('||c1||')')::signature) <-> signature '(%s)' limit %s
$_$,vc1,lmt1),
format(
$_$
select id from test_sig order by (('('||c2||')')::signature) <-> signature '(%s)' limit %s
$_$,vc2,lmt1),
format(
$_$
select id from test_sig order by (('('||c3||')')::signature) <-> signature '(%s)' limit %s
$_$,vc3,lmt1),
format(
$_$
select id from test_sig order by (('('||c4||')')::signature) <-> signature '(%s)' limit %s
$_$,vc4,lmt1),
format(
$_$
select id from test_sig order by (('('||c5||')')::signature) <-> signature '(%s)' limit %s
$_$,vc5,lmt1)];
foreach vsql in array sql loop
perform conn(app_prefix||i, conn||app_prefix||i);
perform id from dblink_get_result(app_prefix||i, false) as t(id int);
perform dblink_send_query(app_prefix||i, vsql);
i := i+1;
end loop;
ts1 := clock_timestamp();
for i in 1..5 loop
select array_agg(id) into ids from dblink_get_result(app_prefix||i, false) as t(id int);
ids1 := ids1||ids;
end loop;
raise notice '%', clock_timestamp()-ts1;
return query execute
format($_$
select * from test_sig where id = any (%L) order by (('('||c1||','||c2||','||c3||','||c4||','||c5||')')::cube) <->
cube '(%s,%s,%s,%s,%s)'
limit %s $_$,ids1,vc1,vc2,vc3,vc4,vc5,lmt2);
return;
end;
$$ language plpgsql strict;
如果使用的是cube类型,那么函数如下:
create or replace function parallel_img_search(
vc1 text, -- 第1组向量
vc2 text, -- 第2组向量
vc3 text, -- 第3组向量
vc4 text, -- 第4组向量
vc5 text, -- 第5组向量
lmt1 int, -- 每一个维度返回TOP N条 欧式距离最小的记录
lmt2 int, -- 最终返回N条 所有维度上 欧式距离最小的记录
conn text default format('hostaddr=%s port=%s user=%s dbname=%s application_name=', '127.0.0.1', current_setting('port'), current_user, current_database()) -- dblink连接
)
returns setof record as
$$
declare
app_prefix text := 'abc';
sql text[];
vsql text;
i int :=1 ;
ids int[];
ids1 int[] := '{}'::int[];
ts1 timestamp;
begin
sql := array[
format(
$_$
select id from test_sig order by (('('||c1||')')::cube) <-> cube '(%s)' limit %s
$_$,vc1,lmt1),
format(
$_$
select id from test_sig order by (('('||c2||')')::cube) <-> cube '(%s)' limit %s
$_$,vc2,lmt1),
format(
$_$
select id from test_sig order by (('('||c3||')')::cube) <-> cube '(%s)' limit %s
$_$,vc3,lmt1),
format(
$_$
select id from test_sig order by (('('||c4||')')::cube) <-> cube '(%s)' limit %s
$_$,vc4,lmt1),
format(
$_$
select id from test_sig order by (('('||c5||')')::cube) <-> cube '(%s)' limit %s
$_$,vc5,lmt1)];
foreach vsql in array sql loop
perform conn(app_prefix||i, conn||app_prefix||i);
perform id from dblink_get_result(app_prefix||i, false) as t(id int);
perform dblink_send_query(app_prefix||i, vsql);
i := i+1;
end loop;
ts1 := clock_timestamp();
for i in 1..5 loop
select array_agg(id) into ids from dblink_get_result(app_prefix||i, false) as t(id int);
ids1 := ids1||ids;
end loop;
raise notice '%', clock_timestamp()-ts1;
return query execute
format($_$
select * from test_sig where id = any (%L) order by (('('||c1||','||c2||','||c3||','||c4||','||c5||')')::cube) <->
cube '(%s,%s,%s,%s,%s)'
limit %s $_$,ids1,vc1,vc2,vc3,vc4,vc5,lmt2);
return;
end;
$$ language plpgsql strict;
4、性能测试
select * from parallel_img_search(
'4010,8018,7566,9150,5278,7782,5188,7960,98,5960,9605,6981,6882,9398,2356,8905',
'4811,2484,345,2856,5851,4647,5846,246,1029,5395,5519,1372,966,2449,1414,4976',
'467,8980,4127,5745,6762,9314,3705,6860,5274,3311,3842,2156,2708,6197,1061,7519',
'8682,1406,375,4533,6053,6221,4778,7083,1616,297,8454,2581,2746,9869,7558,3213',
'8849,1684,8958,5611,998,2663,2471,6273,5974,6312,8429,8682,2510,9490,6201,1191',
100,
1
)
as t (id int, c1 text, c2 text, c3 text, c4 text, c5 text);
-[ RECORD 1 ]-----------------------------------------------------------------------
id | 747179
c1 | 2078,3409,9574,9870,3416,1144,494,7580,582,4150,4434,5115,4206,8239,2058,9233
c2 | 6658,1164,1592,4742,5276,2063,3176,2736,2019,9209,5868,2924,1246,3781,8324,3324
c3 | 7191,7898,3194,606,9042,3688,8187,9625,7838,2621,4740,2044,860,6798,1277,7519
c4 | 7962,2869,2261,3238,4932,5436,5974,6951,4645,1841,9875,5891,5623,8199,9215,2813
c5 | 6097,2409,3420,5140,6097,1607,4764,3935,4228,9504,5979,5088,6302,7256,2607,4265
Time: 356.775 ms
性能对比
查询方法 | 速度 |
---|---|
全表扫描 | 14.5 秒 |
80维索引加速查询 | - |
CTE 分组(每组16维)查询(索引加速) | 1.6 秒 |
异步调用 分组(每组16维)查询(索引加速) | 356 毫秒 |
其他优化方法
1、复合索引
如果查询除了有向量距离,还有其他查询条件,可以建立组合索引。
create extension btree_gist;
2、降维,类似图层
前面已提及,可以做压缩降维,在少量维度上计算得到TOP N条,然后再精细排序。
3、拆表,使用 dblink异步并行
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 2 - 单机分区表 (dblink 异步调用并行) (4亿图像)》
4、使用citus 插件 sharding
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 3 - citus 8机128shard (4亿图像)》
参考
https://github.com/postgrespro/imgsmlr
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 3 - citus 8机128shard (4亿图像)》
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 2 - 单机分区表 (dblink 异步调用并行) (4亿图像)》
《PostgreSQL 11 相似图像搜索插件 imgsmlr 性能测试与优化 1 - 单机单表 (4亿图像)》
《PostgreSQL 相似搜索插件介绍大汇总 (cube,rum,pg_trgm,smlar,imgsmlr,pg_similarity) (rum,gin,gist)》