Postgis long lat geometry distance search tuning using gist knn function

简介:
空间数据中对临近点的检索使用非常常见, 例如以经纬度为坐标点, 检索离这个点1公里范围内的其他点的信息.
最近有网友问到这样的问题.
本文将以Postgis为例, 举一个简单的例子, 利用gist 索引加速检索.
测试表 :
create table cust_jw
(
 dz varchar(300),
 jwd geometry
);

测试数据, 取自经纬度信息网站.
insert into cust_jw values ('杭州', ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163));
insert into cust_jw values ('北京', ST_Transform(ST_GeomFromText('POINT(116.46 39.92)', 4326), 2163));
insert into cust_jw values ('南京', ST_Transform(ST_GeomFromText('POINT(118.78 32.04)', 4326), 2163));
insert into cust_jw values ('南宁', ST_Transform(ST_GeomFromText('POINT(108.33 22.84)', 4326), 2163));
insert into cust_jw values ('贵阳', ST_Transform(ST_GeomFromText('POINT(106.71 26.57)', 4326), 2163));
insert into cust_jw values ('南昌', ST_Transform(ST_GeomFromText('POINT(115.89 28.68)', 4326), 2163));
insert into cust_jw values ('余杭', ST_Transform(ST_GeomFromText('POINT(120.3 30.43)', 4326), 2163));

创建gist索引 : 
create index idx_cust_jw_1 on cust_jw using gist(jwd);

这个索引方法支持包含<->两个几何类型的距离排序和&&两个几何类型相交.
详见pg_amop , pg_am, pg_operator, pg_opfamily等系统表.

以下SQL查出北京到杭州的直线距离, 单位米 :
SELECT ST_Distance(
ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163),
ST_Transform(ST_GeomFromText('POINT(116.46 39.92)', 4326), 2163)
);
   st_distance    
------------------
 1256521.71432098
(1 row)

以下SQL 查出表中距离 ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)这个点20公里的坐标.
函数使用方法参考postgis手册.
digoal=# select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) from cust_jw where jwd && ST_Buffer(ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163), 20000, 10);
  dz  |                        jwd                         |   st_distance    
------+----------------------------------------------------+------------------
 杭州 | 0101000020730800004C94087D5D4F54C173AA7759E8FB5D41 |                0
 余杭 | 0101000020730800000E6E5A20494854C121FC688DA9EF5D41 | 14483.9823187612
(2 rows)
Time: 1.335 ms

前面已经说了, 这个索引访问方法支持&&操作符, <->操作符.
digoal=# explain select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) from cust_jw where jwd && ST_Buffer(ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163), 20000, 10);                                                                                                         
                          QUERY PLAN                                                                                                
                                                                                                                                    
                                                                                                                                    
                                                                                                                                    
                                                                                                                                    
                                                               
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------
 Index Scan using idx_cust_jw_1 on cust_jw  (cost=0.14..3.41 rows=1 width=548)
   Index Cond: (jwd && '01030000207308000001000000290000004C94087DD53B54C173AA7759E8FB5D411122F50B133C54C154295A2DDAF85D41D751B134CA
3C54C1F4F2B643DFF55D41B6BBAE74F63D54C10FB6A0650AF35D41CDDC4767903F54C1D331586C6DF05D4124855AF48D4154C14B9BC9D018EE5D41AC1BE98FE24354
C1F4F2B6431BEC5D41E89F31897F4654C1DDD11D5181EA5D41CDDC4767544954C1FE67201155E95D412D13EB504F4C54C1383864E89DE85D414C94087D5D4F54C173
AA775960E85D416B1526A96B5254C1383864E89DE85D41CB4BC992665554C1FE67201155E95D41B088DF703B5854C1DDD11D5181EA5D41EC0C286AD85A54C1F4F2B6
431BEC5D4174A3B6052D5D54C14B9BC9D018EE5D41CB4BC9922A5F54C1D331586C6DF05D41E26C6285C46054C10FB6A0650AF35D41C1D65FC5F06154C1F4F2B643DF
F55D4187061CEEA76254C154295A2DDAF85D414C94087DE56254C173AA7759E8FB5D4187061CEEA76254C1922B9585F6FE5D41C1D65FC5F06154C1F261386FF1015E
41E26C6285C46054C1D79E4E4DC6045E41CB4BC9922A5F54C11323974663075E4174A3B6052D5D54C19BB925E2B7095E41EC0C286AD85A54C1F261386FB50B5E41B0
88DF703B5854C10983D1614F0D5E41CB4BC992665554C1E8ECCEA17B0E5E416B1526A96B5254C1AE1C8BCA320F5E414C94087D5D4F54C173AA7759700F5E412D13EB
504F4C54C1AE1C8BCA320F5E41CDDC4767544954C1E8ECCEA17B0E5E41E89F31897F4654C10983D1614F0D5E41AC1BE98FE24354C1F261386FB50B5E4124855AF48D
4154C19BB925E2B7095E41CDDC4767903F54C11323974663075E41B6BBAE74F63D54C1D79E4E4DC6045E41D751B134CA3C54C1F261386FF1015E411122F50B133C54
C1922B9585F6FE5D414C94087DD53B54C173AA7759E8FB5D41'::geometry)
(2 rows)
Time: 1.218 ms


以下SQL 按距离排序.
digoal=# select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) from cust_jw order by jwd <-> ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163);
  dz  |                        jwd                         |   st_distance    
------+----------------------------------------------------+------------------
 杭州 | 0101000020730800004C94087D5D4F54C173AA7759E8FB5D41 |                0
 余杭 | 0101000020730800000E6E5A20494854C121FC688DA9EF5D41 | 14483.9823187612
 南京 | 0101000020730800000FFE5AD1D62653C16F4F972A10635E41 | 321491.591341196
 南昌 | 010100002073080000B2744BA1FE5253C10975D1494AA25F41 | 503843.306221247
 北京 | 0101000020730800006EBBB0F1AB0E4FC17207C71D44525E41 | 1256521.71432098
 南宁 | 01010000207308000030806B3882F451C18E3F38DCBB686141 |  1409624.7420143
 贵阳 | 01010000207308000082EA89026EE14FC1D6A3AD6E9E786141 | 1732521.31784296
(7 rows)
Time: 0.598 ms

通过以下方法强制排序走索引 :
digoal=# set enable_seqscan=off;
SET
Time: 0.109 ms
digoal=# explain select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) from cust_jw order by jwd <-> ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163);
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Index Scan using idx_cust_jw_1 on cust_jw  (cost=0.14..54.44 rows=140 width=548)
   Order By: (jwd <-> '0101000020730800004C94087D5D4F54C173AA7759E8FB5D41'::geometry)
(2 rows)


以下为进一步的优化, 如果点比较密集的话, 这种方法比较好. 
digoal=# select * from (select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) AS dist from cust_jw order by jwd <-> ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163) limit 1000) t where dist<15000;
  dz  |                        jwd                         |       dist       
------+----------------------------------------------------+------------------
 杭州 | 0101000020730800004C94087D5D4F54C173AA7759E8FB5D41 |                0
 余杭 | 0101000020730800000E6E5A20494854C121FC688DA9EF5D41 | 14483.9823187612
(2 rows)
Time: 0.634 ms


进一步优化, 使用游标, 可以将数据扫描降到极限. (前提是for循环中的sql order by使用了索引)
digoal=# do language plpgsql $$
declare
  v_rec record;
  v_limit int := 1000;
begin
  set enable_seqscan=off;  -- 强制索引, 因为扫描行数够就退出.
  for v_rec in select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) AS dist from cust_jw order by jwd <-> ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163) loop
    if v_limit <=0 then 
      raise notice '已经取足数据';
      return;
    end if;
    if v_rec.dist > 20000 then 
      raise notice '满足条件的点已输出完毕';
      return;
    else
      raise notice 'do someting, v_rec:%', v_rec;
    end if;
    v_limit := v_limit -1;
  end loop;
end;
$$;
NOTICE:  do someting, v_rec:(杭州,0101000020730800004C94087D5D4F54C173AA7759E8FB5D41,0)
NOTICE:  do someting, v_rec:(余杭,0101000020730800000E6E5A20494854C121FC688DA9EF5D41,14483.9823187612)
NOTICE:  满足条件的点已输出完毕
DO

使用这种方法最多扫描比需求结果多1行.

[参考]
相关文章
|
4月前
|
机器学习/深度学习 人工智能
【CatBoost报错解决】CatBoostError: Bad value for num feature[non default doc idx=0,feature idx=19]=
【CatBoost报错解决】CatBoostError: Bad value for num feature[non default doc idx=0,feature idx=19]=
|
Java 定位技术
Google Earth Engine(GEE)——Error: Projection: The CRS of a map projection could not be parsed.(坐标转换错)
Google Earth Engine(GEE)——Error: Projection: The CRS of a map projection could not be parsed.(坐标转换错)
622 0
Google Earth Engine(GEE)——Error: Projection: The CRS of a map projection could not be parsed.(坐标转换错)
|
编解码 ice
Google Earth Engine ——MYD13Q1.006 Aqua Vegetation Indices 16-Day Global 250m分辨率(NDVI/EVI)数据集
Google Earth Engine ——MYD13Q1.006 Aqua Vegetation Indices 16-Day Global 250m分辨率(NDVI/EVI)数据集
178 0
Google Earth Engine ——MYD13Q1.006 Aqua Vegetation Indices 16-Day Global 250m分辨率(NDVI/EVI)数据集
|
编解码 ice
Google Earth Engine ——MYD13A1/A2.006 Aqua Vegetation Indices 8/16-Day Global 500m(NDVI和EVI)指数数据集
Google Earth Engine ——MYD13A1/A2.006 Aqua Vegetation Indices 8/16-Day Global 500m(NDVI和EVI)指数数据集
112 0
Google Earth Engine ——MYD13A1/A2.006 Aqua Vegetation Indices 8/16-Day Global 500m(NDVI和EVI)指数数据集
|
大数据 atlas
Google Earth Engine——Oxford MAP TCW: Malaria Atlas Project Gap-Filled Tasseled Cap Wetness数据集
Google Earth Engine——Oxford MAP TCW: Malaria Atlas Project Gap-Filled Tasseled Cap Wetness数据集
126 0
Google Earth Engine——Oxford MAP TCW: Malaria Atlas Project Gap-Filled Tasseled Cap Wetness数据集
halcon模板匹配实践(2)算子find_shape_model里的参数Row, Column, Angle含义是什么?
halcon模板匹配实践(2)算子find_shape_model里的参数Row, Column, Angle含义是什么?
617 0
halcon模板匹配实践(2)算子find_shape_model里的参数Row, Column, Angle含义是什么?
Halcon之select_shape_xld是支持多个参数过滤的
Halcon之select_shape_xld是支持多个参数过滤的
581 0