PostgreSQL 空间类型统计信息(Statistics, Cardinality, Selectivity, Estimate)不准确导致SQL执行计划不准(包含、相交查询)的优化实践

本文涉及的产品
云原生数据库 PolarDB 分布式版,标准版 2核8GB
云数据库 RDS MySQL,集群系列 2核4GB
推荐场景:
搭建个人博客
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
简介:

标签

PostgreSQL , 空间类型 , 统计信息 , 包含查询 , 相交查询 , BOUND , index , x , y


背景

PostgreSQL 通过几何类型以及PostGIS插件,支持丰富的空间类型,空间数据类型包括KNN检索,空间包含,空间相交,方位搜索,空间计算等操作。

为了得到正确的执行计划,在数据库优化器中,评估满足某个操作符条件的行数,是一个重要的CBO依据。

而评估选择性又依赖统计信息,同时依赖每种操作符的选择性算法。

一些优化器相关文档:

《PostgreSQL 自定义函数表达式选择性评估算法 - Statistics, Cardinality, Selectivity, Estimate》

《PostgreSQL 多值列的选择性 - Statistics, Cardinality, Selectivity, Estimate》

《PostgreSQL 11 preview - 表达式索引柱状图buckets\STATISTICS\default_statistics_target可设置》

《PostgreSQL 11 preview - 优化器 增强 汇总》

《PostgreSQL 11 preview - 强制auto prepared statment开关(自动化plan cache)(类似Oracle cursor_sharing force)》

《[未完待续] PostgreSQL PRO 特性 - AQO(机器学习执行计划优化器)》

《PostgreSQL pg_stat_reset清除track_counts的隐患》

《PostgreSQL 统计信息pg_statistic格式及导入导出dump_stat - 兼容Oracle》

《优化器里的概率学 - 性能抖动原理分析》

《懒人推动社会进步 - 多列聚合, gin与数据分布(选择性)》

《数据库优化器原理(含动态规划、机器学习建模优化器aqo) - 如何治疗选择综合症》

《PostgreSQL 10.0 preview 功能增强 - SQL执行剩余时间 - 垃圾回收过程可视pg_stat_progress_vacuum》

《PostgreSQL 10.0 preview 性能增强 - hash,nestloop join优化(聪明的优化器是这样的)》

《PostgreSQL 10.0 preview 功能增强 - 动态视图pg_stat_activity新增数据库管理进程信息》

《PostgreSQL 10.0 preview 功能增强 - 流复制统计视图pg_stat_replication增强, 新增时间维度延迟评估》

《PostgreSQL 10.0 preview 优化器改进 - 不完整索引支持复合排序》

《聊一下PostgreSQL优化器 - in里面有重复值时PostgreSQL如何处理?》

《官人要杯咖啡吗? - PostgreSQL实时监测PLAN tree的执行进度 - pg_query_state》

《PostgreSQL pg_stat_ pg_statio_ 统计信息(scan,read,fetch,hit)源码解读》

《PostgreSQL 9.6 并行计算 优化器算法浅析 - 以及如何强制并行度》

《PostgreSQL 9.6 并行计算 优化器算法浅析》

《关键时刻HINT出彩 - PG优化器的参数优化、执行计划固化CASE》

《Greenplum ORCA 优化器的编译安装与使用》

《PostgreSQL prepared statement和simple query的profile及性能差异》

《PostgreSQL 优化器逻辑推理能力 源码解析》

《PostgreSQL 操作符与优化器详解 - 包含(选择性、JOIN方法、等效)等内容》

《PostgreSQL pg_stat_replication sent_location, write_location, flush_location, replay_location的差别》

《为什么PostgreSQL启动后有个UDP监听localhost - pgstat统计信息进程间通信》

《pg_stat_statements fwrite not save》

《PostgreSQL 9.5 new feature - use FSM fast evaluate bloat and other table level statics(pgstattuple_approx)》

《PostgreSQL aggregate function 2 : Aggregate Functions for Statistics》

《优化器成本因子校对(disk,ssd,memory IO开销精算) - PostgreSQL real seq_page_cost & random_page_cost in disks,ssd,memory》

《优化器成本因子校对 - PostgreSQL explain cost constants alignment to timestamp》

《PostgreSQL pg_stats used to estimate top N freps values and explain rows》

《执行计划选择算法 与 绑定变量 - PostgreSQL prepared statement: SPI_prepare, prepare|execute COMMAND, PL/pgsql STYLE: custom & generic plan cache》

《执行计划维度统计\判断执行计划翻转\统计每种执行计划的统计信息 use pg_stat_plans in PostgreSQL 9.0, 9.1 and 9.2》

《PostgreSQL 9.2 improve prepared statements plan's selectivity》

《PostgreSQL 9.2 add array elements statistics》

《WHY prepared Statement running slower in some situation CASE》

《PostgreSQL 优化器行评估算法》

那么,空间类型有合理的统计信息吗?

“目前空间数据类型,还没有柱状图,高频词等内容”,所以给操作符提供的选择性依据非常有限。

因此目前存在一个这样的问题,选择性函数内使用百分比硬编码,在代码中写死了选择性比例,那么空间包含查询(不管范围多大,评估出来的记录数是一样的),所以基本上都会选择空间索引扫描( 然而实际上当真实空间包含查询包含了大量记录时,全表扫描性能更好 )。

例子

1、建表(pos存储point数据),写入1000万记录数据,创建空间索引

create extension postgis;  
create table test (id int , info text, pos geometry);  
insert into test select generate_series(1,10000000), 'test', st_setsrid(st_makepoint(110+random()*20-10, 70+random()*10-5), 4326);  
create index idx_test_1 on test using gist(pos);  

2、收集统计信息

vacuum analyze test;  

你会发现空间类型的统计信息非常有限(没有高频词,柱状图,相关性等内容):

postgres=# select * from pg_stats where tablename='test' and attname='pos';  
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram   
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+----------------------  
 public     | test      | pos     | f         |         0 |        32 |         -1 |                  |                   |                  |             |                   |                        |   
(1 row)  

3、查看执行计划,查看评估出来的记录数。

空间类型选择性使用如下硬编码0.001。

src/backend/utils/adt/geo_selfuncs.c

/*
 *      contsel -- How likely is a box to contain (be contained by) a given box?
 *
 * This is a tighter constraint than "overlap", so produce a smaller
 * estimate than areasel does.
 */

Datum
contsel(PG_FUNCTION_ARGS)
{
        PG_RETURN_FLOAT8(0.001);
}

不管输入什么空间条件,评估得到的记录数都是千分之一:10000条

条件1:
postgres=# explain select * from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  
                                                  QUERY PLAN                                                                                                                    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=41)  
   Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
   Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0)  
         Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
(5 rows)  
  
条件2:
postgres=# explain select * from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                                 QUERY PLAN                                                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=41)  
   Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
   Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0)  
         Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
(5 rows)  

4、那么实际上的记录数是多少呢?

条件1:0条。

postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  
                               QUERY PLAN    
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12843.26..12843.27 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1)  
   ->  Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=0) (actual time=0.012..0.012 rows=0 loops=1)  
         Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
         Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
         ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0) (actual time=0.010..0.010 rows=0 loops=1)  
               Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~   
pos)  
 Planning time: 0.170 ms  
 Execution time: 0.061 ms  
(8 rows)  

条件2:5001107条。

postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                QUERY PLAN    
----------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=12843.26..12843.27 rows=1 width=8) (actual time=5057.822..5057.822 rows=1 loops=1)  
   ->  Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=0) (actual time=507.744..4537.564 rows=5001107 loops=1)  
         Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
         Rows Removed by Index Recheck: 1981986  
         Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry,  
 pos)  
         Heap Blocks: exact=50167 lossy=33167  
         ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0) (actual time=499.627..499.627 rows=5001107 loops=1)  
               Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~   
pos)  
 Planning time: 0.179 ms  
 Execution time: 5057.882 ms  
(10 rows)  

5、空间类型的选择性不准确,导致的问题,不管什么情况都走空间索引。

5.1 大范围搜索,使用索引反而更慢,如下。

postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                        QUERY PLAN  
-------------------------------------------------------------------------------------------------------  
 Finalize Aggregate  (cost=10539.13..10539.14 rows=1 width=8) (actual time=1079.698..1079.699 rows=1 loops=1)  
   ->  Gather  (cost=10539.10..10539.11 rows=8 width=8) (actual time=1079.655..1079.693 rows=9 loops=1)  
         Workers Planned: 8  
         Workers Launched: 8  
         ->  Partial Aggregate  (cost=10539.10..10539.11 rows=1 width=8) (actual time=1049.984..1049.984 rows=1 loops=9)  
               ->  Parallel Bitmap Heap Scan on test  (cost=149.95..10538.06 rows=417 width=0) (actual time=483.733..991.774 rows=555679 loops=9)  
                     Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geo  metry ~ pos)  
                     Rows Removed by Index Recheck: 220221  
                     Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
                     Heap Blocks: exact=5908 lossy=3936  
                     ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0) (actual time=502.790..502.790 rows=5001107 loops=1)  
                           Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
 Planning time: 0.171 ms  
 Execution time: 1087.890 ms  
(14 rows)  
  
  
postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                  QUERY PLAN           
-------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=411456.73..411456.74 rows=1 width=8) (actual time=674.853..674.853 rows=1 loops=1)  
   ->  Gather  (cost=411456.70..411456.71 rows=8 width=8) (actual time=674.793..674.848 rows=9 loops=1)  
         Workers Planned: 8  
         Workers Launched: 8  
         ->  Partial Aggregate  (cost=411456.70..411456.71 rows=1 width=8) (actual time=644.627..644.627 rows=1 loops=9)  
               ->  Parallel Seq Scan on test  (cost=0.00..411455.65 rows=417 width=0) (actual time=0.045..586.545 rows=555679 loops=9)  
                     Filter: (('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry   ~ pos) AND _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  )  
                     Rows Removed by Filter: 555433  
 Planning time: 0.130 ms  
 Execution time: 683.011 ms  
(10 rows)  

优化方法

首先我们知道空间包含,实际上是通过类似r-tree来从大到小进行收敛的,索引结构如下:

《PostgreSQL 黑科技 - 空间聚集存储, 内窥GIN, GiST, SP-GiST索引》

《通过空间思想理解GiST索引的构造》

因此,对于“平面或3D空间点”类型,我们可以利用x,y,z来收集BOUND BOX的边界统计信息,使用表达式索引即可得到。

下面的方法适合POINT geometry类型。

1、对point类型,创建x,y表达式索引

create index idx_test_2 on test (st_x(pos));  
create index idx_test_3 on test (st_y(pos));  

2、收集统计信息

vacuum analyze test;  

3、现在,表达式索引的统计信息有了(边界,柱状图,高频词等).

postgres=# select * from pg_stats where tablename='idx_test_2' ;  
 schemaname | tablename  | attname | inherited | null_frac | avg_width | n_distinct |  most_common_vals  | most_common_freqs |                                                                                                                 
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                        histogram_bounds                                                                                                                                                                                                       
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                               | correlation | most_common_elems | most_common_elem_freqs | e  
lem_count_histogram   
------------+------------+---------+-----------+-----------+-----------+------------+--------------------+-------------------+---------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+------------------------+--  
--------------------  
 public     | idx_test_2 | st_x    | f         |         0 |         8 |  -0.978292 | {100.210125753656} | {6.66667e-05}     | {100.001683048904,100.198292508721,100.394482128322,100.614011939615,100.833150474355,101.039264379069,101.233  
114106581,101.457025809214,101.663421457633,101.865276163444,102.060301089659,102.248072810471,102.453578868881,102.656150059775,102.860928429291,103.057968365029,103.260449869558,103.457426037639,103.632223745808,103.85178135708,104.049  
411769956,104.26340050064,104.474113518372,104.658453594893,104.85337683931,105.054327072576,105.277718435973,105.484521845356,105.703673372045,105.907790735364,106.115896645933,106.314396839589,106.519666947424,106.724081514403,106.9380  
68913296,107.144253859296,107.326071513817,107.518919138238,107.721077715978,107.916374253109,108.123016441241,108.360300129279,108.551258808002,108.714984534308,108.903268156573,109.091228945181,109.277053307742,109.48098176159,109.6832  
09387586,109.853478074074,110.044501451775,110.241627292708,110.45505293645,110.652585113421,110.851763477549,111.036027139053,111.231312695891,111.423934968188,111.620921371505,111.847442938015,112.031054906547,112.234003236517,112.4332  
3944509,112.63793184422,112.834744984284,113.041293732822,113.23037719354,113.432592023164,113.619993180037,113.809020379558,114.014157289639,114.208164261654,114.409972019494,114.612494371831,114.827286116779,115.010247323662,115.208249  
371499,115.397510435432,115.607831152156,115.814645215869,116.007154844701,116.198594048619,116.39090036042,116.600214680657,116.819634130225,117.008039858192,117.199446037412,117.389995325357,117.60263632983,117.81475706026,118.02150100  
4696,118.219726895913,118.425922412425,118.623819025233,118.801201758906,118.997462140396,119.18357164599,119.394188923761,119.607428628951,119.789487998933,119.999485854059} |   0.0060246 |                   |                        |   
(1 row)  
  
postgres=# select * from pg_stats where tablename='idx_test_3' ;  
 schemaname | tablename  | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |                                                                                                                   
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                   histogram_bounds                                                                                                                                                                                                            
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                        | correlation | most_common_elems | most_common_elem_freqs | elem_cou  
nt_histogram   
------------+------------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-----------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+------------------------+---------  
-------------  
 public     | idx_test_3 | st_y    | f         |         0 |         8 |         -1 |                  |                   | {65.0000545242801,65.0990079157054,65.2072919439524,65.301208673045,65.3929488640279,65.4877291806042,65.5846159  
020439,65.6880339607596,65.7856895681471,65.8818093780428,65.9888388821855,66.0896951518953,66.1984446132556,66.304198782891,66.3969474285841,66.496685273014,66.6056828852743,66.7012783605605,66.7941406555474,66.8967378046364,66.98895718  
90235,67.0855156239122,67.191197341308,67.2994636883959,67.3969272850081,67.4947946099564,67.5993846775964,67.7099830284715,67.8120812214911,67.9088703868911,67.9981751134619,68.099527056329,68.1973828841001,68.2960116351023,68.398535256  
274,68.4888928988948,68.5882083000615,68.6907122470438,68.800939405337,68.9010692993179,69.0081828692928,69.11032628268,69.19522870332,69.2957087606192,69.3950105085969,69.5086302934214,69.6040614042431,69.7045981185511,69.8040427314118,  
69.906362099573,70.0155263161287,70.1146272942424,70.2022811910138,70.2994426619262,70.4040612280369,70.5112449126318,70.6194959674031,70.722281485796,70.822925157845,70.918908463791,71.0138623649254,71.1134587181732,71.2168908445165,71.  
3034919975325,71.4052765490487,71.5016098646447,71.6014923620969,71.6971560986713,71.8042277451605,71.8970008520409,71.9877171749249,72.0928315026686,72.1963384188712,72.2967635607347,72.3947894759476,72.4977402808145,72.5967703945935,72  
.6985206454992,72.7883456554264,72.885032473132,72.9875696543604,73.0857636081055,73.1845043040812,73.280454329215,73.3783990284428,73.4819683339447,73.5874625109136,73.6941803013906,73.7924668611959,73.8872256595641,73.9888873603195,74.  
0806347271428,74.1819418873638,74.2877806117758,74.3848745618016,74.4918680656701,74.5988712925464,74.7014277381822,74.8038458405063,74.9038722459227,74.9999207118526} | -0.00307583 |                   |                        |   
(1 row)  

现在我们改一下SQL,把boundbox放进SQL

select count(*) from test   
where   
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  

改成:

select count(*) from test   
where   
st_x(pos)   
  between  
    st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_y(pos)   
  between  
    st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  

执行计划如下:

在大范围输入时,通过表达式索引条件评估得到的行数比空间编码大于空间评估(硬编码 千分之一)得到的记录数,所以依旧使用了空间索引。

postgres=# explain select count(*) from test   
where   
st_x(pos)   
  between  
    st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_y(pos)   
  between  
    st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                       QUERY PLAN                                                                                                                                        
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12380.60..12380.61 rows=1 width=8)  
   ->  Index Scan using idx_test_1 on test  (cost=0.42..12376.42 rows=1674 width=0)  
         Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
         Filter: ((st_x(pos) >= '110'::double precision) AND (st_x(pos) <= '120'::double precision) AND (st_y(pos) >= '1'::double precision) AND (st_y(pos) <= '100'::double precision) AND _st_contains('0103000020E610000001000000050000000  000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos))  
(4 rows)  

在小范围输入时,评估得到的行数比空间编码更少(千分之一),所以使用了非空间索引。

explain select count(*) from test   
where   
st_x(pos)   
  between  
    st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
  and  
    st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
and  
st_y(pos)   
  between  
    st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
  and  
    st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
and  
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  
  
   QUERY PLAN                                
----------------------------------------------------------------------------------------------
 Aggregate  (cost=2.92..2.93 rows=1 width=8)  
   ->  Index Scan using idx_test_3 on test  (cost=0.43..2.92 rows=1 width=0)  
         Index Cond: ((st_y(pos) >= '1'::double precision) AND (st_y(pos) <= '2'::double precision))  
         Filter: (('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos) AND   
(st_x(pos) >= '110'::double precision) AND (st_x(pos) <= '112'::double precision) AND _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000  
000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos))  
(4 rows)  

一些函数

查看geometry的bound box

postgres=# select st_astext(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) ));  
                st_astext                   
------------------------------------------  
 POLYGON((110 1,110 2,112 2,112 1,110 1))  
(1 row)  
postgres=# select st_astext(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
                                                                       st_astext                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------  
 POLYGON((220186.995121892 150406,220186.995121892 150506.126829327,220288.248780547 150506.126829327,220288.248780547 150406,220186.995121892 150406))  
(1 row)  

2、查看BOUND BOX的边界点x,y.

postgres=# select st_xmin(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
     st_xmin        
------------------  
 220186.995121892  
(1 row)  
  
postgres=# select st_xmax(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
     st_xmax        
------------------  
 220288.248780547  
(1 row)  
  
postgres=# select st_ymin(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
 st_ymin   
---------  
  150406  
(1 row)  
  
postgres=# select st_ymax(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
     st_ymax        
------------------  
 150506.126829327  
(1 row)  

小结

目前空间类型的bound box没有统计信息,所以空间查询时,选择性使用了硬编码写死,固定了选择性百分比,乘以pg_class.reltuples即可得到评估记录数,但是不准确。

使得PostgreSQL的空间搜索(包含搜索),不管范围多大,都使用索引,而实际上大范围可能使用全表扫描性能更好。

本文增加boundbox的条件,使用xyz边界条件,使得行数评估更加的准确,在适当的时候选择适合的索引。

当然,最好的方法,还是PG内核层面优化空间类型的统计信息,以及选择性代码的支持。

参考

建议,PostgreSQL内核层面改进空间类型的统计信息,点类型,分别统计X,Y轴的统计信息。

《PostgreSQL 自定义函数表达式选择性评估算法 - Statistics, Cardinality, Selectivity, Estimate》

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
4月前
|
XML SQL 数据格式
XML动态sql查询当前时间之前的信息报错
XML动态sql查询当前时间之前的信息报错
58 2
|
28天前
|
SQL Java
使用java在未知表字段情况下通过sql查询信息
使用java在未知表字段情况下通过sql查询信息
36 8
|
1月前
|
SQL 监控 关系型数据库
SQL语句当前及历史信息查询-performance schema的使用
本文介绍了如何使用MySQL的Performance Schema来获取SQL语句的当前和历史执行信息。Performance Schema默认在MySQL 8.0中启用,可以通过查询相关表来获取详细的SQL执行信息,包括当前执行的SQL、历史执行记录和统计汇总信息,从而快速定位和解决性能瓶颈。
|
3月前
|
SQL Java
使用java在未知表字段情况下通过sql查询信息
使用java在未知表字段情况下通过sql查询信息
40 1
|
2月前
|
SQL 存储 关系型数据库
mysql 数据库空间统计sql
mysql 数据库空间统计sql
50 0
|
3月前
|
SQL 关系型数据库 C语言
PostgreSQL SQL扩展 ---- C语言函数(三)
可以用C(或者与C兼容,比如C++)语言编写用户自定义函数(User-defined functions)。这些函数被编译到动态可加载目标文件(也称为共享库)中并被守护进程加载到服务中。“C语言函数”与“内部函数”的区别就在于动态加载这个特性,二者的实际编码约定本质上是相同的(因此,标准的内部函数库为用户自定义C语言函数提供了丰富的示例代码)
|
4月前
|
SQL 存储 关系型数据库
PostgreSQL核心之SQL基础学习
PostgreSQL核心之SQL基础学习
61 3
|
4月前
|
SQL 流计算
Flink SQL 在快手实践问题之由于meta信息变化导致的state向前兼容问题如何解决
Flink SQL 在快手实践问题之由于meta信息变化导致的state向前兼容问题如何解决
53 1
|
4月前
|
SQL 关系型数据库 数据库
数据库空间之谜:彻底解决RDS for SQL Server的空间难题
【8月更文挑战第16天】在管理阿里云RDS for SQL Server时,合理排查与解决空间问题是确保数据库性能稳定的关键。常见问题包括数据文件增长、日志文件膨胀及索引碎片累积。利用SQL Server的动态管理视图(DMV)可有效监测文件使用情况、日志空间及索引碎片化程度。例如,使用`sp_spaceused`检查文件使用量,`sys.dm_db_log_space_usage`监控日志空间,`sys.dm_db_index_physical_stats`识别索引碎片。同时,合理的备份策略和文件组设置也有助于优化空间使用,确保数据库高效运行。
116 2
|
4月前
|
SQL 安全 关系型数据库
PostgreSQL SQL注入漏洞(CVE-2018-10915)--处理
【8月更文挑战第8天】漏洞描述:PostgreSQL是一款自由的对象关系型数据库管理系统,支持多种SQL标准及特性。存在SQL注入漏洞,源于应用未有效验证外部输入的SQL语句,允许攻击者执行非法命令。受影响版本包括10.5及更早版本等。解决方法为升级PostgreSQL
311 2

相关产品

  • 云原生数据库 PolarDB
  • 云数据库 RDS PostgreSQL 版