PostgreSQL 空间类型统计信息(Statistics, Cardinality, Selectivity, Estimate)不准确导致SQL执行计划不准(包含、相交查询)的优化实践

本文涉及的产品
云原生数据库 PolarDB 分布式版,标准版 2核8GB
RDS SQL Server Serverless,2-4RCU 50GB 3个月
推荐场景:
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
简介:

标签

PostgreSQL , 空间类型 , 统计信息 , 包含查询 , 相交查询 , BOUND , index , x , y


背景

PostgreSQL 通过几何类型以及PostGIS插件,支持丰富的空间类型,空间数据类型包括KNN检索,空间包含,空间相交,方位搜索,空间计算等操作。

为了得到正确的执行计划,在数据库优化器中,评估满足某个操作符条件的行数,是一个重要的CBO依据。

而评估选择性又依赖统计信息,同时依赖每种操作符的选择性算法。

一些优化器相关文档:

《PostgreSQL 自定义函数表达式选择性评估算法 - Statistics, Cardinality, Selectivity, Estimate》

《PostgreSQL 多值列的选择性 - Statistics, Cardinality, Selectivity, Estimate》

《PostgreSQL 11 preview - 表达式索引柱状图buckets\STATISTICS\default_statistics_target可设置》

《PostgreSQL 11 preview - 优化器 增强 汇总》

《PostgreSQL 11 preview - 强制auto prepared statment开关(自动化plan cache)(类似Oracle cursor_sharing force)》

《[未完待续] PostgreSQL PRO 特性 - AQO(机器学习执行计划优化器)》

《PostgreSQL pg_stat_reset清除track_counts的隐患》

《PostgreSQL 统计信息pg_statistic格式及导入导出dump_stat - 兼容Oracle》

《优化器里的概率学 - 性能抖动原理分析》

《懒人推动社会进步 - 多列聚合, gin与数据分布(选择性)》

《数据库优化器原理(含动态规划、机器学习建模优化器aqo) - 如何治疗选择综合症》

《PostgreSQL 10.0 preview 功能增强 - SQL执行剩余时间 - 垃圾回收过程可视pg_stat_progress_vacuum》

《PostgreSQL 10.0 preview 性能增强 - hash,nestloop join优化(聪明的优化器是这样的)》

《PostgreSQL 10.0 preview 功能增强 - 动态视图pg_stat_activity新增数据库管理进程信息》

《PostgreSQL 10.0 preview 功能增强 - 流复制统计视图pg_stat_replication增强, 新增时间维度延迟评估》

《PostgreSQL 10.0 preview 优化器改进 - 不完整索引支持复合排序》

《聊一下PostgreSQL优化器 - in里面有重复值时PostgreSQL如何处理?》

《官人要杯咖啡吗? - PostgreSQL实时监测PLAN tree的执行进度 - pg_query_state》

《PostgreSQL pg_stat_ pg_statio_ 统计信息(scan,read,fetch,hit)源码解读》

《PostgreSQL 9.6 并行计算 优化器算法浅析 - 以及如何强制并行度》

《PostgreSQL 9.6 并行计算 优化器算法浅析》

《关键时刻HINT出彩 - PG优化器的参数优化、执行计划固化CASE》

《Greenplum ORCA 优化器的编译安装与使用》

《PostgreSQL prepared statement和simple query的profile及性能差异》

《PostgreSQL 优化器逻辑推理能力 源码解析》

《PostgreSQL 操作符与优化器详解 - 包含(选择性、JOIN方法、等效)等内容》

《PostgreSQL pg_stat_replication sent_location, write_location, flush_location, replay_location的差别》

《为什么PostgreSQL启动后有个UDP监听localhost - pgstat统计信息进程间通信》

《pg_stat_statements fwrite not save》

《PostgreSQL 9.5 new feature - use FSM fast evaluate bloat and other table level statics(pgstattuple_approx)》

《PostgreSQL aggregate function 2 : Aggregate Functions for Statistics》

《优化器成本因子校对(disk,ssd,memory IO开销精算) - PostgreSQL real seq_page_cost & random_page_cost in disks,ssd,memory》

《优化器成本因子校对 - PostgreSQL explain cost constants alignment to timestamp》

《PostgreSQL pg_stats used to estimate top N freps values and explain rows》

《执行计划选择算法 与 绑定变量 - PostgreSQL prepared statement: SPI_prepare, prepare|execute COMMAND, PL/pgsql STYLE: custom & generic plan cache》

《执行计划维度统计\判断执行计划翻转\统计每种执行计划的统计信息 use pg_stat_plans in PostgreSQL 9.0, 9.1 and 9.2》

《PostgreSQL 9.2 improve prepared statements plan's selectivity》

《PostgreSQL 9.2 add array elements statistics》

《WHY prepared Statement running slower in some situation CASE》

《PostgreSQL 优化器行评估算法》

那么,空间类型有合理的统计信息吗?

“目前空间数据类型,还没有柱状图,高频词等内容”,所以给操作符提供的选择性依据非常有限。

因此目前存在一个这样的问题,选择性函数内使用百分比硬编码,在代码中写死了选择性比例,那么空间包含查询(不管范围多大,评估出来的记录数是一样的),所以基本上都会选择空间索引扫描( 然而实际上当真实空间包含查询包含了大量记录时,全表扫描性能更好 )。

例子

1、建表(pos存储point数据),写入1000万记录数据,创建空间索引

create extension postgis;  
create table test (id int , info text, pos geometry);  
insert into test select generate_series(1,10000000), 'test', st_setsrid(st_makepoint(110+random()*20-10, 70+random()*10-5), 4326);  
create index idx_test_1 on test using gist(pos);  

2、收集统计信息

vacuum analyze test;  

你会发现空间类型的统计信息非常有限(没有高频词,柱状图,相关性等内容):

postgres=# select * from pg_stats where tablename='test' and attname='pos';  
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram   
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+----------------------  
 public     | test      | pos     | f         |         0 |        32 |         -1 |                  |                   |                  |             |                   |                        |   
(1 row)  

3、查看执行计划,查看评估出来的记录数。

空间类型选择性使用如下硬编码0.001。

src/backend/utils/adt/geo_selfuncs.c

/*
 *      contsel -- How likely is a box to contain (be contained by) a given box?
 *
 * This is a tighter constraint than "overlap", so produce a smaller
 * estimate than areasel does.
 */

Datum
contsel(PG_FUNCTION_ARGS)
{
        PG_RETURN_FLOAT8(0.001);
}

不管输入什么空间条件,评估得到的记录数都是千分之一:10000条

条件1:
postgres=# explain select * from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  
                                                  QUERY PLAN                                                                                                                    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=41)  
   Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
   Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0)  
         Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
(5 rows)  
  
条件2:
postgres=# explain select * from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                                 QUERY PLAN                                                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=41)  
   Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
   Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0)  
         Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
(5 rows)  

4、那么实际上的记录数是多少呢?

条件1:0条。

postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  
                               QUERY PLAN    
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12843.26..12843.27 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1)  
   ->  Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=0) (actual time=0.012..0.012 rows=0 loops=1)  
         Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
         Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
         ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0) (actual time=0.010..0.010 rows=0 loops=1)  
               Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~   
pos)  
 Planning time: 0.170 ms  
 Execution time: 0.061 ms  
(8 rows)  

条件2:5001107条。

postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                QUERY PLAN    
----------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=12843.26..12843.27 rows=1 width=8) (actual time=5057.822..5057.822 rows=1 loops=1)  
   ->  Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=0) (actual time=507.744..4537.564 rows=5001107 loops=1)  
         Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
         Rows Removed by Index Recheck: 1981986  
         Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry,  
 pos)  
         Heap Blocks: exact=50167 lossy=33167  
         ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0) (actual time=499.627..499.627 rows=5001107 loops=1)  
               Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~   
pos)  
 Planning time: 0.179 ms  
 Execution time: 5057.882 ms  
(10 rows)  

5、空间类型的选择性不准确,导致的问题,不管什么情况都走空间索引。

5.1 大范围搜索,使用索引反而更慢,如下。

postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                        QUERY PLAN  
-------------------------------------------------------------------------------------------------------  
 Finalize Aggregate  (cost=10539.13..10539.14 rows=1 width=8) (actual time=1079.698..1079.699 rows=1 loops=1)  
   ->  Gather  (cost=10539.10..10539.11 rows=8 width=8) (actual time=1079.655..1079.693 rows=9 loops=1)  
         Workers Planned: 8  
         Workers Launched: 8  
         ->  Partial Aggregate  (cost=10539.10..10539.11 rows=1 width=8) (actual time=1049.984..1049.984 rows=1 loops=9)  
               ->  Parallel Bitmap Heap Scan on test  (cost=149.95..10538.06 rows=417 width=0) (actual time=483.733..991.774 rows=555679 loops=9)  
                     Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geo  metry ~ pos)  
                     Rows Removed by Index Recheck: 220221  
                     Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
                     Heap Blocks: exact=5908 lossy=3936  
                     ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0) (actual time=502.790..502.790 rows=5001107 loops=1)  
                           Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
 Planning time: 0.171 ms  
 Execution time: 1087.890 ms  
(14 rows)  
  
  
postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                  QUERY PLAN           
-------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=411456.73..411456.74 rows=1 width=8) (actual time=674.853..674.853 rows=1 loops=1)  
   ->  Gather  (cost=411456.70..411456.71 rows=8 width=8) (actual time=674.793..674.848 rows=9 loops=1)  
         Workers Planned: 8  
         Workers Launched: 8  
         ->  Partial Aggregate  (cost=411456.70..411456.71 rows=1 width=8) (actual time=644.627..644.627 rows=1 loops=9)  
               ->  Parallel Seq Scan on test  (cost=0.00..411455.65 rows=417 width=0) (actual time=0.045..586.545 rows=555679 loops=9)  
                     Filter: (('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry   ~ pos) AND _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  )  
                     Rows Removed by Filter: 555433  
 Planning time: 0.130 ms  
 Execution time: 683.011 ms  
(10 rows)  

优化方法

首先我们知道空间包含,实际上是通过类似r-tree来从大到小进行收敛的,索引结构如下:

《PostgreSQL 黑科技 - 空间聚集存储, 内窥GIN, GiST, SP-GiST索引》

《通过空间思想理解GiST索引的构造》

因此,对于“平面或3D空间点”类型,我们可以利用x,y,z来收集BOUND BOX的边界统计信息,使用表达式索引即可得到。

下面的方法适合POINT geometry类型。

1、对point类型,创建x,y表达式索引

create index idx_test_2 on test (st_x(pos));  
create index idx_test_3 on test (st_y(pos));  

2、收集统计信息

vacuum analyze test;  

3、现在,表达式索引的统计信息有了(边界,柱状图,高频词等).

postgres=# select * from pg_stats where tablename='idx_test_2' ;  
 schemaname | tablename  | attname | inherited | null_frac | avg_width | n_distinct |  most_common_vals  | most_common_freqs |                                                                                                                 
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                        histogram_bounds                                                                                                                                                                                                       
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                               | correlation | most_common_elems | most_common_elem_freqs | e  
lem_count_histogram   
------------+------------+---------+-----------+-----------+-----------+------------+--------------------+-------------------+---------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+------------------------+--  
--------------------  
 public     | idx_test_2 | st_x    | f         |         0 |         8 |  -0.978292 | {100.210125753656} | {6.66667e-05}     | {100.001683048904,100.198292508721,100.394482128322,100.614011939615,100.833150474355,101.039264379069,101.233  
114106581,101.457025809214,101.663421457633,101.865276163444,102.060301089659,102.248072810471,102.453578868881,102.656150059775,102.860928429291,103.057968365029,103.260449869558,103.457426037639,103.632223745808,103.85178135708,104.049  
411769956,104.26340050064,104.474113518372,104.658453594893,104.85337683931,105.054327072576,105.277718435973,105.484521845356,105.703673372045,105.907790735364,106.115896645933,106.314396839589,106.519666947424,106.724081514403,106.9380  
68913296,107.144253859296,107.326071513817,107.518919138238,107.721077715978,107.916374253109,108.123016441241,108.360300129279,108.551258808002,108.714984534308,108.903268156573,109.091228945181,109.277053307742,109.48098176159,109.6832  
09387586,109.853478074074,110.044501451775,110.241627292708,110.45505293645,110.652585113421,110.851763477549,111.036027139053,111.231312695891,111.423934968188,111.620921371505,111.847442938015,112.031054906547,112.234003236517,112.4332  
3944509,112.63793184422,112.834744984284,113.041293732822,113.23037719354,113.432592023164,113.619993180037,113.809020379558,114.014157289639,114.208164261654,114.409972019494,114.612494371831,114.827286116779,115.010247323662,115.208249  
371499,115.397510435432,115.607831152156,115.814645215869,116.007154844701,116.198594048619,116.39090036042,116.600214680657,116.819634130225,117.008039858192,117.199446037412,117.389995325357,117.60263632983,117.81475706026,118.02150100  
4696,118.219726895913,118.425922412425,118.623819025233,118.801201758906,118.997462140396,119.18357164599,119.394188923761,119.607428628951,119.789487998933,119.999485854059} |   0.0060246 |                   |                        |   
(1 row)  
  
postgres=# select * from pg_stats where tablename='idx_test_3' ;  
 schemaname | tablename  | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |                                                                                                                   
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                   histogram_bounds                                                                                                                                                                                                            
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                        | correlation | most_common_elems | most_common_elem_freqs | elem_cou  
nt_histogram   
------------+------------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-----------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+------------------------+---------  
-------------  
 public     | idx_test_3 | st_y    | f         |         0 |         8 |         -1 |                  |                   | {65.0000545242801,65.0990079157054,65.2072919439524,65.301208673045,65.3929488640279,65.4877291806042,65.5846159  
020439,65.6880339607596,65.7856895681471,65.8818093780428,65.9888388821855,66.0896951518953,66.1984446132556,66.304198782891,66.3969474285841,66.496685273014,66.6056828852743,66.7012783605605,66.7941406555474,66.8967378046364,66.98895718  
90235,67.0855156239122,67.191197341308,67.2994636883959,67.3969272850081,67.4947946099564,67.5993846775964,67.7099830284715,67.8120812214911,67.9088703868911,67.9981751134619,68.099527056329,68.1973828841001,68.2960116351023,68.398535256  
274,68.4888928988948,68.5882083000615,68.6907122470438,68.800939405337,68.9010692993179,69.0081828692928,69.11032628268,69.19522870332,69.2957087606192,69.3950105085969,69.5086302934214,69.6040614042431,69.7045981185511,69.8040427314118,  
69.906362099573,70.0155263161287,70.1146272942424,70.2022811910138,70.2994426619262,70.4040612280369,70.5112449126318,70.6194959674031,70.722281485796,70.822925157845,70.918908463791,71.0138623649254,71.1134587181732,71.2168908445165,71.  
3034919975325,71.4052765490487,71.5016098646447,71.6014923620969,71.6971560986713,71.8042277451605,71.8970008520409,71.9877171749249,72.0928315026686,72.1963384188712,72.2967635607347,72.3947894759476,72.4977402808145,72.5967703945935,72  
.6985206454992,72.7883456554264,72.885032473132,72.9875696543604,73.0857636081055,73.1845043040812,73.280454329215,73.3783990284428,73.4819683339447,73.5874625109136,73.6941803013906,73.7924668611959,73.8872256595641,73.9888873603195,74.  
0806347271428,74.1819418873638,74.2877806117758,74.3848745618016,74.4918680656701,74.5988712925464,74.7014277381822,74.8038458405063,74.9038722459227,74.9999207118526} | -0.00307583 |                   |                        |   
(1 row)  

现在我们改一下SQL,把boundbox放进SQL

select count(*) from test   
where   
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  

改成:

select count(*) from test   
where   
st_x(pos)   
  between  
    st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_y(pos)   
  between  
    st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  

执行计划如下:

在大范围输入时,通过表达式索引条件评估得到的行数比空间编码大于空间评估(硬编码 千分之一)得到的记录数,所以依旧使用了空间索引。

postgres=# explain select count(*) from test   
where   
st_x(pos)   
  between  
    st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_y(pos)   
  between  
    st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                       QUERY PLAN                                                                                                                                        
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12380.60..12380.61 rows=1 width=8)  
   ->  Index Scan using idx_test_1 on test  (cost=0.42..12376.42 rows=1674 width=0)  
         Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
         Filter: ((st_x(pos) >= '110'::double precision) AND (st_x(pos) <= '120'::double precision) AND (st_y(pos) >= '1'::double precision) AND (st_y(pos) <= '100'::double precision) AND _st_contains('0103000020E610000001000000050000000  000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos))  
(4 rows)  

在小范围输入时,评估得到的行数比空间编码更少(千分之一),所以使用了非空间索引。

explain select count(*) from test   
where   
st_x(pos)   
  between  
    st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
  and  
    st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
and  
st_y(pos)   
  between  
    st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
  and  
    st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
and  
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  
  
   QUERY PLAN                                
----------------------------------------------------------------------------------------------
 Aggregate  (cost=2.92..2.93 rows=1 width=8)  
   ->  Index Scan using idx_test_3 on test  (cost=0.43..2.92 rows=1 width=0)  
         Index Cond: ((st_y(pos) >= '1'::double precision) AND (st_y(pos) <= '2'::double precision))  
         Filter: (('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos) AND   
(st_x(pos) >= '110'::double precision) AND (st_x(pos) <= '112'::double precision) AND _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000  
000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos))  
(4 rows)  

一些函数

查看geometry的bound box

postgres=# select st_astext(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) ));  
                st_astext                   
------------------------------------------  
 POLYGON((110 1,110 2,112 2,112 1,110 1))  
(1 row)  
postgres=# select st_astext(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
                                                                       st_astext                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------  
 POLYGON((220186.995121892 150406,220186.995121892 150506.126829327,220288.248780547 150506.126829327,220288.248780547 150406,220186.995121892 150406))  
(1 row)  

2、查看BOUND BOX的边界点x,y.

postgres=# select st_xmin(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
     st_xmin        
------------------  
 220186.995121892  
(1 row)  
  
postgres=# select st_xmax(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
     st_xmax        
------------------  
 220288.248780547  
(1 row)  
  
postgres=# select st_ymin(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
 st_ymin   
---------  
  150406  
(1 row)  
  
postgres=# select st_ymax(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
     st_ymax        
------------------  
 150506.126829327  
(1 row)  

小结

目前空间类型的bound box没有统计信息,所以空间查询时,选择性使用了硬编码写死,固定了选择性百分比,乘以pg_class.reltuples即可得到评估记录数,但是不准确。

使得PostgreSQL的空间搜索(包含搜索),不管范围多大,都使用索引,而实际上大范围可能使用全表扫描性能更好。

本文增加boundbox的条件,使用xyz边界条件,使得行数评估更加的准确,在适当的时候选择适合的索引。

当然,最好的方法,还是PG内核层面优化空间类型的统计信息,以及选择性代码的支持。

参考

建议,PostgreSQL内核层面改进空间类型的统计信息,点类型,分别统计X,Y轴的统计信息。

《PostgreSQL 自定义函数表达式选择性评估算法 - Statistics, Cardinality, Selectivity, Estimate》

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
2月前
|
SQL 关系型数据库 数据库
实时计算 Flink版操作报错之使用SQL 将 PostgreSQL 的 date 类型字段转换为 TIMESTAMP 类型时遇到报错,该如何处理
在使用实时计算Flink版过程中,可能会遇到各种错误,了解这些错误的原因及解决方法对于高效排错至关重要。针对具体问题,查看Flink的日志是关键,它们通常会提供更详细的错误信息和堆栈跟踪,有助于定位问题。此外,Flink社区文档和官方论坛也是寻求帮助的好去处。以下是一些常见的操作报错及其可能的原因与解决策略。
|
1天前
|
存储 关系型数据库 分布式数据库
PolarDB产品使用问题之如何查看PolarDB for PostgreSQL的备份信息
PolarDB产品使用合集涵盖了从创建与管理、数据管理、性能优化与诊断、安全与合规到生态与集成、运维与支持等全方位的功能和服务,旨在帮助企业轻松构建高可用、高性能且易于管理的数据库环境,满足不同业务场景的需求。用户可以通过阿里云控制台、API、SDK等方式便捷地使用这些功能,实现数据库的高效运维与持续优化。
|
2天前
|
SQL
为什么使用odps.sql.groupby.skewindata = true优化后,逻辑执行计划会发生改变
为什么使用odps.sql.groupby.skewindata = true优化后,逻辑执行计划会发生改变
|
28天前
|
SQL 运维 安全
数据管理DMS产品使用合集之执行SQL时,如何添加Hint来改变查询的执行计划
阿里云数据管理DMS提供了全面的数据管理、数据库运维、数据安全、数据迁移与同步等功能,助力企业高效、安全地进行数据库管理和运维工作。以下是DMS产品使用合集的详细介绍。
31 1
|
1月前
|
SQL 关系型数据库 数据库
nacos 2.2.3版本 查看配置文件的历史版本的接口 是针对MySQL数据库的sql 改成postgresql后 sql语句报错 该怎么解决
在Nacos 2.2.3中切换到PostgreSQL后,执行配置文件历史版本分页查询出错,因`LIMIT 0, 10`语法不被PostgreSQL支持,需改为`LIMIT 10 OFFSET 0`。仅当存在历史版本时报错。解决方案是调整查询SQL以兼容PostgreSQL语法。
|
1月前
|
SQL 关系型数据库 MySQL
MySQL数据库——索引(4)-SQL性能分析-profile详情、explain(profile查看指令,explain执行计划中各个字段的含义)
MySQL数据库——索引(4)-SQL性能分析-profile详情、explain(profile查看指令,explain执行计划中各个字段的含义)
32 2
|
29天前
|
自然语言处理 关系型数据库 数据库
技术经验解读:【转】PostgreSQL的FTI(TSearch)与中文全文索引的实践
技术经验解读:【转】PostgreSQL的FTI(TSearch)与中文全文索引的实践
13 0
|
2月前
|
SQL BI HIVE
【Hive SQL 每日一题】统计用户留存率
用户留存率是衡量产品成功的关键指标,表示用户在特定时间内持续使用产品的比例。计算公式为留存用户数除以初始用户数。例如,游戏发行后第一天有10000玩家,第七天剩5000人,第一周留存率为50%。提供的SQL代码展示了如何根据用户活动数据统计每天的留存率。需求包括计算系统上线后的每日留存率,以及从第一天开始的累计N日留存率。通过窗口函数`LAG`和`COUNT(DISTINCT user_id)`,可以有效地分析用户留存趋势。
|
1月前
|
SQL 关系型数据库 PostgreSQL
【sql】PostgreSQL物化视图表使用案例
【sql】PostgreSQL物化视图表使用案例
22 0
|
2月前
|
SQL HIVE
【Hive SQL 每日一题】统计用户连续下单的日期区间
该SQL代码用于统计用户连续下单的日期区间。首先按`user_id`和`order_date`分组并去除重复,然后使用`row_number()`标记行号,并通过`date_sub`与行号计算潜在的连续日期。接着按用户ID和计算后的日期分组,排除连续订单数少于2的情况,最后提取连续下单的起始和结束日期。输出结果展示了用户连续下单的日期范围。

相关产品

  • 云原生数据库 PolarDB