PostgreSQL 空间类型统计信息(Statistics, Cardinality, Selectivity, Estimate)不准确导致SQL执行计划不准(包含、相交查询)的优化实践

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
云原生数据库 PolarDB 分布式版,标准版 2核8GB
云数据库 RDS MySQL Serverless,0.5-2RCU 50GB
简介:

标签

PostgreSQL , 空间类型 , 统计信息 , 包含查询 , 相交查询 , BOUND , index , x , y


背景

PostgreSQL 通过几何类型以及PostGIS插件,支持丰富的空间类型,空间数据类型包括KNN检索,空间包含,空间相交,方位搜索,空间计算等操作。

为了得到正确的执行计划,在数据库优化器中,评估满足某个操作符条件的行数,是一个重要的CBO依据。

而评估选择性又依赖统计信息,同时依赖每种操作符的选择性算法。

一些优化器相关文档:

《PostgreSQL 自定义函数表达式选择性评估算法 - Statistics, Cardinality, Selectivity, Estimate》

《PostgreSQL 多值列的选择性 - Statistics, Cardinality, Selectivity, Estimate》

《PostgreSQL 11 preview - 表达式索引柱状图buckets\STATISTICS\default_statistics_target可设置》

《PostgreSQL 11 preview - 优化器 增强 汇总》

《PostgreSQL 11 preview - 强制auto prepared statment开关(自动化plan cache)(类似Oracle cursor_sharing force)》

《[未完待续] PostgreSQL PRO 特性 - AQO(机器学习执行计划优化器)》

《PostgreSQL pg_stat_reset清除track_counts的隐患》

《PostgreSQL 统计信息pg_statistic格式及导入导出dump_stat - 兼容Oracle》

《优化器里的概率学 - 性能抖动原理分析》

《懒人推动社会进步 - 多列聚合, gin与数据分布(选择性)》

《数据库优化器原理(含动态规划、机器学习建模优化器aqo) - 如何治疗选择综合症》

《PostgreSQL 10.0 preview 功能增强 - SQL执行剩余时间 - 垃圾回收过程可视pg_stat_progress_vacuum》

《PostgreSQL 10.0 preview 性能增强 - hash,nestloop join优化(聪明的优化器是这样的)》

《PostgreSQL 10.0 preview 功能增强 - 动态视图pg_stat_activity新增数据库管理进程信息》

《PostgreSQL 10.0 preview 功能增强 - 流复制统计视图pg_stat_replication增强, 新增时间维度延迟评估》

《PostgreSQL 10.0 preview 优化器改进 - 不完整索引支持复合排序》

《聊一下PostgreSQL优化器 - in里面有重复值时PostgreSQL如何处理?》

《官人要杯咖啡吗? - PostgreSQL实时监测PLAN tree的执行进度 - pg_query_state》

《PostgreSQL pg_stat_ pg_statio_ 统计信息(scan,read,fetch,hit)源码解读》

《PostgreSQL 9.6 并行计算 优化器算法浅析 - 以及如何强制并行度》

《PostgreSQL 9.6 并行计算 优化器算法浅析》

《关键时刻HINT出彩 - PG优化器的参数优化、执行计划固化CASE》

《Greenplum ORCA 优化器的编译安装与使用》

《PostgreSQL prepared statement和simple query的profile及性能差异》

《PostgreSQL 优化器逻辑推理能力 源码解析》

《PostgreSQL 操作符与优化器详解 - 包含(选择性、JOIN方法、等效)等内容》

《PostgreSQL pg_stat_replication sent_location, write_location, flush_location, replay_location的差别》

《为什么PostgreSQL启动后有个UDP监听localhost - pgstat统计信息进程间通信》

《pg_stat_statements fwrite not save》

《PostgreSQL 9.5 new feature - use FSM fast evaluate bloat and other table level statics(pgstattuple_approx)》

《PostgreSQL aggregate function 2 : Aggregate Functions for Statistics》

《优化器成本因子校对(disk,ssd,memory IO开销精算) - PostgreSQL real seq_page_cost & random_page_cost in disks,ssd,memory》

《优化器成本因子校对 - PostgreSQL explain cost constants alignment to timestamp》

《PostgreSQL pg_stats used to estimate top N freps values and explain rows》

《执行计划选择算法 与 绑定变量 - PostgreSQL prepared statement: SPI_prepare, prepare|execute COMMAND, PL/pgsql STYLE: custom & generic plan cache》

《执行计划维度统计\判断执行计划翻转\统计每种执行计划的统计信息 use pg_stat_plans in PostgreSQL 9.0, 9.1 and 9.2》

《PostgreSQL 9.2 improve prepared statements plan's selectivity》

《PostgreSQL 9.2 add array elements statistics》

《WHY prepared Statement running slower in some situation CASE》

《PostgreSQL 优化器行评估算法》

那么,空间类型有合理的统计信息吗?

“目前空间数据类型,还没有柱状图,高频词等内容”,所以给操作符提供的选择性依据非常有限。

因此目前存在一个这样的问题,选择性函数内使用百分比硬编码,在代码中写死了选择性比例,那么空间包含查询(不管范围多大,评估出来的记录数是一样的),所以基本上都会选择空间索引扫描( 然而实际上当真实空间包含查询包含了大量记录时,全表扫描性能更好 )。

例子

1、建表(pos存储point数据),写入1000万记录数据,创建空间索引

create extension postgis;  
create table test (id int , info text, pos geometry);  
insert into test select generate_series(1,10000000), 'test', st_setsrid(st_makepoint(110+random()*20-10, 70+random()*10-5), 4326);  
create index idx_test_1 on test using gist(pos);  

2、收集统计信息

vacuum analyze test;  

你会发现空间类型的统计信息非常有限(没有高频词,柱状图,相关性等内容):

postgres=# select * from pg_stats where tablename='test' and attname='pos';  
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram   
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+----------------------  
 public     | test      | pos     | f         |         0 |        32 |         -1 |                  |                   |                  |             |                   |                        |   
(1 row)  

3、查看执行计划,查看评估出来的记录数。

空间类型选择性使用如下硬编码0.001。

src/backend/utils/adt/geo_selfuncs.c

/*
 *      contsel -- How likely is a box to contain (be contained by) a given box?
 *
 * This is a tighter constraint than "overlap", so produce a smaller
 * estimate than areasel does.
 */

Datum
contsel(PG_FUNCTION_ARGS)
{
        PG_RETURN_FLOAT8(0.001);
}

不管输入什么空间条件,评估得到的记录数都是千分之一:10000条

条件1:
postgres=# explain select * from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  
                                                  QUERY PLAN                                                                                                                    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=41)  
   Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
   Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0)  
         Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
(5 rows)  
  
条件2:
postgres=# explain select * from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                                 QUERY PLAN                                                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=41)  
   Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
   Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0)  
         Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
(5 rows)  

4、那么实际上的记录数是多少呢?

条件1:0条。

postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  
                               QUERY PLAN    
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12843.26..12843.27 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1)  
   ->  Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=0) (actual time=0.012..0.012 rows=0 loops=1)  
         Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
         Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
         ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0) (actual time=0.010..0.010 rows=0 loops=1)  
               Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~   
pos)  
 Planning time: 0.170 ms  
 Execution time: 0.061 ms  
(8 rows)  

条件2:5001107条。

postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                QUERY PLAN    
----------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=12843.26..12843.27 rows=1 width=8) (actual time=5057.822..5057.822 rows=1 loops=1)  
   ->  Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=0) (actual time=507.744..4537.564 rows=5001107 loops=1)  
         Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
         Rows Removed by Index Recheck: 1981986  
         Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry,  
 pos)  
         Heap Blocks: exact=50167 lossy=33167  
         ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0) (actual time=499.627..499.627 rows=5001107 loops=1)  
               Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~   
pos)  
 Planning time: 0.179 ms  
 Execution time: 5057.882 ms  
(10 rows)  

5、空间类型的选择性不准确,导致的问题,不管什么情况都走空间索引。

5.1 大范围搜索,使用索引反而更慢,如下。

postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                        QUERY PLAN  
-------------------------------------------------------------------------------------------------------  
 Finalize Aggregate  (cost=10539.13..10539.14 rows=1 width=8) (actual time=1079.698..1079.699 rows=1 loops=1)  
   ->  Gather  (cost=10539.10..10539.11 rows=8 width=8) (actual time=1079.655..1079.693 rows=9 loops=1)  
         Workers Planned: 8  
         Workers Launched: 8  
         ->  Partial Aggregate  (cost=10539.10..10539.11 rows=1 width=8) (actual time=1049.984..1049.984 rows=1 loops=9)  
               ->  Parallel Bitmap Heap Scan on test  (cost=149.95..10538.06 rows=417 width=0) (actual time=483.733..991.774 rows=555679 loops=9)  
                     Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geo  metry ~ pos)  
                     Rows Removed by Index Recheck: 220221  
                     Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
                     Heap Blocks: exact=5908 lossy=3936  
                     ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0) (actual time=502.790..502.790 rows=5001107 loops=1)  
                           Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
 Planning time: 0.171 ms  
 Execution time: 1087.890 ms  
(14 rows)  
  
  
postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                  QUERY PLAN           
-------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=411456.73..411456.74 rows=1 width=8) (actual time=674.853..674.853 rows=1 loops=1)  
   ->  Gather  (cost=411456.70..411456.71 rows=8 width=8) (actual time=674.793..674.848 rows=9 loops=1)  
         Workers Planned: 8  
         Workers Launched: 8  
         ->  Partial Aggregate  (cost=411456.70..411456.71 rows=1 width=8) (actual time=644.627..644.627 rows=1 loops=9)  
               ->  Parallel Seq Scan on test  (cost=0.00..411455.65 rows=417 width=0) (actual time=0.045..586.545 rows=555679 loops=9)  
                     Filter: (('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry   ~ pos) AND _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  )  
                     Rows Removed by Filter: 555433  
 Planning time: 0.130 ms  
 Execution time: 683.011 ms  
(10 rows)  

优化方法

首先我们知道空间包含,实际上是通过类似r-tree来从大到小进行收敛的,索引结构如下:

《PostgreSQL 黑科技 - 空间聚集存储, 内窥GIN, GiST, SP-GiST索引》

《通过空间思想理解GiST索引的构造》

因此,对于“平面或3D空间点”类型,我们可以利用x,y,z来收集BOUND BOX的边界统计信息,使用表达式索引即可得到。

下面的方法适合POINT geometry类型。

1、对point类型,创建x,y表达式索引

create index idx_test_2 on test (st_x(pos));  
create index idx_test_3 on test (st_y(pos));  

2、收集统计信息

vacuum analyze test;  

3、现在,表达式索引的统计信息有了(边界,柱状图,高频词等).

postgres=# select * from pg_stats where tablename='idx_test_2' ;  
 schemaname | tablename  | attname | inherited | null_frac | avg_width | n_distinct |  most_common_vals  | most_common_freqs |                                                                                                                 
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                        histogram_bounds                                                                                                                                                                                                       
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                               | correlation | most_common_elems | most_common_elem_freqs | e  
lem_count_histogram   
------------+------------+---------+-----------+-----------+-----------+------------+--------------------+-------------------+---------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+------------------------+--  
--------------------  
 public     | idx_test_2 | st_x    | f         |         0 |         8 |  -0.978292 | {100.210125753656} | {6.66667e-05}     | {100.001683048904,100.198292508721,100.394482128322,100.614011939615,100.833150474355,101.039264379069,101.233  
114106581,101.457025809214,101.663421457633,101.865276163444,102.060301089659,102.248072810471,102.453578868881,102.656150059775,102.860928429291,103.057968365029,103.260449869558,103.457426037639,103.632223745808,103.85178135708,104.049  
411769956,104.26340050064,104.474113518372,104.658453594893,104.85337683931,105.054327072576,105.277718435973,105.484521845356,105.703673372045,105.907790735364,106.115896645933,106.314396839589,106.519666947424,106.724081514403,106.9380  
68913296,107.144253859296,107.326071513817,107.518919138238,107.721077715978,107.916374253109,108.123016441241,108.360300129279,108.551258808002,108.714984534308,108.903268156573,109.091228945181,109.277053307742,109.48098176159,109.6832  
09387586,109.853478074074,110.044501451775,110.241627292708,110.45505293645,110.652585113421,110.851763477549,111.036027139053,111.231312695891,111.423934968188,111.620921371505,111.847442938015,112.031054906547,112.234003236517,112.4332  
3944509,112.63793184422,112.834744984284,113.041293732822,113.23037719354,113.432592023164,113.619993180037,113.809020379558,114.014157289639,114.208164261654,114.409972019494,114.612494371831,114.827286116779,115.010247323662,115.208249  
371499,115.397510435432,115.607831152156,115.814645215869,116.007154844701,116.198594048619,116.39090036042,116.600214680657,116.819634130225,117.008039858192,117.199446037412,117.389995325357,117.60263632983,117.81475706026,118.02150100  
4696,118.219726895913,118.425922412425,118.623819025233,118.801201758906,118.997462140396,119.18357164599,119.394188923761,119.607428628951,119.789487998933,119.999485854059} |   0.0060246 |                   |                        |   
(1 row)  
  
postgres=# select * from pg_stats where tablename='idx_test_3' ;  
 schemaname | tablename  | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |                                                                                                                   
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                   histogram_bounds                                                                                                                                                                                                            
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                        | correlation | most_common_elems | most_common_elem_freqs | elem_cou  
nt_histogram   
------------+------------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-----------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+------------------------+---------  
-------------  
 public     | idx_test_3 | st_y    | f         |         0 |         8 |         -1 |                  |                   | {65.0000545242801,65.0990079157054,65.2072919439524,65.301208673045,65.3929488640279,65.4877291806042,65.5846159  
020439,65.6880339607596,65.7856895681471,65.8818093780428,65.9888388821855,66.0896951518953,66.1984446132556,66.304198782891,66.3969474285841,66.496685273014,66.6056828852743,66.7012783605605,66.7941406555474,66.8967378046364,66.98895718  
90235,67.0855156239122,67.191197341308,67.2994636883959,67.3969272850081,67.4947946099564,67.5993846775964,67.7099830284715,67.8120812214911,67.9088703868911,67.9981751134619,68.099527056329,68.1973828841001,68.2960116351023,68.398535256  
274,68.4888928988948,68.5882083000615,68.6907122470438,68.800939405337,68.9010692993179,69.0081828692928,69.11032628268,69.19522870332,69.2957087606192,69.3950105085969,69.5086302934214,69.6040614042431,69.7045981185511,69.8040427314118,  
69.906362099573,70.0155263161287,70.1146272942424,70.2022811910138,70.2994426619262,70.4040612280369,70.5112449126318,70.6194959674031,70.722281485796,70.822925157845,70.918908463791,71.0138623649254,71.1134587181732,71.2168908445165,71.  
3034919975325,71.4052765490487,71.5016098646447,71.6014923620969,71.6971560986713,71.8042277451605,71.8970008520409,71.9877171749249,72.0928315026686,72.1963384188712,72.2967635607347,72.3947894759476,72.4977402808145,72.5967703945935,72  
.6985206454992,72.7883456554264,72.885032473132,72.9875696543604,73.0857636081055,73.1845043040812,73.280454329215,73.3783990284428,73.4819683339447,73.5874625109136,73.6941803013906,73.7924668611959,73.8872256595641,73.9888873603195,74.  
0806347271428,74.1819418873638,74.2877806117758,74.3848745618016,74.4918680656701,74.5988712925464,74.7014277381822,74.8038458405063,74.9038722459227,74.9999207118526} | -0.00307583 |                   |                        |   
(1 row)  

现在我们改一下SQL,把boundbox放进SQL

select count(*) from test   
where   
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  

改成:

select count(*) from test   
where   
st_x(pos)   
  between  
    st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_y(pos)   
  between  
    st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  

执行计划如下:

在大范围输入时,通过表达式索引条件评估得到的行数比空间编码大于空间评估(硬编码 千分之一)得到的记录数,所以依旧使用了空间索引。

postgres=# explain select count(*) from test   
where   
st_x(pos)   
  between  
    st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_y(pos)   
  between  
    st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                       QUERY PLAN                                                                                                                                        
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12380.60..12380.61 rows=1 width=8)  
   ->  Index Scan using idx_test_1 on test  (cost=0.42..12376.42 rows=1674 width=0)  
         Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
         Filter: ((st_x(pos) >= '110'::double precision) AND (st_x(pos) <= '120'::double precision) AND (st_y(pos) >= '1'::double precision) AND (st_y(pos) <= '100'::double precision) AND _st_contains('0103000020E610000001000000050000000  000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos))  
(4 rows)  

在小范围输入时,评估得到的行数比空间编码更少(千分之一),所以使用了非空间索引。

explain select count(*) from test   
where   
st_x(pos)   
  between  
    st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
  and  
    st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
and  
st_y(pos)   
  between  
    st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
  and  
    st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
and  
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  
  
   QUERY PLAN                                
----------------------------------------------------------------------------------------------
 Aggregate  (cost=2.92..2.93 rows=1 width=8)  
   ->  Index Scan using idx_test_3 on test  (cost=0.43..2.92 rows=1 width=0)  
         Index Cond: ((st_y(pos) >= '1'::double precision) AND (st_y(pos) <= '2'::double precision))  
         Filter: (('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos) AND   
(st_x(pos) >= '110'::double precision) AND (st_x(pos) <= '112'::double precision) AND _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000  
000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos))  
(4 rows)  

一些函数

查看geometry的bound box

postgres=# select st_astext(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) ));  
                st_astext                   
------------------------------------------  
 POLYGON((110 1,110 2,112 2,112 1,110 1))  
(1 row)  
postgres=# select st_astext(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
                                                                       st_astext                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------  
 POLYGON((220186.995121892 150406,220186.995121892 150506.126829327,220288.248780547 150506.126829327,220288.248780547 150406,220186.995121892 150406))  
(1 row)  

2、查看BOUND BOX的边界点x,y.

postgres=# select st_xmin(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
     st_xmin        
------------------  
 220186.995121892  
(1 row)  
  
postgres=# select st_xmax(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
     st_xmax        
------------------  
 220288.248780547  
(1 row)  
  
postgres=# select st_ymin(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
 st_ymin   
---------  
  150406  
(1 row)  
  
postgres=# select st_ymax(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
     st_ymax        
------------------  
 150506.126829327  
(1 row)  

小结

目前空间类型的bound box没有统计信息,所以空间查询时,选择性使用了硬编码写死,固定了选择性百分比,乘以pg_class.reltuples即可得到评估记录数,但是不准确。

使得PostgreSQL的空间搜索(包含搜索),不管范围多大,都使用索引,而实际上大范围可能使用全表扫描性能更好。

本文增加boundbox的条件,使用xyz边界条件,使得行数评估更加的准确,在适当的时候选择适合的索引。

当然,最好的方法,还是PG内核层面优化空间类型的统计信息,以及选择性代码的支持。

参考

建议,PostgreSQL内核层面改进空间类型的统计信息,点类型,分别统计X,Y轴的统计信息。

《PostgreSQL 自定义函数表达式选择性评估算法 - Statistics, Cardinality, Selectivity, Estimate》

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
7天前
|
SQL 存储 关系型数据库
【MySQL系列笔记】SQL优化
SQL优化是通过调整数据库查询、索引、表结构和配置参数等方式,提高SQL查询性能和效率的过程。它旨在减少查询执行时间、减少系统资源消耗,从而提升数据库系统整体性能。优化方法包括索引优化、查询重写、表分区、适当选择和调整数据库引擎等。
128 3
|
9天前
|
分布式计算 DataWorks 调度
DataWorks产品使用合集之DataWorks中,填写ODPS SQL任务中的参数和分区信息如何解决
DataWorks作为一站式的数据开发与治理平台,提供了从数据采集、清洗、开发、调度、服务化、质量监控到安全管理的全套解决方案,帮助企业构建高效、规范、安全的大数据处理体系。以下是对DataWorks产品使用合集的概述,涵盖数据处理的各个环节。
23 0
|
9天前
|
存储 SQL 缓存
30个业务场景的SQL优化
这些优化策略和示例可以帮助改善 `SQL` 查询的性能和效率。在实践中,需要综合考虑数据库设计、`SQL` 编写、服务器配置等多方面因素,选择合适的优化方法,并进行充分的测试和验证。以上 30 个经验是 V 哥在实际经验中总结的内容,当然,业务场景不同,具体的优化策略也会不同,按实际情况处理,这不就是程序员要做的事情么。
|
9天前
|
SQL 存储 算法
clickhouse SQL优化
clickhouse 是 OLAP 数据库,但其具有独特的索引设计,所以如果拿 MySQL 或者其他 RDB 的优化经验来优化 clickhouse 可能得不到很好的效果,所以特此单独整理一篇文档,用于有 SQL 优化需求的同学,本人接触 clickhouse 时间也不长,难免有不足的地方,如果大家发现错误,还请不吝指正。
|
12天前
|
SQL 关系型数据库 MySQL
【MySQL】SQL优化
【MySQL】SQL优化
|
14天前
|
SQL 存储 关系型数据库
MySQL SQL优化
MySQL SQL优化
16 0
|
16天前
|
SQL 分布式计算 资源调度
一文解析 ODPS SQL 任务优化方法原理
本文重点尝试从ODPS SQL的逻辑执行计划和Logview中的执行计划出发,分析日常数据研发过程中各种优化方法背后的原理,覆盖了部分调优方法的分析,从知道怎么优化,到为什么这样优化,以及还能怎样优化。
103480 1
|
20天前
|
SQL 关系型数据库 MySQL
explain是什么?explain优缺点及如何使用explain优化SQL
explain是什么?explain优缺点及如何使用explain优化SQL
41 1
|
20天前
|
SQL 自然语言处理 数据挖掘
NL2SQL技术方案系列(1):NL2API、NL2SQL技术路径选择;LLM选型与Prompt工程技巧,揭秘项目落地优化之道
NL2SQL技术方案系列(1):NL2API、NL2SQL技术路径选择;LLM选型与Prompt工程技巧,揭秘项目落地优化之道
NL2SQL技术方案系列(1):NL2API、NL2SQL技术路径选择;LLM选型与Prompt工程技巧,揭秘项目落地优化之道
|
22天前
|
SQL NoSQL 关系型数据库
【后端面经】【数据库与MySQL】SQL优化:如何发现SQL中的问题?-02
【4月更文挑战第13天】该文介绍了几个数据库查询优化技巧。首先,创建覆盖索引如&lt;A,B,C&gt;能加速`select A,B,C from student where A=? and B=? and C=?`的执行。其次,为常用于排序的列建立索引,如在`id,update_time`上建索引,可避免数据排序,显著提高查询速度。优化`count(*)`可通过预估值或使用Redis记录总数,但需注意数据一致性问题。使用索引提示如FORCE INDEX可强制使用特定索引,但应谨慎。将`having`的非聚合条件移到`where`里可提升效率。最后,处理深度分页时
23 3

相关产品

  • 云原生数据库 PolarDB