PostgreSQL pg_stats used to estimate top N freps values and explain rows

本文涉及的产品
云原生数据库 PolarDB MySQL 版,通用型 2核4GB 50GB
云原生数据库 PolarDB PostgreSQL 版,标准版 2核4GB 50GB
简介:

标签

PostgreSQL , 数组 , pg_stats , 统计信息 , TOP N , 词频

背景

本文要聊的是如何利用统计信息规避一些复杂的精确统计.

去年写过一篇关于 PostgreSQL 9.2 新增array元素统计收集的相关文章. 本文也会讲到.

《PostgreSQL 9.2 add array elements statistics》

Release 9.2  
  
Release Date: 2012-09-10  
     * Move the frequently accessed members of the PGPROC shared memory  
       array to a separate array (Pavan Deolasee, Heikki Linnakangas,  
       Robert Haas)  

在日常的数据库统计中, count(*), 排名这类的统计非常多, 同时这类统计的开销也非常大, 特别是当表的数据量巨大时.

接下来模拟几个场景。
1、统计某条件下的记录条数.

digoal=# create table test_1 (id serial4 primary key, info text, appid int, crt_time timestamp);  
CREATE TABLE  
digoal=# insert into test_1 (info,appid,crt_time) select md5(random()::text),round(10000*random())::int,clock_timestamp() from generate_series(1,2000000);  
INSERT 0 2000000  

默认的统计目标值为100

digoal=# show default_statistics_target ;  
-[ RECORD 1 ]-------------+----  
default_statistics_target | 100  
digoal=# analyze test_1;  
ANALYZE  

为了得到准确的统计信息, 如果没有打开autovacuum, 最好手动收集一次统计信息.

digoal=# select * from pg_stat_all_tables where relname='test_1';  
-[ RECORD 1 ]-----+------------------------------  
relid             | 91368  
schemaname        | public  
relname           | test_1  
seq_scan          | 1  
seq_tup_read      | 0  
idx_scan          | 0  
idx_tup_fetch     | 0  
n_tup_ins         | 2000000  
n_tup_upd         | 0  
n_tup_del         | 0  
n_tup_hot_upd     | 0  
n_live_tup        | 2000000  
n_dead_tup        | 0  
last_vacuum       |   
last_autovacuum   |   
last_analyze      | 2013-08-11 11:15:04.975523+08  
last_autoanalyze  | 2013-08-11 11:14:53.663951+08  
vacuum_count      | 0  
autovacuum_count  | 0  
analyze_count     | 1  
autoanalyze_count | 1  

查询appid=1的记录数有多少

digoal=# select count(*) from test_1 where appid=1;  
 count   
-------  
   189  
(1 row)  

使用explain输出appid=1的记录数有多少, 这里显示为197. 和使用count(*)得到的存在一点差异.

digoal=# explain select * from test_1 where appid=1;  
                         QUERY PLAN                            
-------------------------------------------------------------  
 Seq Scan on test_1  (cost=0.00..45619.00 rows=197 width=49)  
   Filter: (appid = 1)  
(2 rows)  

如果将统计目标调整为10000, 采样行数会大大增加, 消耗的资源加大.

digoal=# alter table test_1 alter column appid SET STATISTICS 10000;  
ALTER TABLE  
digoal=# analyze verbose test_1;  
INFO:  analyzing "public.test_1"  
INFO:  "test_1": scanned 20619 of 20619 pages, containing 2000000 live rows and 0 dead rows; 2000000 rows in sample, 2000000 estimated total rows  
ANALYZE  

但是统计行数更加准确了, 现在为188. 只相差1行.

digoal=# explain select * from test_1 where appid=1;  
                         QUERY PLAN                            
-------------------------------------------------------------  
 Seq Scan on test_1  (cost=0.00..45619.00 rows=188 width=49)  
   Filter: (appid = 1)  
(2 rows)  

范围查询看看是否准确呢?

digoal=# explain select * from test_1 where appid>1000;  
                           QUERY PLAN                              
-----------------------------------------------------------------  
 Seq Scan on test_1  (cost=0.00..45619.00 rows=1800419 width=49)  
   Filter: (appid > 1000)  
(2 rows)  

也非常准确.

digoal=# select count(*) from test_1 where appid>1000;  
  count    
---------  
 1800263  
(1 row)  

误差为 : 0.00008665

组合条件的输出行评估 :

digoal=# explain select * from test_1 where appid>1000 and crt_time>now();  
                         QUERY PLAN                            
-------------------------------------------------------------  
 Seq Scan on test_1  (cost=0.00..55619.00 rows=180 width=49)  
   Filter: ((appid > 1000) AND (crt_time > now()))  
(2 rows)  
digoal=# select * from test_1 where appid>1000 and crt_time>now();  
 id | info | appid | crt_time   
----+------+-------+----------  
(0 rows)  

2、分组排行, 例如要查询哪个appid的记录条数最多.

digoal=# alter table test_1 alter column appid SET STATISTICS 100;  
ALTER TABLE  
digoal=# analyze verbose test_1;  
INFO:  analyzing "public.test_1"  
INFO:  "test_1": scanned 20619 of 20619 pages, containing 2000000 live rows and 0 dead rows; 30000 rows in sample, 2000000 estimated total rows  
ANALYZE  
digoal=# select most_common_vals,most_common_freqs from pg_stats where tablename='test_1' and attname='appid';  
-[ RECORD 1 ]-----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
most_common_vals  | {3677,6460,1210,1291,2052,3374,3633,4386,4729,4756,5602,320,383,479,906,1003,1018,1102,1243,2594,2625,2762,3092,3243,3376,3511,4842,5595,5967,6135,6412,6821,6824,6966,7828,7984,8118,8310,8378,8952,9012,9840,9922,22,68,359,632,899,933,1034,1227,1369,1554,1615,1706,1744,1824,1995,2034,2056,2215,2412,2770,2988,3488,3722,3780,3834,3937,4079,4124,4224,4424,4723,4811,4870,5287,5490,5596,5609,5665,5751,5881,6236,6562,6656,6694,6827,6865,6980,6996,7008,7021,7097,7274,7285,7289,7330,7367,7449}  
most_common_freqs | {0.0004,0.000366667,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.000333333,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667,0.000266667}  

值太均匀, 所以这个不准确.

digoal=# select appid,count(*) from test_1 group by appid order by count(*) desc limit 100;  
 appid | count   
-------+-------  
  9853 |   253  
  6502 |   249  
  1688 |   249  
   464 |   249  
  9974 |   248  
  1540 |   248  
  6622 |   247  
  6669 |   247  
  4643 |   247  
  1046 |   246  
  3051 |   246  
  6359 |   246  
  9103 |   246  
   348 |   246  
  2213 |   244  
   138 |   244  
  8135 |   244  
  3980 |   244  
  5870 |   243  
  9349 |   243  
  6210 |   243  
  4575 |   243  
  3421 |   242  
   207 |   242  
  3224 |   242  
  7056 |   242  
  4561 |   242  
  8770 |   241  
  3011 |   241  
  3731 |   241  
  4951 |   241  
  1066 |   240  
  5501 |   240  
  9354 |   240  
  7430 |   240  
  7621 |   240  
  2058 |   240  
  5460 |   240  
  6578 |   239  
  7431 |   239  
  5473 |   239  
  7305 |   239  
  9563 |   239  
  3275 |   239  
  2968 |   239  
  8825 |   239  
  3426 |   238  
  3850 |   238  
  6835 |   238  
  5928 |   238  
  8567 |   238  
  4083 |   238  
  1137 |   238  
  4862 |   238  
  4238 |   238  
  1058 |   238  
  6745 |   237  
  5854 |   237  
  3196 |   237  
  3165 |   237  
   724 |   237  
  9643 |   237  
  7326 |   237  
  6661 |   237  
  3685 |   236  
  2590 |   236  
  9685 |   236  
  8366 |   236  
  3931 |   236  
  7074 |   236  
  6140 |   236  
  4402 |   236  
  4635 |   236  
  7628 |   236  
  5967 |   236  
    24 |   236  
   987 |   236  
  2472 |   236  
  8724 |   236  
  6404 |   236  
  9504 |   235  
  5816 |   235  
  1261 |   235  
  5551 |   235  
   874 |   235  
  1880 |   235  
  5248 |   235  
   404 |   235  
  5738 |   235  
   583 |   235  
  7799 |   235  
  2362 |   235  
  1789 |   235  
  7707 |   235  
  3091 |   234  
  9245 |   234  
  6107 |   234  
  8657 |   234  
  7460 |   234  
  2252 |   234  
(100 rows)  

当倾斜较大时(符合现实世界理论,二八原则), 就特别准.

digoal=#  insert into test_1 (info,appid,crt_time) select 'test',1,now() from generate_series(1,100000);  
INSERT 0 100000  
digoal=# analyze verbose test_1;  
INFO:  analyzing "public.test_1"  
INFO:  "test_1": scanned 21262 of 21262 pages, containing 2101000 live rows and 0 dead rows; 30000 rows in sample, 2101000 estimated total rows  
ANALYZE  
digoal=# select appid,count(*) from test_1 group by appid order by count(*) desc limit 5;  
 appid | count    
-------+--------  
     1 | 101189  
  9853 |    253  
  6502 |    249  
   464 |    249  
  1688 |    249  
(5 rows)  

3、数组的元素值排行统计

例如有一个表中记录了客户允许的应用程序信息, 应用程序存储为一个数组, 程序以id形式存到数组中.

如果要统计全国终端中最火爆的程序排行.

digoal=# create table test_2(id serial primary key, appid int[], crt_time timestamp);  
CREATE TABLE  

假设appid为0-10的程序比较火爆, 模拟100秒插入请求.

vi test.sql  
insert into test_2(appid) select array_agg(appid) appid_agg from (select round(10*random())::int as appid from generate_series(1,20)) t;  
pg93@db-172-16-3-33-> pgbench -M prepared -n -r -f ./test.sql -c 16 -j 4 -T 100 digoal  
transaction type: Custom query  
scaling factor: 1  
query mode: prepared  
number of clients: 16  
number of threads: 4  
duration: 100 s  
number of transactions actually processed: 389701  
tps = 3896.755035 (including connections establishing)  
tps = 3897.686073 (excluding connections establishing)  
statement latencies in milliseconds:  
        4.103117        insert into test_2(appid) select array_agg(appid) appid_agg from (select round(10*random())::int as appid from generate_series(1,20)) t;  

假设appid为10以上的程序不火爆, 模拟10秒插入请求.

vi test.sql  
insert into test_2(appid) select array_agg(appid) appid_agg from (select round(1000*random())::int as appid from generate_series(1,20)) t;  
pg93@db-172-16-3-33-> pgbench -M prepared -n -r -f ./test.sql -c 16 -j 4 -T 10 digoal  
transaction type: Custom query  
scaling factor: 1  
query mode: prepared  
number of clients: 16  
number of threads: 4  
duration: 10 s  
number of transactions actually processed: 37256  
tps = 3718.362208 (including connections establishing)  
tps = 3725.838361 (excluding connections establishing)  
statement latencies in milliseconds:  
        4.290334        insert into test_2(appid) select array_agg(appid) appid_agg from (select round(1000*random())::int as appid from generate_series(1,20)) t;  

总记录数如下 :

digoal=# select count(*) from test_2;  
 count    
--------  
 472583  
(1 row)  

收集统计信息

digoal=# analyze verbose test_2;  
INFO:  analyzing "public.test_2"  
INFO:  "test_2": scanned 8184 of 8184 pages, containing 472583 live rows and 0 dead rows; 30000 rows in sample, 472583 estimated total rows  
ANALYZE  

通过group by和order by获取实际的排名 :

digoal=# select appid,count(*) from (select unnest(appid) as appid from test_2) t group by appid order by count(*) desc limit 20;  
 appid | count    
-------+--------  
     9 | 872831  
     6 | 871908  
     3 | 871867  
     7 | 871551  
     8 | 871436  
     4 | 871391  
     1 | 871051  
     5 | 870770  
     2 | 870692  
    10 | 435583  
     0 | 435342  
   387 |    831  
    69 |    824  
   665 |    822  
   703 |    816  
   651 |    811  
   520 |    809  
   435 |    809  
   783 |    806  
   671 |    806  
(20 rows)  

接下来要用9.2新增的array 统计信息查看排名了 :

注意元素的收集个数为default_statistics_target 的10倍, 这个可以参考源码 :

+   /*  
+    * Set up static pointer for use by subroutines.  We wait till here in  
+    * case std_compute_stats somehow recursively invokes us (probably not  
+    * possible, but ...)  
+    */  
+   array_extra_data = extra_data;  
+  
+   /*  
+    * We want statistics_target * 10 elements in the MCELEM array. This  
+    * multiplier is pretty arbitrary, but is meant to reflect the fact that  
+    * the number of individual elements tracked in pg_statistic ought to be  
+    * more than the number of values for a simple scalar column.  
+    */  
+   num_mcelem = stats->attr->attstattarget * 10;  

所以默认default_statistics_target=100的情况下, 将会收集1000个数组型元素的出现概率.

digoal=# select array_length(most_common_elems,1),array_length(most_common_elem_freqs,1) from pg_stats where tablename='test_2' and attname='appid';  
-[ RECORD 1 ]+-----  
array_length | 1000  
array_length | 1003  
most_common_elem_freqs中最后3个的值代表min, max, null_freqs. 具体见  

《PostgreSQL 9.2 add array elements statistics》
如果把统计信息修改为50, 那么统计信息的元素个数将变成500.

digoal=# alter table test_2 alter column appid set statistics 50;  
ALTER TABLE  
digoal=# analyze verbose test_2;  
INFO:  analyzing "public.test_2"  
INFO:  "test_2": scanned 8184 of 8184 pages, containing 472583 live rows and 0 dead rows; 30000 rows in sample, 472583 estimated total rows  
ANALYZE  
digoal=# select array_length(most_common_elems,1),array_length(most_common_elem_freqs,1) from pg_stats where tablename='test_2' and attname='appid';  
-[ RECORD 1 ]+----  
array_length | 500  
array_length | 503  

下面是统计信息 :

digoal=# select most_common_elems,most_common_elem_freqs from pg_stats where tablename='test_2' and attname='appid';  
most_common_elems      | {0,1,2,3,4,5,6,7,8,9,10,12,13,15,16,17,18,22,23,24,27,30,31,32,34,35,36,37,40,41,42,44,49,50,52,55,56,60,63,67,69,73,77,78,81,85,89,90,94,95,97,100,101,103,104,108,109,114,116,117,120,121,123,125,128,132,133,134,136,139,140,143,144,146,149,150,151,152,153,154,155,158,159,162,165,167,170,172,173,174,175,180,182,183,184,186,189,192,196,197,200,202,208,209,210,211,214,217,221,222,226,227,232,233,234,239,241,242,243,245,246,249,250,253,254,259,260,263,264,265,268,269,272,273,275,276,277,281,283,287,288,289,291,294,295,296,297,300,301,304,305,307,311,315,316,317,318,322,324,325,333,334,335,338,340,342,344,346,348,351,352,353,354,355,357,359,362,363,365,366,367,368,370,374,376,377,379,382,384,387,389,391,393,397,398,400,401,402,406,407,410,415,418,419,420,422,423,424,425,428,430,431,432,433,434,436,437,438,441,442,443,445,446,448,449,450,451,455,456,457,458,460,463,464,465,467,470,471,472,474,476,478,481,483,484,486,488,490,491,492,494,495,496,498,499,503,504,505,510,512,513,514,515,518,520,522,523,529,530,532,538,539,542,543,545,553,556,559,560,562,565,567,568,569,571,572,574,575,576,579,580,582,587,588,589,593,596,597,599,601,605,606,607,608,609,610,612,616,617,619,621,622,624,626,628,631,632,637,638,642,644,646,648,655,656,657,660,661,663,664,665,666,667,669,670,672,675,678,679,681,682,683,685,687,689,691,692,693,696,697,698,699,701,702,703,707,708,709,712,714,716,718,719,720,722,723,724,725,727,728,729,734,735,736,738,741,742,743,744,746,747,752,753,759,760,761,762,763,767,769,770,772,773,775,776,777,778,779,782,784,785,787,788,789,790,791,792,795,797,799,800,801,804,805,807,809,811,815,816,818,819,824,825,827,828,829,830,831,832,833,836,838,840,842,843,844,847,848,850,851,854,858,860,861,862,864,865,868,869,870,873,874,875,880,881,884,886,893,894,895,898,900,904,906,907,912,913,914,922,924,928,931,933,939,941,942,944,946,947,954,955,956,957,960,961,962,963,965,967,968,970,972,975,978,987,988,991,994,996,997}  
most_common_elem_freqs | {0.590833,0.810967,0.809233,0.808433,0.807467,0.806667,0.808067,0.8102,0.806233,0.808367,0.588033,0.00186667,0.0019,0.0018,0.0018,0.00183333,0.00193333,0.00176667,0.002,0.00176667,0.00223333,0.00163333,0.00213333,0.0019,0.00193333,0.0017,0.00166667,0.00166667,0.00163333,0.00186667,0.00176667,0.0022,0.00206667,0.00196667,0.00183333,0.00163333,0.0018,0.002,0.0017,0.0017,0.00166667,0.00183333,0.00173333,0.0017,0.0017,0.00163333,0.00183333,0.00173333,0.00193333,0.0017,0.0017,0.0018,0.00173333,0.00176667,0.00166667,0.00213333,0.0021,0.00173333,0.00193333,0.00186667,0.00166667,0.0019,0.00166667,0.00213333,0.0021,0.00176667,0.00163333,0.00173333,0.00176667,0.00176667,0.00186667,0.00193333,0.00176667,0.00173333,0.0019,0.00193333,0.00176667,0.002,0.00176667,0.00173333,0.00173333,0.00186667,0.00173333,0.00166667,0.0021,0.00163333,0.00193333,0.00176667,0.00166667,0.00176667,0.00163333,0.00193333,0.00166667,0.00166667,0.00193333,0.00176667,0.0017,0.00163333,0.00176667,0.00196667,0.002,0.00196667,0.00213333,0.0017,0.00193333,0.00173333,0.00176667,0.00166667,0.0018,0.00163333,0.0021,0.00216667,0.00203333,0.00193333,0.0019,0.00176667,0.00193333,0.00176667,0.00193333,0.00176667,0.00166667,0.00186667,0.00196667,0.00173333,0.0019,0.002,0.00193333,0.0021,0.00166667,0.0017,0.00166667,0.00196667,0.0019,0.0019,0.00163333,0.0017,0.00163333,0.0023,0.00173333,0.00186667,0.00193333,0.00173333,0.0017,0.00166667,0.00166667,0.00173333,0.0018,0.00166667,0.00183333,0.0018,0.00173333,0.00163333,0.0017,0.00176667,0.00176667,0.00166667,0.0019,0.00163333,0.00166667,0.00163333,0.0017,0.00196667,0.0021,0.00166667,0.0019,0.00173333,0.00166667,0.002,0.00233333,0.0017,0.00163333,0.00183333,0.0018,0.00186667,0.00183333,0.00186667,0.0019,0.0018,0.00166667,0.00196667,0.0018,0.00176667,0.0017,0.00166667,0.00163333,0.00166667,0.00163333,0.0017,0.00166667,0.0018,0.00166667,0.00173333,0.0017,0.0017,0.002,0.00176667,0.00166667,0.00186667,0.00166667,0.00183333,0.00163333,0.00176667,0.00183333,0.00183333,0.00173333,0.0018,0.0018,0.0017,0.00166667,0.0018,0.0018,0.0017,0.00186667,0.00166667,0.00193333,0.0018,0.0017,0.0019,0.00163333,0.00173333,0.00203333,0.0017,0.00166667,0.002,0.00163333,0.00186667,0.002,0.0019,0.00163333,0.0017,0.00186667,0.00163333,0.00173333,0.00166667,0.0018,0.00203333,0.00166667,0.00166667,0.00186667,0.0018,0.0018,0.00163333,0.0019,0.00186667,0.00166667,0.00203333,0.00176667,0.00166667,0.002,0.00163333,0.00186667,0.00183333,0.00186667,0.00176667,0.00166667,0.00196667,0.0021,0.0018,0.00186667,0.00193333,0.00196667,0.0017,0.00176667,0.0018,0.0017,0.0017,0.00183333,0.00173333,0.0021,0.002,0.0019,0.00193333,0.0018,0.00166667,0.00173333,0.00183333,0.0017,0.00176667,0.0018,0.00196667,0.00206667,0.0018,0.00173333,0.00186667,0.00173333,0.00186667,0.00166667,0.0017,0.00183333,0.00173333,0.0018,0.00163333,0.00203333,0.0018,0.00166667,0.00216667,0.00183333,0.0017,0.00166667,0.0018,0.00163333,0.00183333,0.0019,0.00163333,0.00183333,0.0018,0.00206667,0.0017,0.00176667,0.00196667,0.00163333,0.00166667,0.00206667,0.0017,0.0018,0.00196667,0.00166667,0.00193333,0.00173333,0.00166667,0.0017,0.00173333,0.0018,0.00206667,0.00166667,0.00166667,0.00186667,0.00163333,0.0017,0.00163333,0.00186667,0.00173333,0.00166667,0.00183333,0.0018,0.002,0.00163333,0.00176667,0.00206667,0.00173333,0.00183333,0.00173333,0.00163333,0.00173333,0.00183333,0.00163333,0.00163333,0.00203333,0.00173333,0.0017,0.00196667,0.00176667,0.00166667,0.00173333,0.0018,0.0018,0.00173333,0.00183333,0.0019,0.002,0.00176667,0.002,0.00173333,0.00203333,0.0018,0.00163333,0.00173333,0.00166667,0.00166667,0.00213333,0.00173333,0.00163333,0.00183333,0.002,0.00203333,0.00166667,0.0019,0.00173333,0.00166667,0.00193333,0.0017,0.00166667,0.00176667,0.0017,0.0017,0.00166667,0.00166667,0.0018,0.00163333,0.0017,0.00203333,0.00193333,0.00176667,0.00176667,0.0019,0.0018,0.00176667,0.00166667,0.0018,0.0019,0.00196667,0.00173333,0.00173333,0.0018,0.0018,0.00166667,0.00166667,0.00213333,0.0018,0.00206667,0.00173333,0.00163333,0.00166667,0.00166667,0.00176667,0.00166667,0.0017,0.00186667,0.00176667,0.00183333,0.00176667,0.002,0.00163333,0.0017,0.0018,0.00206667,0.00186667,0.0018,0.00173333,0.00196667,0.0017,0.0017,0.0018,0.00173333,0.0017,0.0017,0.00203333,0.0018,0.0018,0.00163333,0.00173333,0.00176667,0.00163333,0.0017,0.00213333,0.0017,0.00186667,0.00206667,0.00163333,0.0017,0.00176667,0.00163333,0.0021,0.0018,0.0021,0.00173333,0.00196667,0.00166667,0.00183333,0.0017,0.0018,0.00196667,0.00166667,0.00166667,0.00206667,0.00166667,0.00163333,0.00186667,0.0018,0.00186667,0.0017,0.00203333,0.00206667,0.00233333,0.0018,0.00216667,0.0017,0.00163333,0.0019,0.00173333,0.0018,0.00196667,0.00186667,0.0018,0.00176667,0.0017,0.0017,0.0018,0.0017,0.00176667,0.00166667,0.00166667,0.00196667,0.00186667,0.0021,0.00163333,0.0019,0.00176667,0.0018,0.00163333,0.00163333,0.810967,0}  

注意这里不是按照appid频率排序的, 所以还需要处理一下.

digoal=# select * from   
(select row_number() over(partition by r) as rn,ele from (select unnest(most_common_elems::text::int[]) ele,2 as r from pg_stats where tablename='test_2' and attname='appid') t) t1  
join  
(select row_number() over(partition by r) as rn,freq from (select unnest(most_common_elem_freqs) freq,2 as r from pg_stats where tablename='test_2' and attname='appid') t) t2  
on (t1.rn=t2.rn) order by t2.freq desc limit 20;  
 rn  | ele | rn  |    freq      
-----+-----+-----+------------  
   2 |   1 |   2 |   0.810967  
   8 |   7 |   8 |     0.8102  
   3 |   2 |   3 |   0.809233  
   4 |   3 |   4 |   0.808433  
  10 |   9 |  10 |   0.808367  
   7 |   6 |   7 |   0.808067  
   5 |   4 |   5 |   0.807467  
   6 |   5 |   6 |   0.806667  
   9 |   8 |   9 |   0.806233  
   1 |   0 |   1 |   0.590833  
  11 |  10 |  11 |   0.588033  
 474 | 939 | 474 | 0.00233333  
 169 | 348 | 169 | 0.00233333  
 138 | 281 | 138 |     0.0023  
  21 |  27 |  21 | 0.00223333  
  32 |  44 |  32 |     0.0022  
 476 | 942 | 476 | 0.00216667  
 296 | 593 | 296 | 0.00216667  
 112 | 227 | 112 | 0.00216667  
  56 | 108 |  56 | 0.00213333  
(20 rows)  

前10完全准确, 但是由于前8的记录数偏差太小, 所以前8的排名顺序可能不准确.

例如从统计信息中取出的排名 :

 rn  | ele | rn  |    freq      
-----+-----+-----+------------  
   2 |   1 |   2 |   0.810967  
   8 |   7 |   8 |     0.8102  
   3 |   2 |   3 |   0.809233  
   4 |   3 |   4 |   0.808433  
  10 |   9 |  10 |   0.808367  
   7 |   6 |   7 |   0.808067  
   5 |   4 |   5 |   0.807467  
   6 |   5 |   6 |   0.806667  
   9 |   8 |   9 |   0.806233  
   1 |   0 |   1 |   0.590833  
  11 |  10 |  11 |   0.588033  

实际排名 :

 appid | count    
-------+--------  
     9 | 872831  
     6 | 871908  
     3 | 871867  
     7 | 871551  
     8 | 871436  
     4 | 871391  
     1 | 871051  
     5 | 870770  
     2 | 870692  
    10 | 435583  
     0 | 435342  

非常OK , 这对于大数据的统计来说, 无疑是非常重要的参考.

注意

  1. 数据统计信息, 占用空间超过ARRAY_WIDTH_THRESHOLD的数组不会进入统计范畴.

src/backend/utils/adt/array_typanalyze.c

+/*  
+ * To avoid consuming too much memory, IO and CPU load during analysis, and/or  
+ * too much space in the resulting pg_statistic rows, we ignore arrays that  
+ * are wider than ARRAY_WIDTH_THRESHOLD (after detoasting!).  Note that this  
+ * number is considerably more than the similar WIDTH_THRESHOLD limit used  
+ * in analyze.c's standard typanalyze code.  
+ */  
+#define ARRAY_WIDTH_THRESHOLD 0x10000  
....  
+       /* Skip too-large values. */  
+       if (toast_raw_datum_size(value) > ARRAY_WIDTH_THRESHOLD)  
+           continue;  
+       else  
+           analyzed_rows++;  
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
4月前
|
SQL 关系型数据库 数据库
PostgreSQL数据库报错 ERROR: multiple default values specified for column "" of table "" 如何解决?
PostgreSQL数据库报错 ERROR: multiple default values specified for column "" of table "" 如何解决?
431 59
|
SQL 关系型数据库 PostgreSQL
PostgreSQL 10.1 手册_部分 II. SQL 语言_第 7 章 查询_7.7. VALUES列表
7.7. VALUES列表 VALUES提供了一种生成“常量表”的方法,它可以被使用在一个查询中而不需要实际在磁盘上创建一个表。语法是: VALUES ( expression [, ...] ) [, ...] 每一个被圆括号包围的表达式列表生成表中的一行。
1244 0
|
固态存储 关系型数据库 Linux
|
SQL 关系型数据库 数据库
|
关系型数据库 数据库 PostgreSQL

热门文章

最新文章