PostgreSQL 统计信息pg_statistic格式及导入导出dump_stat - 兼容Oracle

本文涉及的产品
云数据库 RDS MySQL,集群系列 2核4GB
推荐场景:
搭建个人博客
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
云原生数据库 PolarDB 分布式版,标准版 2核8GB
简介:

标签

PostgreSQL , dump_stat , 统计信息 , 导出导入


背景

《PostgreSQL 规格评估 - 微观、宏观、精准 多视角估算数据库性能(选型、做预算不求人)》

EXPLAIN是PG数据库用于输出SQL执行计划的语法,

1、生成的执行计划中包含COST一项。

如果校准了成本因子,COST可以和SQL实际执行时间对其。因子校对的方法如下,实际上每一种硬件,我们只需要校对一遍即可。

《优化器成本因子校对(disk,ssd,memory IO开销精算) - PostgreSQL real seq_page_cost & random_page_cost in disks,ssd,memory》

《优化器成本因子校对 - PostgreSQL explain cost constants alignment to timestamp》

校对因子如下:

#seq_page_cost = 1.0                    # measured on an arbitrary scale    
random_page_cost = 1.2                  # same scale as above    
#cpu_tuple_cost = 0.01                  # same scale as above    
#cpu_index_tuple_cost = 0.005           # same scale as above    
#cpu_operator_cost = 0.0025             # same scale as above    
    
# 以下不需要校对, 不考虑并行计算SQL    
parallel_tuple_cost = 0.1               # same scale as above    
parallel_setup_cost = 1000.0            # same scale as above    
effective_cache_size = 10GB    

2、评估COST还需要依赖统计信息柱状图:

涉及reltuples, relpages. 表示评估的记录数以及占用多少个数据块。注意源头的block_size可能和PG的不一致,占用多少个块需要转换一下。(show block_size可以查看数据块大小。)   
    
postgres=# \d pg_class    
                     Table "pg_catalog.pg_class"    
       Column        |     Type     | Collation | Nullable | Default     
---------------------+--------------+-----------+----------+---------    
 relname             | name         |           | not null |   -- 对象名  
 relnamespace        | oid          |           | not null |   -- 对象所属的schema, 对应pg_namespace.oid  
 reltype             | oid          |           | not null |     
 reloftype           | oid          |           | not null |     
 relowner            | oid          |           | not null |     
 relam               | oid          |           | not null |     
 relfilenode         | oid          |           | not null |     
 reltablespace       | oid          |           | not null |     
 relpages            | integer      |           | not null |   -- 评估的页数(单位为block_size)  
 reltuples           | real         |           | not null |   -- 评估的记录数  
 relallvisible       | integer      |           | not null |     
 reltoastrelid       | oid          |           | not null |     
 relhasindex         | boolean      |           | not null |     
 relisshared         | boolean      |           | not null |     
 relpersistence      | "char"       |           | not null |     
 relkind             | "char"       |           | not null |     
 relnatts            | smallint     |           | not null |     
 relchecks           | smallint     |           | not null |     
 relhasoids          | boolean      |           | not null |     
 relhaspkey          | boolean      |           | not null |     
 relhasrules         | boolean      |           | not null |     
 relhastriggers      | boolean      |           | not null |     
 relhassubclass      | boolean      |           | not null |     
 relrowsecurity      | boolean      |           | not null |     
 relforcerowsecurity | boolean      |           | not null |     
 relispopulated      | boolean      |           | not null |     
 relreplident        | "char"       |           | not null |     
 relispartition      | boolean      |           | not null |     
 relfrozenxid        | xid          |           | not null |     
 relminmxid          | xid          |           | not null |     
 relacl              | aclitem[]    |           |          |     
 reloptions          | text[]       |           |          |     
 relpartbound        | pg_node_tree |           |          |     
Indexes:    
    "pg_class_oid_index" UNIQUE, btree (oid)    
    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)    
    "pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)    
    
涉及   空值比例、平均列宽、唯一值比例或个数、高频值以及频率、柱状图分布、存储相关性、多值列(高频元素及比例、元素柱状图分布)。    
    
-- 这个是视图:  
  
postgres=# \d pg_stats     
                     View "pg_catalog.pg_stats"    
         Column         |   Type   | Collation | Nullable | Default     
------------------------+----------+-----------+----------+---------    
 schemaname             | name     |           |          |   -- 对象所属的schema  
 tablename              | name     |           |          |   -- 对象名  
 attname                | name     |           |          |   -- 列名  
 inherited              | boolean  |           |          |   -- 是否为继承表的统计信息(false时表示当前表的统计信息,true时表示包含所有继承表的统计信息)  
 null_frac              | real     |           |          |   -- 该列空值比例  
 avg_width              | integer  |           |          |   -- 该列平均长度  
 n_distinct             | real     |           |          |   -- 该列唯一值个数(-1表示唯一,小于1表示占比,大于等于1表示实际的唯一值个数)  
 most_common_vals       | anyarray |           |          |   -- 该列高频词  
 most_common_freqs      | real[]   |           |          |   -- 该列高频词对应的出现频率  
 histogram_bounds       | anyarray |           |          |   -- 该列柱状图(表示隔出的每个BUCKET的记录数均等)  
 correlation            | real     |           |          |   -- 该列存储相关性(-1到1的区间),绝对值越小,存储越离散。小于0表示反向相关,大于0表示正向相关  
 most_common_elems      | anyarray |           |          |   -- 该列为多值类型(数组)时,多值元素的高频词  
 most_common_elem_freqs | real[]   |           |          |   -- 多值元素高频词的出现频率  
 elem_count_histogram   | real[]   |           |          |   -- 多值元素的柱状图中,每个区间的非空唯一元素个数  
    
-- 这个是实际存储的数据(也就是要导入的部分):  
-- https://www.postgresql.org/docs/10/static/catalog-pg-statistic.html  
  
postgres=# \d pg_statistic    
             Table "pg_catalog.pg_statistic"    
   Column    |   Type   | Collation | Nullable | Default     
-------------+----------+-----------+----------+---------    
 starelid    | oid      |           | not null |   -- 对象OID,对应pg_class.oid  
 staattnum   | smallint |           | not null |   -- 该列在表中的位置序号,对应pg_attribute.attnum  
 stainherit  | boolean  |           | not null |   -- 是否为继承表的统计信息(false时表示当前表的统计信息,true时表示包含所有继承表的统计信息)  
 stanullfrac | real     |           | not null |   -- 空值比例  
 stawidth    | integer  |           | not null |   -- 平均长度  
 stadistinct | real     |           | not null |   -- 唯一值个数、比例  
 stakind1    | smallint |           | not null |   -- 表示第1个SLOT的统计信息分类编号  
 stakind2    | smallint |           | not null |   -- 表示第2个SLOT的统计信息分类编号  
 stakind3    | smallint |           | not null |   -- 表示第3个SLOT的统计信息分类编号  
 stakind4    | smallint |           | not null |   -- 表示第4个SLOT的统计信息分类编号  
 stakind5    | smallint |           | not null |   -- 表示第5个SLOT的统计信息分类编号  
 staop1      | oid      |           | not null |   -- 表示第1个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界,需要用到 "<" 这个操作符)  
 staop2      | oid      |           | not null |   -- 表示第2个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界,需要用到 "<" 这个操作符)  
 staop3      | oid      |           | not null |   -- 表示第3个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界,需要用到 "<" 这个操作符)  
 staop4      | oid      |           | not null |   -- 表示第4个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界,需要用到 "<" 这个操作符)  
 staop5      | oid      |           | not null |   -- 表示第5个SLOT的统计信息是用哪个operator生成的(例如统计柱状图边界,需要用到 "<" 这个操作符)  
 stanumbers1 | real[]   |           |          |   -- 表示第1个SLOT的以numeric[]为结果的统计信息,NULL说明这个SLOT分类没有numeric的统计信息。  
 stanumbers2 | real[]   |           |          |   -- 表示第2个SLOT的以numeric[]为结果的统计信息,NULL说明这个SLOT分类没有numeric的统计信息。  
 stanumbers3 | real[]   |           |          |   -- 表示第3个SLOT的以numeric[]为结果的统计信息,NULL说明这个SLOT分类没有numeric的统计信息。  
 stanumbers4 | real[]   |           |          |   -- 表示第4个SLOT的以numeric[]为结果的统计信息,NULL说明这个SLOT分类没有numeric的统计信息。  
 stanumbers5 | real[]   |           |          |   -- 表示第5个SLOT的以numeric[]为结果的统计信息,NULL说明这个SLOT分类没有numeric的统计信息。  
 stavalues1  | anyarray |           |          |   -- 表示第1个SLOT的以anyarray[]为结果的统计信息,NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型,或者列本身的类型。  
 stavalues2  | anyarray |           |          |   -- 表示第2个SLOT的以anyarray[]为结果的统计信息,NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型,或者列本身的类型。  
 stavalues3  | anyarray |           |          |   -- 表示第3个SLOT的以anyarray[]为结果的统计信息,NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型,或者列本身的类型。  
 stavalues4  | anyarray |           |          |   -- 表示第4个SLOT的以anyarray[]为结果的统计信息,NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型,或者列本身的类型。  
 stavalues5  | anyarray |           |          |   -- 表示第5个SLOT的以anyarray[]为结果的统计信息,NULL说明这个SLOT分类没有anyarray[]统计信息。数组类型为列的元素类型,或者列本身的类型。  
Indexes:    
    "pg_statistic_relid_att_inh_index" UNIQUE, btree (starelid, staattnum, stainherit)    

statkind的定义

src/include/catalog/pg_statistic.h

/*  
 * Currently, five statistical slot "kinds" are defined by core PostgreSQL,  
 * as documented below.  Additional "kinds" will probably appear in  
 * future to help cope with non-scalar datatypes.  Also, custom data types  
 * can define their own "kind" codes by mutual agreement between a custom  
 * typanalyze routine and the selectivity estimation functions of the type's  
 * operators.  
 *  
 * Code reading the pg_statistic relation should not assume that a particular  
 * data "kind" will appear in any particular slot.  Instead, search the  
 * stakind fields to see if the desired data is available.  (The standard  
 * function get_attstatsslot() may be used for this.)  
 */  
  
/*  
 * The present allocation of "kind" codes is:  
 *  
 *      1-99:           reserved for assignment by the core PostgreSQL project  
 *                              (values in this range will be documented in this file)  
 *      100-199:        reserved for assignment by the PostGIS project  
 *                              (values to be documented in PostGIS documentation)  
 *      200-299:        reserved for assignment by the ESRI ST_Geometry project  
 *                              (values to be documented in ESRI ST_Geometry documentation)  
 *      300-9999:       reserved for future public assignments  
 *  
 * For private use you may choose a "kind" code at random in the range  
 * 10000-30000.  However, for code that is to be widely disseminated it is  
 * better to obtain a publicly defined "kind" code by request from the  
 * PostgreSQL Global Development Group.  
 */  
  
/*  
 * In a "most common values" slot, staop is the OID of the "=" operator  
 * used to decide whether values are the same or not.  stavalues contains  
 * the K most common non-null values appearing in the column, and stanumbers  
 * contains their frequencies (fractions of total row count).  The values  
 * shall be ordered in decreasing frequency.  Note that since the arrays are  
 * variable-size, K may be chosen by the statistics collector.  Values should  
 * not appear in MCV unless they have been observed to occur more than once;  
 * a unique column will have no MCV slot.  
 */  
#define STATISTIC_KIND_MCV      1  
  
/*  
 * A "histogram" slot describes the distribution of scalar data.  staop is  
 * the OID of the "<" operator that describes the sort ordering.  (In theory,  
 * more than one histogram could appear, if a datatype has more than one  
 * useful sort operator.)  stavalues contains M (>=2) non-null values that  
 * divide the non-null column data values into M-1 bins of approximately equal  
 * population.  The first stavalues item is the MIN and the last is the MAX.  
 * stanumbers is not used and should be NULL.  IMPORTANT POINT: if an MCV  
 * slot is also provided, then the histogram describes the data distribution  
 * *after removing the values listed in MCV* (thus, it's a "compressed  
 * histogram" in the technical parlance).  This allows a more accurate  
 * representation of the distribution of a column with some very-common  
 * values.  In a column with only a few distinct values, it's possible that  
 * the MCV list describes the entire data population; in this case the  
 * histogram reduces to empty and should be omitted.  
 */  
#define STATISTIC_KIND_HISTOGRAM  2  
  
/*  
 * A "correlation" slot describes the correlation between the physical order  
 * of table tuples and the ordering of data values of this column, as seen  
 * by the "<" operator identified by staop.  (As with the histogram, more  
 * than one entry could theoretically appear.)  stavalues is not used and  
 * should be NULL.  stanumbers contains a single entry, the correlation  
 * coefficient between the sequence of data values and the sequence of  
 * their actual tuple positions.  The coefficient ranges from +1 to -1.  
 */  
#define STATISTIC_KIND_CORRELATION      3  
  
/*  
 * A "most common elements" slot is similar to a "most common values" slot,  
 * except that it stores the most common non-null *elements* of the column  
 * values.  This is useful when the column datatype is an array or some other  
 * type with identifiable elements (for instance, tsvector).  staop contains  
 * the equality operator appropriate to the element type.  stavalues contains  
 * the most common element values, and stanumbers their frequencies.  Unlike  
 * MCV slots, frequencies are measured as the fraction of non-null rows the  
 * element value appears in, not the frequency of all rows.  Also unlike  
 * MCV slots, the values are sorted into the element type's default order  
 * (to support binary search for a particular value).  Since this puts the  
 * minimum and maximum frequencies at unpredictable spots in stanumbers,  
 * there are two extra members of stanumbers, holding copies of the minimum  
 * and maximum frequencies.  Optionally, there can be a third extra member,  
 * which holds the frequency of null elements (expressed in the same terms:  
 * the fraction of non-null rows that contain at least one null element).  If  
 * this member is omitted, the column is presumed to contain no null elements.  
 *  
 * Note: in current usage for tsvector columns, the stavalues elements are of  
 * type text, even though their representation within tsvector is not  
 * exactly text.  
 */  
#define STATISTIC_KIND_MCELEM  4  
  
/*  
 * A "distinct elements count histogram" slot describes the distribution of  
 * the number of distinct element values present in each row of an array-type  
 * column.  Only non-null rows are considered, and only non-null elements.  
 * staop contains the equality operator appropriate to the element type.  
 * stavalues is not used and should be NULL.  The last member of stanumbers is  
 * the average count of distinct element values over all non-null rows.  The  
 * preceding M (>=2) members form a histogram that divides the population of  
 * distinct-elements counts into M-1 bins of approximately equal population.  
 * The first of these is the minimum observed count, and the last the maximum.  
 */  
#define STATISTIC_KIND_DECHIST  5  
  
/*  
 * A "length histogram" slot describes the distribution of range lengths in  
 * rows of a range-type column. stanumbers contains a single entry, the  
 * fraction of empty ranges. stavalues is a histogram of non-empty lengths, in  
 * a format similar to STATISTIC_KIND_HISTOGRAM: it contains M (>=2) range  
 * values that divide the column data values into M-1 bins of approximately  
 * equal population. The lengths are stored as float8s, as measured by the  
 * range type's subdiff function. Only non-null rows are considered.  
 */  
#define STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM  6  
  
/*  
 * A "bounds histogram" slot is similar to STATISTIC_KIND_HISTOGRAM, but for  
 * a range-type column.  stavalues contains M (>=2) range values that divide  
 * the column data values into M-1 bins of approximately equal population.  
 * Unlike a regular scalar histogram, this is actually two histograms combined  
 * into a single array, with the lower bounds of each value forming a  
 * histogram of lower bounds, and the upper bounds a histogram of upper  
 * bounds.  Only non-NULL, non-empty ranges are included.  
 */  
#define STATISTIC_KIND_BOUNDS_HISTOGRAM  7  

那么这些统计信息如何导入导出呢?

导出导入统计信息

https://postgrespro.com/docs/postgresproee/9.6/dump-stat.html

dump_stat这个插件是PostgreSQL pro推出的兼容9.6版本的导出统计信息的插件。

代码如下:

https://github.com/postgrespro/postgrespro/tree/PGPRO9_6

https://github.com/postgrespro/postgrespro/tree/PGPRO9_6/contrib/dump_stat

导出

通过dump_stat导出(导出的结构已经是SQL形式),然后通过SQL导入。

$ psql test -A    
test=# \t    
test=# \o dump_stat.sql    
test=# select dump_statistic();    

pg_statistic的每一条记录,产生一条如下SQL:

WITH upsert as (    
  UPDATE pg_catalog.pg_statistic SET column_name = expression [, ...]    
  WHERE to_schema_qualified_relation(starelid) = t_relname    
    AND to_attname(t_relname, staattnum) = t_attname    
    AND to_atttype(t_relname, staattnum) = t_atttype    
    AND stainherit = t_stainherit    
  RETURNING *)    
ins as (    
  SELECT expression [, ...]    
  WHERE NOT EXISTS (SELECT * FROM upsert)    
    AND to_attnum(t_relname, t_attname) IS NOT NULL    
    AND to_atttype(t_relname, t_attname) = t_atttype)    
INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;    
    
where expression can be one of:    
    
array_in(array_text, type_name::regtype::oid, -1)    
value::type_name    

stat导入的实际例子

(表示public.test表的info列的统计信息,如果存在则更新,不存在则插入。)

WITH   
  upsert as (   
    UPDATE pg_catalog.pg_statistic   
    SET   
      stanullfrac = 0, stawidth = 4, stadistinct = -1, stakind1 = 2, stakind2 = 3, stakind3 = 0, stakind4 = 0, stakind5 = 0, staop1 = 'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator, staop2 = 'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator, staop3 = '0'::regoperator, staop4 = '0'::regoperator, staop5 = '0'::regoperator, stanumbers1 = NULL::real[], stanumbers2 = '{-0.5}'::real[], stanumbers3 = NULL::real[], stanumbers4 = NULL::real[], stanumbers5 = NULL::real[], stavalues1 = array_in('{abc,cde,test}', 'pg_catalog.text'::regtype, -1)::anyarray, stavalues2 = NULL::anyarray, stavalues3 = NULL::anyarray, stavalues4 = NULL::anyarray, stavalues5 = NULL::anyarray   
    WHERE to_schema_qualified_relation(starelid) = 'public.test' AND to_attname('public.test', staattnum) = 'info' AND to_atttype('public.test', staattnum) = 'pg_catalog.text' AND stainherit = false   
    RETURNING *  
  ),   
  ins as (   
    SELECT   
      'public.test'::regclass,   
      to_attnum('public.test', 'info'),   
      'false'::boolean,   
      0::real,   
      4::integer,   
      -1::real,   
      2,  -- stakind=2 表示柱状图  
      3,  -- stakind=3 表示相关性  
      0,   
      0,   
      0,   
      'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator,   
      'pg_catalog.<(pg_catalog.text, pg_catalog.text)'::regoperator,   
      '0'::regoperator,   
      '0'::regoperator,   
      '0'::regoperator,   
      NULL::real[],   
      '{-0.5}'::real[],   
      NULL::real[],   
      NULL::real[],   
      NULL::real[],   
      array_in('{abc,cde,test}', 'pg_catalog.text'::regtype, -1)::anyarray,   
      NULL::anyarray,   
      NULL::anyarray,   
      NULL::anyarray,   
      NULL::anyarray   
    WHERE NOT EXISTS (SELECT * FROM upsert) AND to_attnum('public.test', 'info') IS NOT NULL AND to_atttype('public.test', 'info') = 'pg_catalog.text'  
  )   
INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;  

导入

1、修改postgresql.conf,允许修改系统表,重启数据库生效配置

vi postgresql.conf  
    
allow_system_table_mods=on  
  
pg_ctl restart -m fast  

2、导入统计信息

-- 1 pg_class    
    
update pg_class set reltuples=?, relpages=? where oid=?;    
    
-- 2 pg_statistic    
    
WITH upsert as (    
  UPDATE pg_catalog.pg_statistic SET column_name = expression [, ...]    
  WHERE to_schema_qualified_relation(starelid) = t_relname    
    AND to_attname(t_relname, staattnum) = t_attname    
    AND to_atttype(t_relname, staattnum) = t_atttype    
    AND stainherit = t_stainherit    
  RETURNING *)    
ins as (    
  SELECT expression [, ...]    
  WHERE NOT EXISTS (SELECT * FROM upsert)    
    AND to_attnum(t_relname, t_attname) IS NOT NULL    
    AND to_atttype(t_relname, t_attname) = t_atttype)    
INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;    
    
where expression can be one of:    
    
array_in(array_text, type_name::regtype::oid, -1)    
value::type_name    

3、导入完成后,将allow_system_table_mods设置为off,重启数据库。

dump_statistic代码

CREATE FUNCTION dump_statistic(relid oid) RETURNS SETOF TEXT AS $$  
        DECLARE  
                result  text;  
                  
                cmd             text;           -- main query  
                in_args text;           -- args for insert  
                up_args text;           -- args for upsert  
                  
                fstaop  text := '%s::regoperator';  
                arr_in  text := 'array_in(%s, %s::regtype, -1)::anyarray';  
                  
                stacols text[] = ARRAY['stakind', 'staop',  
                                                           'stanumbers', 'stavalues' ];  
                  
                r               record;  
                i               int;  
                j               text;  
                ncols   int := 26;      -- number of columns in pg_statistic  
                  
                stanum  text[];         -- stanumbers{1, 5}  
                staval  text[];         -- stavalues{1, 5}  
                staop   text[];         -- staop{1, 5}  
                  
                relname text;           -- quoted relation name  
                attname text;           -- quoted attribute name  
                atttype text;           -- quoted attribute type  
                  
        BEGIN  
                for r in  
                                select * from pg_catalog.pg_statistic  
                                where starelid = relid  
                                        and get_namespace(starelid) != to_namespace('information_schema')  
                                        and get_namespace(starelid) != to_namespace('pg_catalog') loop  
                          
                        relname := to_schema_qualified_relation(r.starelid);  
                        attname := quote_literal(to_attname(relname, r.staattnum));  
                        atttype := quote_literal(to_atttype(relname, r.staattnum));  
                        relname := quote_literal(relname); -- redefine relname  
                          
                        in_args := '';  
                        up_args = 'stanullfrac = %s, stawidth = %s, stadistinct = %s, ';  
                          
                        cmd := 'WITH upsert as ( ' ||  
                                                'UPDATE pg_catalog.pg_statistic SET %s ' ||  
                                                'WHERE to_schema_qualified_relation(starelid) = ' || relname || ' '  
                                                        'AND to_attname(' || relname || ', staattnum) = ' || attname || ' '  
                                                        'AND to_atttype(' || relname || ', staattnum) = ' || atttype || ' '  
                                                        'AND stainherit = ' || r.stainherit || ' ' ||  
                                                'RETURNING *), ' ||  
                                   'ins as ( ' ||  
                                                'SELECT %s ' ||  
                                                'WHERE NOT EXISTS (SELECT * FROM upsert) ' ||  
                                                        'AND to_attnum(' || relname || ', ' || attname || ') IS NOT NULL '  
                                                        'AND to_atttype(' || relname || ', ' || attname || ') = ' || atttype || ') '  
                                   'INSERT INTO pg_catalog.pg_statistic SELECT * FROM ins;';  
                                          
                        for i in 1..ncols loop  
                                in_args := in_args || '%s';  
  
                                if i != ncols then  
                                        in_args := in_args || ', ';  
                                end if;  
                        end loop;  
                                  
                        for j in 1..4 loop  
                                for i in 1..5 loop  
                                        up_args := up_args || format('%s%s = %%s', stacols[j], i);  
  
                                        if i * j != 20 then  
                                                up_args := up_args || ', ';  
                                        end if;  
                                end loop;  
                        end loop;  
                          
                        cmd := format(cmd, up_args, in_args);   --prepare template for main query  
  
                        staop := array[format(fstaop, quote_literal(to_schema_qualified_operator(r.staop1))),  
                                                   format(fstaop, quote_literal(to_schema_qualified_operator(r.staop2))),  
                                                   format(fstaop, quote_literal(to_schema_qualified_operator(r.staop3))),  
                                                   format(fstaop, quote_literal(to_schema_qualified_operator(r.staop4))),  
                                                   format(fstaop, quote_literal(to_schema_qualified_operator(r.staop5)))];  
  
                        stanum := array[r.stanumbers1::text,  
                                                        r.stanumbers2::text,  
                                                        r.stanumbers3::text,  
                                                        r.stanumbers4::text,  
                                                        r.stanumbers5::text];  
                                                          
                        for i in 1..5 loop  
                                if stanum[i] is null then  
                                        stanum[i] := 'NULL::real[]';  
                                else  
                                        stanum[i] := '''' || stanum[i] || '''::real[]';  
                                end if;  
                        end loop;  
  
                        if r.stavalues1 is not null then  
                                staval[1] := format(arr_in, quote_literal(r.stavalues1),  
                                                                        quote_literal(  
                                                                                to_schema_qualified_type(  
                                                                                        anyarray_elemtype(r.stavalues1))));  
                        else  
                                staval[1] := 'NULL::anyarray';  
                        end if;  
  
                        if r.stavalues2 is not null then  
                                staval[2] := format(arr_in, quote_literal(r.stavalues2),  
                                                                        quote_literal(  
                                                                                to_schema_qualified_type(  
                                                                                        anyarray_elemtype(r.stavalues2))));  
                        else  
                                staval[2] := 'NULL::anyarray';  
                        end if;  
  
                        if r.stavalues3 is not null then  
                                staval[3] := format(arr_in, quote_literal(r.stavalues3),  
                                                                        quote_literal(  
                                                                                to_schema_qualified_type(  
                                                                                        anyarray_elemtype(r.stavalues3))));  
                        else  
                                staval[3] := 'NULL::anyarray';  
                        end if;  
  
                        if r.stavalues4 is not null then  
                                staval[4] := format(arr_in, quote_literal(r.stavalues4),  
                                                                        quote_literal(  
                                                                                to_schema_qualified_type(  
                                                                                        anyarray_elemtype(r.stavalues4))));  
                        else  
                                staval[4] := 'NULL::anyarray';  
                        end if;  
  
                        if r.stavalues5 is not null then  
                                staval[5] := format(arr_in, quote_literal(r.stavalues5),  
                                                                        quote_literal(  
                                                                                to_schema_qualified_type(  
                                                                                        anyarray_elemtype(r.stavalues5))));  
                        else  
                                staval[5] := 'NULL::anyarray';  
                        end if;  
                          
                        --DEBUG  
                        --staop := array['{arr}', '{arr}', '{arr}', '{arr}', '{arr}'];  
                        --stanum := array['{num}', '{num}', '{num}', '{num}', '{num}'];  
                        --staval := array['{val}', '{val}', '{val}', '{val}', '{val}'];  
  
                        result := format(cmd,  
                                                         r.stanullfrac,  
                                                         r.stawidth,  
                                                         r.stadistinct,  
                                                         -- stakind  
                                                         r.stakind1, r.stakind2, r.stakind3, r.stakind4, r.stakind5,  
                                                         -- staop  
                                                         staop[1], staop[2], staop[3], staop[4], staop[5],  
                                                         -- stanumbers  
                                                         stanum[1], stanum[2], stanum[3], stanum[4], stanum[5],  
                                                         -- stavalues  
                                                         staval[1], staval[2], staval[3], staval[4], staval[5],  
                                                           
                                                         -- first 6 columns  
                                                         format('%s::regclass', relname),  
                                                         format('to_attnum(%s, %s)', relname, attname),  
                                                         '''' || r.stainherit || '''::boolean',  
                                                         r.stanullfrac || '::real',  
                                                         r.stawidth || '::integer',  
                                                         r.stadistinct || '::real',  
                                                         -- stakind  
                                                         r.stakind1, r.stakind2, r.stakind3, r.stakind4, r.stakind5,  
                                                         -- staop  
                                                         staop[1], staop[2], staop[3], staop[4], staop[5],  
                                                         -- stanumbers  
                                                         stanum[1], stanum[2], stanum[3], stanum[4], stanum[5],  
                                                         -- stavalues  
                                                         staval[1], staval[2], staval[3], staval[4], staval[5]);  
  
                        return next result;  
                end loop;  
  
                return;  
        END;  
$$ LANGUAGE plpgsql;  

我们甚至可以将Oracle数据库的统计信息,平移到PG数据库,对齐需要的元素即可:

记录数、占用多少个数据块。每列的空值比例、平均列宽、唯一值比例或个数、高频值以及频率、柱状图分布、存储相关性、多值列(高频元素及比例、元素柱状图分布)。

好处:在迁移ORACLE数据时,可以关闭autovacuumm(提高导入速度),通过这种方法来导入统计信息。(只要元素对应即可,当然有些元素可能是ORACLE中不采集的,比如多值列的统计信息)。

参考

https://github.com/postgrespro/postgrespro/tree/PGPRO9_6/contrib/dump_stat

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
6月前
|
关系型数据库 分布式数据库 数据库
PolarDB PostgreSQL版:Oracle兼容的高性能数据库
PolarDB PostgreSQL版是一款高性能的数据库,具有与Oracle兼容的特性。它采用了分布式架构,可以轻松处理大量的数据,同时还支持多种数据类型和函数,具有高可用性和可扩展性。它还提供了丰富的管理工具和性能优化功能,为企业提供了可靠的数据存储和处理解决方案。PolarDB PostgreSQL版在数据库领域具有很高的竞争力,可以满足各种企业的需求。
|
6月前
|
SQL Oracle 关系型数据库
Oracle-Oracle SQL Report (awrsqrpt.sql/awrsqrpi.sql)生成指定SQL的统计报表
Oracle-Oracle SQL Report (awrsqrpt.sql/awrsqrpi.sql)生成指定SQL的统计报表
87 0
|
2月前
|
Oracle NoSQL 关系型数据库
主流数据库对比:MySQL、PostgreSQL、Oracle和Redis的优缺点分析
主流数据库对比:MySQL、PostgreSQL、Oracle和Redis的优缺点分析
456 2
|
3月前
|
SQL 关系型数据库 PostgreSQL
PostgreSQL 如何通过身份证号码进行年龄段的统计?
【8月更文挑战第20天】PostgreSQL 如何通过身份证号码进行年龄段的统计?
487 2
|
3月前
|
SQL 关系型数据库 数据库
PostgreSQL常用命令,启动连接,pg_dump导入导出
PostgreSQL常用命令,启动连接,pg_dump导入导出
|
4月前
|
Oracle 关系型数据库
oracle收集统计信息,游标失效时间
Dbms_stats Invalidates Cursors in Auto_invalidate mode
46 0
|
6月前
|
人工智能 Oracle 关系型数据库
一篇文章弄懂Oracle和PostgreSQL的Database Link
一篇文章弄懂Oracle和PostgreSQL的Database Link
|
6月前
|
Oracle 关系型数据库 数据挖掘
Oracle常用系统函数之聚集函数:数据的统计大师
【4月更文挑战第19天】Oracle的聚集函数是数据统计的强大工具,包括`SUM`(求和)、`COUNT`(计数)、`AVG`(平均值)、`MAX`和`MIN`(最大值和最小值)、`STDDEV`(标准差)等。这些函数帮助我们汇总、分析数据,提供丰富的统计信息。熟练掌握这些函数,能提升数据分析效率,助你成为数据大师。
|
6月前
|
SQL Oracle 关系型数据库
常用数据库的分页语句(mySQL、oracle、PostgreSQL、SQL Server)
常用数据库的分页语句(mySQL、oracle、PostgreSQL、SQL Server)
|
11月前
|
SQL Oracle 关系型数据库
Oracle,Postgresql等数据库使用
Oracle,Postgresql等数据库简单使用
171 0
Oracle,Postgresql等数据库使用

相关产品

  • 云原生数据库 PolarDB
  • 云数据库 RDS PostgreSQL 版
  • 推荐镜像

    更多
    下一篇
    无影云桌面