DBMS_STATS.AUTO_SAMPLE_SIZE的值是什么?

简介: 昨天类总在微信公众号,给我留言,这是2014年写的一篇文章(http://blog.csdn.net/bisal/article/details/18910785#reply),看了一下,当时的实验和说明是,SQL> exec dbms_stats.gather_table_stats(ownname=>'SYS', tabname=>'T2');PL/SQL procedure successfully completed.查询dba_tables表,看到NUM_ROWS值是11218,说明此处采样比例是100%。

昨天类总在微信公众号,给我留言,



这是2014年写的一篇文章(http://blog.csdn.net/bisal/article/details/18910785#reply),看了一下,当时的实验和说明是,

SQL> exec dbms_stats.gather_table_stats(ownname=>'SYS', tabname=>'T2');
PL/SQL procedure successfully completed.

查询dba_tables表,看到NUM_ROWS值是11218,说明此处采样比例是100%。


这里必须纠正,我的说法有误,不能因为从dba_tables中看见了NUM_ROWS值和表实际记录数相同,就认为默认采样比例就是100%。崔老师书中说了,11g默认值是DBMS_STATS.AUTO_SAMPLE_SIZE。


从官方文档看,gather_table_stats的estimate_percent参数,取值范围是[0.000001,100]默认值是DBMS_STATS.AUTO_SAMPLE_SIZE,



AUTO_SAMPLE_SIZE是一个NUMBER类型的常量,默认值是0,表示采用自动采样算法,


问题来了,AUTO_SAMPLE_SIZE下Oracle采用的采样比例究竟是什么?究竟之前我所说的默认比例是100%,是否完全错误?


(1) 9i和10g的描述,How to Gather Optimizer Statistics on 10g (文档 ID 605439.1)

指出,

(1) 9i中ESTIMATE_PERCENT自动采样比例默认为100%。

(2) 10g中ESTIMATE_PERCENT自动采样比例默认为DBMS_STATS.AUTO_SAMPLE_SIZE,注意这指出该值非常小。

(3) 11g中ESTIMATE_PERCENT自动采样比例默认为DBMS_STATS.AUTO_SAMPLE_SIZE,注意这说的是一个相对larger的estimate percentage,一直到100%。


Note that on 11g, although using auto size for ESTIMATE_PERCENT tends to default to 100% ,because this is an auto sample, the engine may still decide to use a different sample size for tables and columns. This means that Column statistics could still be gathered with a small sample size and create a histogram that is missing key values. When ESTIMATE_PERCENT is set to a specific numeric value, that value will be used for both the table and columns. 


Additionally, even though a 100% sample is collected, the gathering process is really fast since a new hashing algorithm is used to compute the statistics rather than sorting (in 9i and 10g the "slow" part was typically the sorting). In 10g, support experience has shown that the default ESTIMATE_PERCENT sample size was extremely small which often resulted in poor statistics and is therefore not recommended.

这段描述说明,11g中ESTIMATE_PERCENT使用AUTO,会倾向于默认100%采样,由数据库引擎决定表和列,采样不同的采样比例。如果ESTIMATE_PERCENT设置了具体数值,则该值会应用于表和列。尽管100%采样,采集过程也会非常迅速,因为Oracle采用了一种新的HASH算法来计算统计信息,而不会像9i和10g中采用排序的方法,会显得非常slow。特别指出,10g中由于ESTIMATE_PERCENT默认值是一个非常非常小的数,通常会造成poor的统计信息,因此并不建议使用AUTO。


这篇文章则介绍了一些9i中ESTIMATE_PERCENT参数值设置的说明,

Using DBMS_STATS.GATHER_TABLE_STATS With ESTIMATE_PERCENT Parameter Samples All Rows (文档 ID 223065.1)

指出ESTIMATE_PERCENT有一个上限,超过则会采样所有行数据。9.0.1版本之前,这个上限是25%,9.0.1版本则是15%,从9.2版本开始,采样比例则由用户指定了。


(2) 11g的描述,How to Gather Optimizer Statistics on 11g (文档 ID 749227.1)

On 11g support suggests using the default DBMS_STATS.AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT. This will generate estimate sample size of 100% for the table (if it is possible for this to fit within the maintenance window), even if that means that statistics are gathered on a reduced frequency. If this 100% sample is not feasible, then try using at least an estimate of 30%, however since 11g uses a hashing algorithm to compute the statistic, performance should be acceptable in most cases.


Generally, the accuracy of the statistics overall outweighs the day to day changes in most applications. See below for notes regarding earlier versions and this setting.

11g建议使用DBMS_STATS.AUTO_SAMPLE_SIZE,维护窗口内,尽可能按照100%完成自动采样,若100%的方式不适合,则会at least采用30%的采样比例。并且强调了11g采用了HASH算法,计算统计信息,因此几乎在所有场景下,性能都不是问题。


对于默认值,和上面10g文档描述是一样的,