[20170601]distinct的优化.txt
1.环境:
SCOTT@book> @ &r/ver1
PORT_STRING VERSION BANNER
-------------------- ---------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
create table t as select * from all_objects;
create index t_idx on t(owner,object_id);
--//分析表略.method_opt=>'for all columns size 1'.
select index_name,blevel,leaf_blocks,
distinct_keys,num_rows,
avg_leaf_blocks_per_key lf_per_key,
avg_data_blocks_per_key blks_per_key
from user_indexes
where index_name ='T_IDX';
INDEX_NAME BLEVEL LEAF_BLOCKS DISTINCT_KEYS NUM_ROWS LF_PER_KEY BLKS_PER_KEY
---------- ------ ----------- ------------- ---------- ---------- ------------
T_IDX 1 256 84774 84774 1 1
--//注意T表 owner is not null.
--//当我们执行select distinct owner from t;时正常情况下选择的执行计划如下:
alter session set statistics_level=all;
SCOTT@book> column owner format a30
SCOTT@book> select distinct owner from t;
OWNER
------------------------------
OWBSYS_AUDIT
MDSYS
CTXSYS
FLOWS_FILES
HR
OLAPSYS
OUTLN
OWBSYS
PUBLIC
APEX_030200
EXFSYS
ORACLE_OCM
SCOTT
SYSTEM
DBSNMP
OE
ORDPLUGINS
ORDSYS
PM
SH
SYSMAN
APPQOSSYS
BI
IX
ORDDATA
XDB
SI_INFORMTN_SCHEMA
SYS
WMSYS
29 rows selected.
SCOTT@book> @ &r/dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID g3ywa5u5raj7d, child number 0
-------------------------------------
select distinct owner from t
Plan hash value: 1741570181
----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 74 (100)| | 29 |00:00:00.04 | 263 | | | |
| 1 | HASH UNIQUE | | 1 | 29 | 174 | 74 (5)| 00:00:01 | 29 |00:00:00.04 | 263 | 5686K| 1858K| 2222K (0)|
| 2 | INDEX FAST FULL SCAN| T_IDX | 1 | 84774 | 496K| 71 (0)| 00:00:01 | 84774 |00:00:00.01 | 263 | | | |
----------------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / T@SEL$1
--//这样相当于把索引当作表做快速扫描,然后使用HASH UNIQUE过滤需要的结果.主要成本消耗就是扫描整个索引的成本.
--//而我们可以想象还有1个更佳的算法,就是通过递规每次查询owner的最小值,而查询最小值的逻辑读是很小的.
with ownerlist(x) as
(
select min(owner) from t
union all
select (select min(owner) from t where t.owner > ownerlist.x ) from ownerlist where x is not null
) select * from ownerlist where x is not null ;
--//注意一定要加一个条件where x is not null,不然变成死循环.后面也要加where x is not null ;,不然记录会多1条null值.
SCOTT@book> @ &r/dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID frvat47pq8f2x, child number 0
-------------------------------------
with ownerlist(x) as ( select min(owner) from t union all
select (select min(owner) from t where t.owner > ownerlist.x ) from
ownerlist where x is not null ) select * from ownerlist where x is not
null
Plan hash value: 946542369
---------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 4 (100)| | 29 |00:00:00.01 | 33 |
|* 1 | VIEW | | 1 | 2 | 34 | 4 (0)| 00:00:01 | 29 |00:00:00.01 | 33 |
| 2 | UNION ALL (RECURSIVE WITH) BREADTH FIRST| | 1 | | | | | 30 |00:00:00.01 | 33 |
| 3 | SORT AGGREGATE | | 1 | 1 | 6 | | | 1 |00:00:00.01 | 2 |
| 4 | INDEX FULL SCAN (MIN/MAX) | T_IDX | 1 | 1 | 6 | 2 (0)| 00:00:01 | 1 |00:00:00.01 | 2 |
| 5 | SORT AGGREGATE | | 29 | 1 | 6 | | | 29 |00:00:00.01 | 31 |
| 6 | FIRST ROW | | 29 | 1 | 6 | 2 (0)| 00:00:01 | 28 |00:00:00.01 | 31 |
|* 7 | INDEX RANGE SCAN (MIN/MAX) | T_IDX | 29 | 1 | 6 | 2 (0)| 00:00:01 | 28 |00:00:00.01 | 31 |
| 8 | RECURSIVE WITH PUMP | | 30 | | | | | 29 |00:00:00.01 | 0 |
---------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SET$1 / OWNERLIST@SEL$4
2 - SET$1
3 - SEL$1
4 - SEL$1 / T@SEL$1
5 - SEL$3
7 - SEL$3 / T@SEL$3
8 - SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("X" IS NOT NULL)
7 - access("T"."OWNER">:B1)
--//执行计划看起来很复杂,实际上理解了逻辑还是很简单的.逻辑读=33,明显比前面少许多,比较适合重复值很多的distinct.
--//想要说明一点,实际上算法很重要,许多开发正是丢掉这些最基本的东西,写出来的sql语句就像中学生写的家庭作业.