[20160624]慎用nvarchar2数据类型.txt
--我以前的blog都写过谨慎使用nvarchar2类型,如果没有国际化需求,建议不要使用.
--而且这种类型可能导致另外的问题.正好最近优化一条sql语句,最后才发现问题所在.
1.环境:
SYSTEM@192.168.xx.yyy:1521/orcl> @ &r/ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
IBMPC/WIN_NT64-9.1.0 11.2.0.1.0 Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
UPDATE pacs_image
SET DELFLAGS = '1'
WHERE insuid = '1.2.840.10008.1.2.1000510567.20160624.4765453.0';
--执行计划走的是全表扫描,如果检查可以发现索引是建立的.我一看insuid的查询值就大概知道可能是直方图导致的问题.
SYSTEM@192.168.xx.yyy:1521/orcl> select * from (select substr(INSUID,1,32),count(*) from tjpacs.PACS_IMAGE group by substr(INSUID,1,32) order by 2 desc) where rownum<=10;
SUBSTR(INSUID,1,32) COUNT(*)
--------------------------------- ----------
1.3.12.2.1107.5.4.4.10319.300000 40893
1.3.12.2.1107.5.4.4.10278.300000 26867
1.3.12.2.1107.5.3.58.40106.12.20 15343
1.2.840.10008.1.2.160400058.2016 25
1.2.840.10008.1.2.1000342479.201 16
1.2.840.10008.1.2.1000471555.201 16
1.2.840.10008.1.2.1000353888.201 15
1.2.840.10008.1.2.1000325399.201 15
1.2.840.10008.1.2.1000450679.201 15
1.2.840.10008.1.2.1000407902.201 14
10 rows selected.
SYSTEM@192.168.xx.yyy:1521/orcl> select count(*) from tjpacs.PACS_IMAGE ;
COUNT(*)
----------
184870
--不管怎样,查询占14/184870应该可以很好的使用索引,但是当我查询直方图信息发现:
SYSTEM@192.168.xx.yyy:1521/orcl> select * from DBA_TAB_HISTOGRAMS where table_name='PACS_IMAGE' and column_name='INSUID';
OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_A
------ ---------- -------------------- ---------------------------------- ---------------------------------- ----------
TJPACS PACS_IMAGE INSUID 2862 993852307125505000000000000000000
TJPACS PACS_IMAGE INSUID 5416 993852307130228000000000000000000
--仅仅2个backet,有点奇怪了.
SYSTEM@192.168.xx.yyy:1521/orcl> @ &r/desc tjpacs.PACS_IMAGE
Name Null? Type
------------ -------- ----------------------------
FILENAME NOT NULL NVARCHAR2(128)
HOSTNAME NOT NULL NVARCHAR2(20)
VIRTUAL_DIR NVARCHAR2(128)
STUDYID NOT NULL NUMBER
GROUPNO NOT NULL NUMBER
SERIESID NOT NULL NUMBER
IMAGEID NOT NULL NUMBER
INSUID NVARCHAR2(128)
IMAGENO NVARCHAR2(10)
IMAGEDATE DATE
IMAGEDES NVARCHAR2(128)
BODYPART NVARCHAR2(128)
IMAGESTATUS NVARCHAR2(1)
IMAGETYPE NVARCHAR2(4)
FILESIZE NUMBER
DELFLAGS NVARCHAR2(1)
VIEWTAGPATH NVARCHAR2(200)
--一看表定义很容易明白问题使用了nvarchar2类型.转储看看:
SYSTEM@192.168.xx.yyy:1521/orcl> select dump(INSUID,16) c70 ,insuid c60 from tjpacs.PACS_IMAGE where rownum<=1;
C70 C60
---------------------------------------------------------------------- ------------------------------------------------------------
Typ=1 Len=110: 0,31,0,2e,0,33,0,2e,0,31,0,32,0,2e,0,32,0,2e,0,31,0,31, 1.3.12.2.1107.5.4.4.10278.30000015111823402501500000000
0,30,0,37,0,2e,0,35,0,2e,0,34,0,2e,0,34,0,2e,0,31,0,30,0,32,0,37,0,38,
0,2e,0,33,0,30,0,30,0,30,0,30,0,30,0,31,0,35,0,31,0,31,0,31,0,38,0,32,
0,33,0,34,0,30,0,32,0,35,0,30,0,31,0,35,0,30,0,30,0,30,0,30,0,30,0,30,
0,30,0,30
--可以发现nvarchar2类型保存'1'需要2个字节.这样直方图仅仅分析16个字符.
SYSTEM@192.168.xx.yyy:1521/orcl> select * from (select substr(INSUID,1,16),count(*) from tjpacs.PACS_IMAGE group by substr(INSUID,1,16) order by 2 desc) where rownum<=10;
SUBSTR(INSUID,1,16) COUNT(*)
-------------------- --------
1.2.840.10008.1. 101766
1.3.12.2.1107.5. 83110
1.2.840.113619.2 4
--可以发现这样查询建立的直方图就是鸡肋.毫无用处,导致oracle认为重复值很多.取消直方图定义.
BEGIN
SYS.DBMS_STATS.GATHER_TABLE_STATS (
OwnName => 'TJPACS'
,TabName => 'PACS_IMAGE'
,Estimate_Percent => SYS.DBMS_STATS.AUTO_SAMPLE_SIZE
,Method_Opt => 'FOR ALL COLUMNS SIZE auto for columns INSUID size 1 '
,Degree => 4
,Cascade => TRUE
,No_Invalidate => FALSE);
END;
/
SELECT * FROM tjpacs.pacs_image WHERE insuid = '1.2.840.10008.1.2.1000510567.20160624.4765453.0';
SYSTEM@192.168.xx.yyy:1521/orcl> @ &r/dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID fmy1xha29r0dg, child number 0
-------------------------------------
SELECT * FROM tjpacs.pacs_image WHERE insuid =
'1.2.840.10008.1.2.1000510567.20160624.4765453.0'
Plan hash value: 3036732012
-------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 4 (100)| | 1 |00:00:00.01 | 5 |
| 1 | TABLE ACCESS BY INDEX ROWID| PACS_IMAGE | 1 | 1 | 487 | 4 (0)| 00:00:01 | 1 |00:00:00.01 | 5 |
|* 2 | INDEX RANGE SCAN | PACS_IMAGE$INSUID | 1 | 1 | | 3 (0)| 00:00:01 | 1 |00:00:00.01 | 4 |
-------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1 / PACS_IMAGE@SEL$1
2 - SEL$1 / PACS_IMAGE@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("INSUID"=U'1.2.840.10008.1.2.1000510567.20160624.4765453.0')
--OK现在可以正常使用索引了.当然如果oracle如果晚上自动分析问题就还原了.优化这个问题需要控制直方图建立.
exec dbms_stats.set_table_prefs('tjpacs','.pacs_image','method_opt','FOR ALL COLUMNS SIZE AUTO FOR columns INSUID size 1');
SYSTEM@192.168.xx.yyy:1521/orcl> exec dbms_stats.set_table_prefs('tjpacs','pacs_image','method_opt','FOR ALL COLUMNS SIZE AUTO FOR columns INSUID size 1');
PL/SQL procedure successfully completed.
SYSTEM@192.168.xx.yyy:1521/orcl> select * from SYS.OPTSTAT_USER_PREFS$;
OBJ# PNAME VALNUM VALCHAR CHGTIME SPARE1
----- ---------- ---------- ------------------------------------------------------------ --------------------------------- ------
73788 METHOD_OPT FOR ALL COLUMNS SIZE AUTO FOR COLUMNS INSUID SIZE 1 2016-06-24 15:35:10.005000 +08:00
--另外这样建立的索引键值太长,索引占用空间也很大.
SYSTEM@192.168.xx.yyy:1521/orcl> validate index tjpacs.PACS_IMAGE$INSUID;
Index analyzed.
SYSTEM@192.168.xx.yyy:1521/orcl> @ &r/i
HEIGHT BLOCKS NAME LF_ROWS LF_BLKS LF_ROWS_LEN LF_BLK_LEN BR_ROWS BR_BLKS BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS35M
---------- ---------- ---------- ---------- ---------- ----------- ---------- ---------- ---------- ----------- ---------- ----------- --------------- -------------
3 4480 PACS_IMAGE 184892 4317 20799904 8000 4316 84 372844 8032 2 244 181921
$INSUID
MOST_REPEATED_KEY BTREE_SPACE USED_SPACE PCT_USED ROWS_PER_KEY BLKS_GETS_PER_ACCESS PRE_ROWS PRE_ROWS_LEN OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------------- ----------- ---------- ---------- ------------ -------------------- ---------- ------------ -------------- ----------------
3 35210688 21172748 61 1.01633126 4.00816563 0 0 0 0
--一个块仅仅保存4个键值,索引达到了35M,.而表104M.
总之:在生产系统要慎用Nvarchar2类型.不要打着什么国际化的幌子骗人,开发应该认真了解这种数据类型,实际上这样的程序就是垃圾.
再回头非常困难.