数月之前,一位新疆的同事反映客户的某套单节点数据库存在性能问题,某个SQL长时间运行没有结束,在该SQL运行期间大量回话处于latch free等待中。因为相隔遥遥千里,同事之间是通过Email交流的;他首先给我发来了该问题SQL的相关
explain_log
。其中该语句的执行计划如下:
该SQL涉及近四十个表和索引的链接操作,而且全部采用嵌套循环链接模式,是典型的siebel应用。这个SQL的执行计划十分复杂,我甚至不愿意去看它;既然从执行计划上进一步分析过于违背“惰性”,我首先想到的还是从wait interface入手,毕竟OWI是"useless"的。因为无法远程操作,反复收集信息会让整个过程十分缓慢,一口气问对方要了10046,errorstack,latches的dump信息,以及该数据库的完整RDA报告。 既然是latch free等待,那么总要先定位到相关的latch才好进一步分析,我们首先看10046 trace:
通过以上脚本处理,我们可以看到在执行过程中出现了各种latch free,出现最频繁的是p1=-4611685984823317640的栓。latch free等待事件中p1所代表的是会话所等待栓的地址。-4611685984823317640换算到16进制为C0000007D2F56B78。我们看看这个栓到底是何方神圣:
当会话需要pin住buffer header时它首先要获去buffer handle,得到buffer handle的过程中首先要抢占cache buffer handles栓,为了避免对于cache buffer handles栓的过度争用,每个会话被允许cache一小撮buffer handles,也叫保留集(reserved set)。该保留集的上限由隐式参数_db_handles_cached(默认为5)所控制,在此基础上会话在执行不是十分复杂的SQL时不必反复申请栓。 同时存在一些十分复杂的语句,需要同时pin住大量的缓存,此时若存在空闲的handle,会话被允许索要手头5个cached buffer handles以外的handle。也为了限制单个会话的资源占用量,Oracle通过内部算法总是尽量让每个进程能够pin住”合理份额“的buffer,由此单个会话同一时间所能持有的总handle数等于db_block_buffers/processes,隐式参数_cursor_db_buffers_pinned指出了该值。另cache buffer handles并没有子栓,仅有父栓本身,所以如果系统中有大量复杂SQL,会话所能cache的buffer handles远不足以满足需求时,该栓就成为整个系统性能的瓶颈,甚至可能出现潜在的死锁/饿死问题。 结合我们从trace中获取的实际信息,cache buffer handles的命中率为1-656272521/2734921346=76%;一般来说如果系统中某个栓的命中率低于95%时,我们就可以认为该栓的争用较为激烈了;而此处一个父栓的命中率到了80%以下,我们很难想象其性能恶化状态。 在同事发给我的RDA中探索,从resource limit栏目中可以看到该实例的会话高水位从未超过2500,而实例设定的processes值则为6000,这是一个很多初级DBA或施工方会犯的错误,认为设定高一些的processes值可以避免出现ora-00020错误从而高枕无忧,殊不知Oracle中许多隐式参数的分配是针对process数来计算的,资源总数是有限的,process数多则单个会话所能操控的理论资源数相应减少,这不仅在Oracle中,几乎是放之四海皆准的原理。 极有可能是过高的processes参数影响了_cursor_db_buffers_pinned,而该问题SQL除了十分复杂外还有全部采用nested loop链接方式的问题,这就要求同时时刻需pin住大量不同表和索引的buffer,于是在SQL执行伊始会话尝试pin住了一部分的buffer,但因为最终达到了_cursor_db_buffers_pinned所指定的上限,该会话在持有cache buffer handles栓的情况下处于hang状态,进一步导致了大量其他会话因latch free等待而阻塞。 当时给同事的最主要建议就是降低processes参数,从该系统的V$resource_limit动态视图来看,将processes设到3000就绰绰有余了;同时我们可以增大隐式参数_db_handles_cached以避免系统中对该栓的争用。 几天后,同事回信告诉我开发人员通过其他途径绕过了该问题,但我还是建议他在未来可能的offline计划中加入修改processes和_db_handles_cached参数的操作。 这个case发生几个月后,我无意中从MOS中发现以下note:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
|
-----------------------------------------------------------------------------------------------
| Id | Operation |
Name
|
Rows
| Bytes | Cost |
-----------------------------------------------------------------------------------------------
| 0 |
SELECT
STATEMENT | | 1 | 1956 | 13 |
|* 1 | FILTER | | | | |
| 2 | NESTED LOOPS
OUTER
| | 1 | 1956 | 13 |
| 3 | NESTED LOOPS
OUTER
| | 1 | 1942 | 12 |
| 4 | NESTED LOOPS
OUTER
| | 1 | 1924 | 11 |
| 5 | NESTED LOOPS | | 1 | 1906 | 10 |
| 6 | NESTED LOOPS | | 1 | 1856 | 9 |
| 7 | NESTED LOOPS | | 5 | 9055 | 8 |
| 8 | NESTED LOOPS
OUTER
| | 5 | 8575 | 7 |
| 9 | NESTED LOOPS
OUTER
| | 5 | 8280 | 6 |
| 10 | NESTED LOOPS
OUTER
| | 5 | 7785 | 5 |
| 11 | NESTED LOOPS
OUTER
| | 5 | 7290 | 4 |
| 12 | NESTED LOOPS
OUTER
| | 5 | 4130 | 3 |
|* 13 |
TABLE
ACCESS
BY
INDEX
ROWID| S_PROD_INT | 5 | 2960 | 2 |
|* 14 |
INDEX
RANGE SCAN | S_PROD_INT_M53 | 326 | | 5 |
| 15 |
TABLE
ACCESS
BY
INDEX
ROWID| S_PROD_LN | 1 | 234 | 1 |
|* 16 |
INDEX
UNIQUE
SCAN | S_PROD_LN_P1 | 1 | | |
| 17 |
TABLE
ACCESS
BY
INDEX
ROWID | S_PROD_INT_CRSE | 1 | 632 | 1 |
|* 18 |
INDEX
RANGE SCAN | S_PROD_INT_CRSE_U1 | 1 | | 1 |
| 19 |
TABLE
ACCESS
BY
INDEX
ROWID | S_LIT | 1 | 99 | 1 |
|* 20 |
INDEX
UNIQUE
SCAN | S_LIT_P1 | 1 | | |
| 21 |
TABLE
ACCESS
BY
INDEX
ROWID | S_LIT | 1 | 99 | 1 |
|* 22 |
INDEX
UNIQUE
SCAN | S_LIT_P1 | 1 | | |
| 23 |
TABLE
ACCESS
BY
INDEX
ROWID | S_PROD_INT_TNTX | 1 | 59 | 1 |
|* 24 |
INDEX
RANGE SCAN | S_PROD_INT_TNTX_U1 | 1 | | 1 |
| 25 |
TABLE
ACCESS
BY
INDEX
ROWID | S_VOD | 1 | 96 | 1 |
|* 26 |
INDEX
RANGE SCAN | S_VOD_U2 | 1 | | 2 |
|* 27 |
TABLE
ACCESS
BY
INDEX
ROWID | S_VOD_VER | 1 | 45 | 1 |
|* 28 |
INDEX
RANGE SCAN | S_VOD_VER_U1 | 2 | | 4 |
| 29 |
TABLE
ACCESS
BY
INDEX
ROWID | S_ISS_OBJ_DEF | 1 | 50 | 1 |
|* 30 |
INDEX
RANGE SCAN | S_ISS_OBJ_DEF_M3 | 1 | | 2 |
| 31 |
TABLE
ACCESS
BY
INDEX
ROWID | S_CTLG_CAT | 1 | 18 | 1 |
|* 32 |
INDEX
UNIQUE
SCAN | S_CTLG_CAT_P1 | 1 | | |
| 33 |
TABLE
ACCESS
BY
INDEX
ROWID | S_CTLG_CAT | 1 | 18 | 1 |
|* 34 |
INDEX
UNIQUE
SCAN | S_CTLG_CAT_P1 | 1 | | |
| 35 |
TABLE
ACCESS
BY
INDEX
ROWID | S_CTLG_CAT | 1 | 14 | 1 |
|* 36 |
INDEX
UNIQUE
SCAN | S_CTLG_CAT_P1 | 1 | | |
| 37 | NESTED LOOPS | | 1 | 137 | 7 |
| 38 | NESTED LOOPS | | 4 | 456 | 6 |
| 39 | NESTED LOOPS | | 4 | 364 | 5 |
| 40 | NESTED LOOPS | | 2 | 142 | 4 |
| 41 | NESTED LOOPS | | 1 | 53 | 3 |
| 42 | NESTED LOOPS | | 3 | 120 | 2 |
|* 43 |
INDEX
RANGE SCAN | S_CTLG_CAT_PROD_U1 | 3 | 57 | 2 |
|* 44 |
TABLE
ACCESS
BY
INDEX
ROWID | S_CTLG_CAT | 1 | 21 | 1 |
|* 45 |
INDEX
UNIQUE
SCAN | S_CTLG_CAT_P1 | 1 | | |
|* 46 |
TABLE
ACCESS
BY
INDEX
ROWID | S_CTLG | 1 | 13 | 1 |
|* 47 |
INDEX
UNIQUE
SCAN | S_CTLG_P1 | 1 | | |
| 48 |
TABLE
ACCESS
BY
INDEX
ROWID | S_PARTY_CTGRY | 2 | 36 | 1 |
|* 49 |
INDEX
RANGE SCAN | S_PARTY_CTGRY_F1 | 2 | | 1 |
| 50 |
TABLE
ACCESS
BY
INDEX
ROWID | S_PARTY_RPT_REL | 2 | 40 | 1 |
|* 51 |
INDEX
RANGE SCAN | S_PARTY_RPTREL_F50 | 2 | | 1 |
| 52 |
TABLE
ACCESS
BY
INDEX
ROWID | S_PARTY_PER | 1 | 23 | 1 |
|* 53 |
INDEX
RANGE SCAN | S_PARTY_PER_F1 | 1 | | 2 |
|* 54 |
INDEX
RANGE SCAN | S_PARTY_PER_U1 | 1 | 23 | 3 |
| 55 | NESTED LOOPS | | 1 | 157 | 8 |
| 56 | NESTED LOOPS | | 1 | 144 | 7 |
| 57 | NESTED LOOPS | | 1 | 125 | 6 |
| 58 | NESTED LOOPS | | 1 | 104 | 5 |
| 59 | NESTED LOOPS | | 1 | 86 | 4 |
| 60 | NESTED LOOPS | | 4 | 264 | 3 |
| 61 | NESTED LOOPS | | 2 | 86 | 2 |
|* 62 |
INDEX
RANGE SCAN | S_PARTY_PER_U1 | 2 | 46 | 4 |
| 63 |
TABLE
ACCESS
BY
INDEX
ROWID | S_POSTN | 1 | 20 | 1 |
|* 64 |
INDEX
UNIQUE
SCAN | S_POSTN_P1 | 1 | | |
|* 65 |
INDEX
RANGE SCAN | S_PARTY_PER_U1 | 2 | 46 | 3 |
|* 66 |
INDEX
RANGE SCAN | S_PARTY_RPT_REL_U2 | 1 | 20 | 1 |
|* 67 |
INDEX
RANGE SCAN | S_PARTY_CTGRY_U1 | 1 | 18 | 1 |
|* 68 |
TABLE
ACCESS
BY
INDEX
ROWID | S_CTLG_CAT | 1 | 21 | 1 |
|* 69 |
INDEX
UNIQUE
SCAN | S_CTLG_CAT_P1 | 1 | | |
|* 70 |
INDEX
RANGE SCAN | S_CTLG_CAT_PROD_U1 | 1 | 19 | 1 |
|* 71 |
TABLE
ACCESS
BY
INDEX
ROWID | S_CTLG | 1 | 13 | 1 |
|* 72 |
INDEX
UNIQUE
SCAN | S_CTLG_P1 | 1 | | |
| 73 | NESTED LOOPS | | 1 | 58 | 3 |
| 74 | NESTED LOOPS | | 3 | 135 | 2 |
|* 75 |
INDEX
RANGE SCAN | S_CTLG_CAT_PROD_U1 | 2 | 38 | 2 |
|* 76 |
TABLE
ACCESS
BY
INDEX
ROWID | S_CTLG_CAT | 1 | 26 | 1 |
|* 77 |
INDEX
UNIQUE
SCAN | S_CTLG_CAT_P1 | 1 | | |
|* 78 |
TABLE
ACCESS
BY
INDEX
ROWID | S_CTLG | 1 | 13 | 1 |
|* 79 |
INDEX
UNIQUE
SCAN | S_CTLG_P1 | 1 | | |
-----------------------------------------------------------------------------------------------
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
|
cat siebdb1_ora_15465.trc |grep
"nam='latch free'"
|awk
'{print $7}'
|sort|uniq -c
1 p1=-4611685984823284696
169 p1=-4611685984823317640
1 p1=-4611685988591416672
1 p1=-4611685988592487696
2 p1=-4611685988692060472
1 p1=-4611685988694684144
1 p1=-4611685988705526816
1 p1=-4611685988733017136
1 p1=-4611685988779484112
1 p1=-4611685988784059264
1 p1=-4611685988787329592
1 p1=-4611685988788323152
1 p1=-4611685988800312448
2 p1=-4611685988805347840
24 p1=-4611685988806714976
2 p1=-4611685988837854328
7 p1=-4611685988841028520
1 p1=-4611685988869432648
1 p1=-4611685988871320920
|
1
2
3
4
5
6
|
c0000007d2f56b78 cache buffer handles
level
=3
Location
from
where
latch
is
held: kcbzfs:
Context saved
from
call: 0
state=busy
gotten 2734921346 times wait, failed
first
656272521 sleeps 44808579
gotten 0 times nowait, failed: 0
|
Hdr: 9328032 10.2.0.3 RDBMS 10.2.0.3 BUFFER CACHE PRODID-5 PORTID-23 Abstract: INCREASE IN CACHE BUFFER CHAINS/HANDLES WAITS *** 01/29/10 05:41 am *** TAR: ---- 3-1375394871 PROBLEM: -------- Today, and despite no significant change in workload we have increased waits on these two latches:
29Jan2010_1030_1045 : Top 5 Timed Events
本文转自maclean_007 51CTO博客,原文链接:
http://blog.51cto.com/maclean/1277511