latch: row cache objects 和cursor: pin S wait on X共同出现

本文涉及的产品
全局流量管理 GTM,标准版 1个月
云解析 DNS,旗舰版 1个月
公共DNS(含HTTPDNS解析),每月1000万次HTTP解析
简介:  Snap Id Snap Time Sessions Cursors/Session Begin Snap: 5980 11-Nov-13 16:00:37 1403  39.

 Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 5980 11-Nov-13 16:00:37 1403  39.6
End Snap: 5981 11-Nov-13 16:31:29 1422  42.8
Elapsed:    30.86 (mins)   
DB Time:    3,660.60 (mins)   


Per Second Per Transaction
Redo size:  1,185,514.75  6,235.28
Logical reads:  683,361.22  3,594.17
Block changes:  2,972.67  15.63
Physical reads:  6,201.95  32.62
Physical writes:  621.67  3.27
User calls:  8,220.82  43.24
Parses:  2,793.81  14.69
Hard parses:  319.52  1.68
Sorts:  1,180.36  6.21
Logons:  12.61  0.07
Executes:  31,893.98  167.75
Transactions:  190.13 

Top 5 Timed Events

Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
latch: row cache objects  6,500,460  56,331  9  25.6 Concurrency
CPU time    50,195    22.9 
cursor: pin S wait on X  2,073,123  40,372  19  18.4 Concurrency
db file sequential read  5,758,293  17,702  3  8.1 User I/O
enq: TX - row lock contention  6,570  12,384  1,885  5.6 Application

可以看到DB_TIME比较高系统负载比较高,而等待集中在前3位,
我们先说
latch: row cache objects
根据v$latch_misses或者AWRRPT中可以看到其主要集中的位置为
row cache objects kqrpre: find obj 0 2,922,441 2,564,411
row cache objects kqreqd: reget 0 1,873,773 1,905,652
row cache objects kqreqd 0 1,682,121 2,008,663
同时可以看到其操作的对象是dc_object_ids
到底怎么查看其可以根据文档:
WAITEVENT: "latch: row cache objects" Reference Note (Doc ID 1550722.1)
我们同事纠结于DC_OBJECT_IDS到底什么时候访问,但是我觉得不管怎么说一定是过多的访问了数据字典我们可以从其位置
row cache objects kqrpre: find obj,row cache objects kqreqd: reget,row cache objects kqreqd进行入手
metalink有一个相关的说明
文档源:Contention on 'row cache objects' Latch After Upgrade from 10.2 to 11.2.0.3 (Doc ID 1485410.1)
The 'row cache objects' latch is called most frequently from 'kqrpre: find obj'. This module tries to find details of an object being parsed in the row cache. During parse, the row cache is searched. The process searches through a linked list protected by the 'row cache objects' latch to find the object in the cache that it needs. When other processes are also parsing and looking through the row cache this may cause contention; especially if the parse activity is excessive or unnecessary.

Top 5 Timed Foreground Events

Event                    Waits      Time(s) Avg wait (ms)  DB time Wait Class
latch: row cache objects 58,291,873 130,700             2    63.18 Concurrency
DB CPU                       60,563   29.27  
db file sequential read   2,994,299  14,461             5     6.99 User I/O
library cache lock            3,179   1,662           523     0.80 Concurrency
gc cr grant 2-way         1,233,503     981             1     0.47 Cluster

 Latch Name        Where	         NoWait Misses Sleeps     Waiter Sleeps
...
 row cache objects kqrpre: find obj                  0 10,332,326     8,906,124
 row cache objects kqreqd: reget                     0  7,888,165     8,503,625
 row cache objects kqreqd                            0  7,677,074     8,489,832
 row cache objects kqrso                             0      4,127         3,622
 row cache objects kqrpre: init complete             0        143            88


当然我们没有升级但是可以借鉴,其解决方案为:
1) Rewrite the application to use bind variables to lower the hard parse rate from 350/second
2) Enable the cursor_sharing = force
3) Do not use first_rows_x optimization and use all_rows instead
4) Set optimizer_feature_enabled = 10.2.0.4
5) Set event 10089 level 1 to disable index sorting.

视乎前2项都明确的说明来自硬解析的问题,而且给出了一个定量的值350,而我们这里是314。顺便说下此文档其列子如下:
 Latch Name        Where          NoWait Misses Sleeps     Waiter Sleeps
 row cache objects kqrpre: find obj                  0 10,332,326     8,906,124
 row cache objects kqreqd: reget                     0  7,888,165     8,503,625
 row cache objects kqreqd                            0  7,677,074     8,489,832
 row cache objects kqrso                             0      4,127         3,622
 row cache objects kqrpre: init complete             0        143            88
 他这里更加明显。但是我们这里毕竟没有超过350每秒的硬解析,所以好一些。
 然后我们再说
cursor: pin S wait on X 
这里参考的一个网上的文档,我相信这个文档也来自于METALINK如下:
cursor: pin S整体描述
cursor: pin S A session waits on this event when it wants to update a shared mutex pin and another session
is currently in the process of updating a shared mutex pin for the same cursor object. This wait event should
rarely be seen because a shared mutex pin update is very fast.(Wait Time: Microseconds)
 
--Parameter说明
P1 Hash value of cursor
 
P2 Mutex value
64 bit platforms
8 bytes are used.
Top 4 bytes hold the session id (if the mutex is held X)
Bottom 4 bytes hold the ref count (if the mutex is held S).
 
32 bit platforms
4 bytes are used.
Top 2 bytes hold the session id (if the mutex is held X)
Bottom 2 bytes hold the ref count (if the mutex is held S).
 
P3 Mutex where (an internal code locator) OR'd with Mutex Sleeps
 
--查询sql
SELECT a.*, s.sql_text
  FROM v$sql s,
       (SELECT sid,
               event,
               wait_class,
               p1 cursor_hash_value,
               p2raw Mutex_value,
               TO_NUMBER (SUBSTR (p2raw, 1, 8), 'xxxxxxxx') hold_mutex_x_sid
          FROM v$session_wait
         WHERE event LIKE 'cursor%') a
 WHERE s.HASH_VALUE = a.cursor_hash_value
 
cursor: pin S wait on X描述
- In previous versions of Oracle, library cache pin is protected by “library cache pin latch”.
- But in recent versions of Oracle(I believe it’s 10.2.0.2),
  library cache pin for the cursor LCO is protected by mutext.
- Mutex is allocated per LCO, so it enables fine-grained access control.
 
“cursor: pin S wait on X” wait event is mostly related to mutex and hard parse.
- When a process hard parses the SQL statement, it should acquire exclusive
  library cache pin for the corresponding LCO.
- This means that the process acquires the mutex in exclusive mode.
- Another process which also executes the same query needs to acquire the mutex
  but it’s being blocked by preceding process. The wait event is “cursor: pin S wait on X”.
 
--发生cursor: pin S wait on X原因
Frequent Hard Parses
If the frequency of Hard Parsing is extremely high, then contention can occur on this pin.
 
High Version Counts
When Version counts become excessive, a long chain of versions needs to
be examined and this can lead to contention on this event

可以看出MUTEX在某些情况下会代替LIBRARY CACHE PIN来保护LCO,如果要进行硬解析和ALTER 操作必须以X模式占用
LCO来建立或者修改,如果此时要访问这个LCO,也就是说软解析的时候要访问LCO,那么将会出现这样的等待,LCO实际
就是存储我们的游标信息,软解析共享的就是游标共享,而不需要为每个语句生成一个LCO。

所以这个问题我觉得解决的方案还是想办法降低硬解析:
1) Rewrite the application to use bind variables to lower the hard parse rate from 350/second
2) Enable the cursor_sharing = force
这里我们可以看到硬解析 350每秒 代表一个ORACLE认为不能容忍的程度,而我们这里是319,虽然没有达到350但是也达到了一个
较高的值,出现这样的情况是可能的。

相关文章
|
缓存 Oracle 关系型数据库