[20180301]模拟cursor pin S wait on X等待事件.txt
--//昨天看链接http://www.askmaclean.com/archives/cursor-pin-s-wait-on-x.html,自己也按照例子重复演示看看.
--//出现'cursor: pin * events'等待事件主要原因就是子光标太多,或者硬解析太多,还有就是一些sql语句太复杂,
--//导致分析时间太长,这些都是我的理解.
--//链接http://www.askmaclean.com/archives/cursor-pin-s-wait-on-x.html给出一些可能的情况.
该类等待事件一般是为了pin相关的子游标
'Cursor: pin S on X' 最常见的等待事件, 进程为了共享操作例如执行pin游标而以SHRD S mode申请mutex, 但是未立即获得。原因
是该游标被其他进程以EXCL X mode 持有了。
实际该 cursor: pin S wait on X等待事件往往是由于其他因素诱发的。Mutex争用仅仅是问题的症状,但根本原因需要Database
Consultant 进一步挖掘。
下面我们列出一些已知的常见案例, 在这些例子中可以看到 我上面提到的 Mutex的争用仅仅是伪争用:
过多的子游标 High Version Counts
过多的子游标版本Version Count可能导致Mutex 争用,一般一个SQL的Version Count不要高于500。
检查High Version Count很简单, 在AWR里就有SQL ordered by High Version Count,也可以写SQL查V$SQL、V$SQLAREA
昂贵的X$、V$视图查询
一些对于V$、X$视图的查询,需要访问X$KGL*之类的fixed table,可能触发Mutex争用。
--//我测试过几个会话访问v$sql视图,出现的是library cache: mutex X,链接:http://blog.itpub.net/267265/viewspace-2142625/
Mutex持有者得不到CPU
Mutex持有者若得不到足够的CPU片可能一直阻塞他人,直到它拿到需要的CPU。
这种情况可能由于OS操作系统的实际情况或者使用Resource Manager而引起。需要配合AWR中的Host CPU、Instance CPu一起看。
已经被KILLED的SESSION仍持有Mutex
当session正持有Mutex,而其对应的Process被强制KILL掉, 则直到PMON彻底清理掉该Dead Process并释放Mutex,其他session才能不再
等待。 诊断该类问题,最好能检查PMON的TRACE。 当然也存在部分BUG会导致PMON清理过程非常慢。
举例来说,bug 9312879描述了一种场景:PMON 需要获得某个Mutex以便清理某个dead process,但是该Mutex又被其他进程持有,则PMON
甚至无法开始真正清理并释放Mutex。
详见 《深入理解Oracle中的Mutex》一文
如何模拟 cursor pin S wait on X 等待事件
--//我自己也遇到一例:http://blog.itpub.net/267265/viewspace-2141699/,当时网络出现环路,导致后续调用相同语句hang住.
1.环境:
SCOTT@book> @ &r/ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
2.测试:
--//session 1:
SCOTT@book> @ &r/s
SCOTT@book(274,7)> @ &r/spid
SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50
---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------
274 7 36826 DEDICATED 36827 21 4 alter system kill session '274,7' immediate;
--//进程号=36826.
SCOTT@book(274,7)> var b1 number;
SCOTT@book(274,7)> exec :b1 :=10;
PL/SQL procedure successfully completed.
SCOTT@book(274,7)> select * from dept where deptno=:b1;
DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
3.使用gdb跟踪session 1
$ gdb $(which oracle) 36827
...
(gdb) break kxsPeekBinds
Breakpoint 1 at 0x2123fa0
(gdb) command
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>bt 4
>end
(gdb) cont
Continuing.
--//说明:break kxsPeekBinds 表示调用kxsPeekBinds中断,调用bt 4命令.
(gdb) help bt
Print backtrace of all stack frames, or innermost COUNT frames.
With a negative argument, print outermost -COUNT frames.
Use of the 'full' qualifier also prints the values of the local variables.
--//gdb不是很熟悉,按照提示:Print backtrace of all stack frames, or innermost COUNT frames.
--//session 1:
SCOTT@book(274,7)> select * from dept where deptno=:b1;
DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
--//可以发现正常,因为我跟踪的调用绑定变量探查的函数,我以前的语句已经执行1次,这样第2次执行不会调用绑定变量窥视的函数.
--//修改select为Select ,这样语句会硬解析同时出现调用绑定变量窥视的函数.
SCOTT@book(274,7)> Select * from dept where deptno=:b1;
--//挂起.
--//gdb界面出现如下:
Breakpoint 1, 0x0000000002123fa0 in kxsPeekBinds ()
#0 0x0000000002123fa0 in kxsPeekBinds ()
#1 0x00000000025b821c in opitca ()
#2 0x0000000001ecf7a9 in kksSetBindType ()
#3 0x0000000009641a89 in kksfbc ()
--//session 3:
SYS@book> @ &r/wait
P1RAW P2RAW P3RAW P1 P2 P3 SID SERIAL# SEQ# EVENT STATUS STATE WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00 1650815232 1 0 274 7 58 SQL*Net message from client ACTIVE WAITED KNOWN TIME 120254317 182 Idle
--//并没有出现cursor pin S wait on X
--//session 2:
SCOTT@book> @ &r/s
SCOTT@book(28,25)> @ &r/spid
SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50
---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------
28 25 37014 DEDICATED 37015 26 11 alter system kill session '28,25' immediate;
SCOTT@book> var b1 number;
SCOTT@book> exec :b1 :=20;
PL/SQL procedure successfully completed.
SCOTT@book(28,25)> Select * from dept where deptno=:b1;
--//挂起!!
SCOTT@book> select * from empy where t1 =:b1;
---//hang...在t1 = 之间加入空格.
--//session 3:
SYS@book> set numw 12
SYS@book> @ &r/wait
P1RAW P2RAW P3RAW P1 P2 P3 SID SERIAL# SEQ# EVENT STATUS STATE WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- -------------- -------------- -------------- -------------- -------------- -------------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00 1650815232 1 0 274 7 58 SQL*Net message from client ACTIVE WAITED KNOWN TIME 120254317 437 Idle
00000000AB221382 0000011200000000 0000000500000000 2871137154 1176821039104 21474836480 28 25 42 cursor: pin S wait on X ACTIVE WAITING 54427696 54 Concurrency
SYS@book> select * from v$mutex_sleep;
MUTEX_TYPE LOCATION SLEEPS WAIT_TIME
-------------------------------- ---------------------------------------- -------------- --------------
Library Cache kgllkdl1 85 1 0
Cursor Pin kkslce [KKSCHLPIN2] 132160 0
SYS@book> select * from v$mutex_sleep;
MUTEX_TYPE LOCATION SLEEPS WAIT_TIME
-------------------------------- ---------------------------------------- -------------- --------------
Cursor Pin kkslce [KKSCHLPIN2] 144350 0
SYS@book> @ &r/mutex 5
old 18: ORDER BY sleeps DESC ) where rownum<= &1
new 18: ORDER BY sleeps DESC ) where rownum<= 5
HASH SLEEPS LOCATION MUTEX_TYPE SQLID KGLNAOWN C100
-------------- -------------- -------------------- -------------------- ------------- -------------------- -----------------------------------
2871137154 339228 kkslce [KKSCHLPIN2] Cursor Pin g0vgm2upk44w2 Select * from dept where deptno=:b1
--//停在LOCATION=kkslce,与前面gdb看到的一致.
--//做hanganalyze分析:
SYS@book> oradebug setmypid
Statement processed.
SYS@book> oradebug hanganalyze 3;
Hang Analysis in /u01/app/oracle/diag/rdbms/book/book/trace/book_ora_32088.trc
--//在执行中,看等待事件如下:
SYS@book> @ &r/wait
P1RAW P2RAW P3RAW P1 P2 P3 SID SERIAL# SEQ# EVENT STATUS STATE WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- -------------- -------------- -------------- -------------- -------------- -------------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00 1650815232 1 0 274 7 58 SQL*Net message from client ACTIVE WAITED KNOWN TIME 120254317 656 Idle
00000000AB221382 0000011200000000 0000000500000000 2871137154 1176821039104 21474836480 28 25 45 cursor: pin S wait on X ACTIVE WAITING 57412911 57 Concurrency
00 00 00 0 0 0 94 25 21 ksdxexeotherwait INACTIVE WAITING 10567976 11 Other
--//检查跟踪文件:
*** 2018-03-01 10:22:19.845
===============================================================================
HANG ANALYSIS:
instances (db_name.oracle_sid): book.book
oradebug_node_dump_level: 3
analysis initiated by oradebug
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 10:22:19 ]
NOTE: scheduling delay has not been sampled for 0.231213 secs 0.000000 secs from [ 10:22:15 - 10:22:20 ], 5 sec avg
0.000000 secs from [ 10:21:20 - 10:22:20 ], 1 min avg
0.000000 secs from [ 10:17:20 - 10:22:20 ], 5 min avg
vktm time drift history
===============================================================================
Chains most likely to have caused the hang:
[a] Chain 1 Signature: <not in a wait><='cursor: pin S wait on X'
Chain 1 Signature Hash: 0x3a7b30c
[b] Chain 2 Signature: 'LNS ASYNC end of log'
Chain 2 Signature Hash: 0x8ceed34f
===============================================================================
Non-intersecting chains:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (book.book)
os id: 37015
process id: 26, oracle@gxqyydg4 (TNS V1-V3)
session id: 28
session serial #: 25
}
is waiting for 'cursor: pin S wait on X' with wait info:
{
p1: 'idn'=0xab221382
p2: 'value'=0x11200000000
p3: 'where'=0x500000000
time in wait: 46.772297 sec
heur. time in wait: 5 min 34 sec
timeout after: never
wait id: 44
blocking: 0 sessions
current sql: Select * from dept where deptno=:b1
short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-__select()+19<-skgpwwait()+332<-kgxWait()+774<-kgxSharedExamine()+568<-kxsGetRuntimeLock()+240<-kkscsCheckCursor()+556<-kkscsSearchChildList()+1171<-kksfbc()+12417<-kksp
wait history:
* time between current wait and wait #1: 0.000021 sec
1. event: 'cursor: pin S wait on X'
time waited: 1 min 12 sec
wait id: 43 p1: 'idn'=0xab221382
p2: 'value'=0x11200000000
p3: 'where'=0x500000000
* time between wait #1 and #2: 0.000021 sec
2. event: 'cursor: pin S wait on X'
time waited: 1 min 11 sec
wait id: 42 p1: 'idn'=0xab221382
p2: 'value'=0x11200000000
p3: 'where'=0x500000000
* time between wait #2 and #3: 0.000020 sec
3. event: 'cursor: pin S wait on X'
time waited: 1 min 11 sec
wait id: 41 p1: 'idn'=0xab221382
p2: 'value'=0x11200000000
p3: 'where'=0x500000000
}
and is blocked by
=> Oracle session identified by:
{
instance: 1 (book.book)
os id: 36827
process id: 21, oracle@gxqyydg4 (TNS V1-V3)
session id: 274
session serial #: 7
}
which is not in a wait:
{
last wait: 8 min 44 sec ago
blocking: 1 session
current sql: Select * from dept where deptno=:b1
short stack: <none: error encountered - ORA-32516: cannot wait for process 'Unix process pid: 36827, image: oracle@gxqyydg4 (TNS V1-V3)' to finish executing ORADEBUG command 'SHORT_STACK'; wait time exceeds 30000 ms>
wait history:
1. event: 'SQL*Net message from client'
time waited: 2 min 0 sec
wait id: 57 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #1 and #2: 0.000101 sec
2. event: 'SQL*Net message to client'
time waited: 0.000004 sec
wait id: 56 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #2 and #3: 0.000384 sec
3. event: 'SQL*Net message from client'
time waited: 2 min 1 sec
wait id: 55 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
}
Chain 1 Signature: <not in a wait><='cursor: pin S wait on X'
Chain 1 Signature Hash: 0x3a7b30c
-------------------------------------------------------------------------------
...//省略
===============================================================================
Extra information that will be dumped at higher levels:
[level 4] : 1 node dumps -- [LEAF] [LEAF_NW]
[level 5] : 2 node dumps -- [NO_WAIT] [INVOL_WT] [SINGLE_NODE] [NLEAF] [SINGLE_NODE_NW]
State of ALL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[27]/1/28/25/0x85f9f0a0/37015/NLEAF/[273]
[273]/1/274/7/0x8595c500/36827/LEAF_NW/
[299]/1/300/15/0x85984060/36369/SINGLE_NODE/
*** 2018-03-01 10:22:19.848
===============================================================================
END OF HANG ANALYSIS
===============================================================================
*** 2018-03-01 10:22:19.849
===============================================================================
HANG ANALYSIS DUMPS:
oradebug_node_dump_level: 3
===============================================================================
State of LOCAL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[27]/1/28/25/0x85f9f0a0/37015/NLEAF/[273]
[273]/1/274/7/0x8595c500/36827/LEAF_NW/
[299]/1/300/15/0x85984060/36369/SINGLE_NODE/
No processes qualify for dumping.
===============================================================================
HANG ANALYSIS DUMPS: END
===============================================================================
*** 2018-03-01 10:22:19.849
Oradebug command 'hanganalyze 3' console output:
Hang Analysis in /u01/app/oracle/diag/rdbms/book/book/trace/book_ora_37100.trc
--//对比等待事件,可以对上:
SYS@book> @ &r/wait
P1RAW P2RAW P3RAW P1 P2 P3 SID SERIAL# SEQ# EVENT STATUS STATE WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- -------------- -------------- -------------- -------------- -------------- -------------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00 1650815232 1 0 274 7 58 SQL*Net message from client ACTIVE WAITED KNOWN TIME 120254317 949 Idle
00000000AB221382 0000011200000000 0000000500000000 2871137154 1176821039104 21474836480 28 25 49 cursor: pin S wait on X ACTIVE WAITING 63781034 64 Concurrency
---sid,serial#= 274,7 28,25,
SYS@book> select spid from v$process where addr in (select paddr from v$SESSION where sid in (274,28));
SPID
------
36827
37015
--//进程号也能对上.
--//转载:http://www.oracleblog.org/working-case/deal-with-row-cache-lock/
--//我们来看看([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor)部分:
nodenum是hanganalyze自己为了记录这些会话而定制的编号,从0开始排起。
cnode是Node Id
sid是 Session ID
sess_srno是serial#
proc_ptr是Process Pointer
ospid 是OS Process ID
state 是node的状态
adjlist是临近的node(通常代表一个blocker node)
predecessor是Predecessor node ,通常代表一个 waiter node
state部分有几个比较重要的状态:
1.IN_HANG:这表示该node处于死锁状态,通常还有其他node(blocker)也处于该状态
2.LEAF/LEAF_NW:该node通常是blocker。通过条目的"predecessor"列可以判断这个node是否是blocker。LEAF说明该NODE没有等待其他
资源,而LEAF_NW则可能是没有等待其他资源或者是在使用CPU.
3.NLEAF:通常可以看作这些会话是被阻塞的资源。发生这种情况一般说明数据库发生性能问题而不是数据库hang
4.IGN/IGN_DMP:这类会话通常被认为是空闲会话,除非其adjlist列里存在node。如果是非空闲会话则说明其adjlist里的node正在等待其他node释放资源。
5.SINGLE_NODE/SINGLE_NODE_NW:近似于空闲会话
--//也可以执行如下,链接:http://blog.itpub.net/267265/viewspace-2141699/
/* Formatted on 2018/3/1 10:28:02 (QP5 v5.252.13127.32867) */
SELECT s.inst_id AS inst
,s.sid AS blocked_sid
,s.username AS blocked_user
,sa.sql_id AS blocked_sql_id
,TRUNC (s.p2 / 4294967296) AS blocking_sid
,b.username AS blocking_user
,b.sql_id AS blocking_sql_id
FROM gv$session s
JOIN gv$sqlarea sa ON sa.hash_value = s.p1
JOIN gv$session b
ON TRUNC (s.p2 / 4294967296) = b.sid AND s.inst_id = b.inst_id
JOIN gv$sqlarea sa2 ON b.sql_id = sa2.sql_id
WHERE s.event = 'cursor: pin S wait on X';
INST BLOCKED_SID BLOCKED_USER BLOCKED_SQL_I BLOCKING_SID BLOCKING_USER BLOCKING_SQL_
-------------- -------------- ------------------------------ ------------- -------------- ------------------------------ -------------
1 28 SCOTT g0vgm2upk44w2 274 SCOTT g0vgm2upk44w2
--//sid=274阻塞了sid=28.
--//在gdb界面选择继续:
(gdb) cont
Continuing.
Program received signal SIGUSR2, User defined signal 2.
0x0000000002123fa0 in kxsPeekBinds ()
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x0000000009805bd5 in slaac_int ()
(gdb) cont
Continuing.
Breakpoint 1, 0x0000000002123fa0 in kxsPeekBinds ()
#0 0x0000000002123fa0 in kxsPeekBinds ()
#1 0x00000000025b821c in opitca ()
#2 0x0000000001ecf7a9 in kksSetBindType ()
#3 0x0000000009641a89 in kksfbc ()
(gdb) cont
Continuing.
--//知道session 1执行完成,在看session 2,执行也ok.
--//session 1:
SCOTT@book(274,7)> Select * from dept where deptno=:b1;
DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
--//session 2:
SCOTT@book(28,25)> Select * from dept where deptno=:b1;
= DEPTNO DNAME LOC
---------- -------------- -------------
20 RESEARCH DALLAS
--//session 3:
SYS@book> @ &r/wait
no rows selected
4.补充说明:
cursor: pin S wait on X 的p1,p2,p3参数:
--//参考http://www.oratea.com/2017/02/03/single-task-message%e7%ad%89%e5%be%85%e4%ba%8b%e4%bb%b6/
P1=2871137154
P2=1176821039104
P3=21474836480
--//idn 通过以前的测试表示hash_value
--//PARAMETER2参数:
--//它的前一半(高位):就是要加Mutex的会话的SID,这里也就是274号会话。274号会话要申请Mutex。后一半(低位)是"引用计数"或模式。
SYS@book> @ &r/10to16 1176821039104
10 to 16 HEX REVERSE16
-------------- -----------------------------------
0011200000000 0x00000000-12010000
--//0x112 = 274,就是session 1.
SYS@book> select 1176821039104,32,trunc(1176821039104/power(2,32)) ,mod(1176821039104,power(2,32)) from dual;
1176821039104 32 TRUNC(1176821039104/POWER(2,32)) MOD(1176821039104,POWER(2,32))
-------------- -------------- -------------------------------- ------------------------------
1176821039104 32 274 0
SYS@book> select sql_id,sql_text from v$sql where hash_value=2871137154 ;
SQL_ID SQL_TEXT
------------- ------------------------------------------------------------
g0vgm2upk44w2 Select * from dept where deptno=:b1
5.测试脚本:
--//附上mutex.sql脚本:
# cat mutex.sql
column kglnaown format a20
column MUTEX_TYPE format a20
column kglnaobj format a100
column LOCATION format a20
select * from (
SELECT kglnahsh hash
,SUM (sleeps) sleeps
,location
,mutex_type
,kglobt03 sqlid
,kglnaown
,replace(kglnaobj,chr(13)) c100
--,SUBSTR (kglnaobj, 1, 140) object
FROM x$kglob, v$mutex_sleep_history
WHERE kglnahsh = mutex_identifier
GROUP BY kglnaobj
,kglobt03
,kglnaown
,kglnahsh
,location
,mutex_type
ORDER BY sleeps DESC ) where rownum<= &1;
# cat wait.sql
select p1raw,p2raw,p3raw,p1,p2,p3,sid,serial#,seq#,event,status,state,wait_time_micro,seconds_in_wait,wait_class
from v$session where ( wait_class<>'Idle' or (status='ACTIVE' and STATE='WAITED KNOWN TIME'))
and sid not in (select sid from v$mystat where rownum=1)
order by event ;