5月26日某客户告警日志中出现的ORA-00600 [4400],[48]错误记录,并产生了
trace文件
:,
从trace文件名就可以看出出错的是某个并行子进程,其最近调用堆栈为ktcsod->ktcddt->kgeasi(报错),ktcsod与ktcddt均为Oracle中内核事务控制模块函数 (Kernel Transaction Control),又该并行子进程最近等待为"SQL*Net message from dblink"事件,共等待476s(wait_time=476 seconds)。trace显示该并行会话包含分布式事务锁:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
|
*** SERVICE
NAME
:(ETL) 2010-05-26 16:45:45.930
*** SESSION ID:(262.12024) 2010-05-26 16:45:45.930
*** 2010-05-26 16:45:45.930
ksedmp: internal
or
fatal error
ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []
ORA-10387: parallel query server interrupt (normal)
----- Call Stack Trace -----
calling call entry argument
values
in
hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c bl ksedst1 0FFFFFFFF ? 000000020 ?
ksedmp+0290 bl ksedst 1048DFFC0 ?
ksfdmp+0018 bl 03F5B014
kgeriv+0108 bl _ptrgl
kgeasi+0118 bl kgeriv 10496FEC8 ? 000000002 ?
700000010008000 ? 110000AD0 ?
110190290 ?
ktcddt+012c bl kgeasi 110190110 ? 110450040 ?
113000001130 ? 200000002 ?
100000001 ? 000000000 ?
000000030 ? 000000013 ?
ktcsod+0384 bl ktcddt 0000003C0 ? 000000018 ?
1048D4808 ?
kssdch_stage+0758 bl _ptrgl
....................................................
----- End of Call Stack Trace -----
******************* Dumping process map ****************
377144 : ora_p001_ETL
100000000 93710K
read
/
exec
oracle
1100007ef 2372K
read
/write oracle
9fffffff0000000 44K
read
/
exec
/usr/ccs/bin/usla64
9fffffff000b30a 0K
read
/write /usr/ccs/bin/usla64
900000000375880 484K
read
/
exec
/usr/lib/liblvm.a[shr_64.o]
9001000a0121ac8 135K
read
/write /usr/lib/liblvm.a[shr_64.o]
900000000426180 74K
read
/
exec
/usr/lib/libcfg.a[shr_64.o]
9001000a00f5ed0 26K
read
/write /usr/lib/libcfg.a[shr_64.o]
90000000034e280 2K
read
/
exec
/usr/lib/libcrypt.a[shr_64.o]
9001000a00d4760 0K
read
/write /usr/lib/libcrypt.a[shr_64.o]
90000000069a7b0 4K
read
/
exec
/usr/lib/libc.a[aio_64.o]
9001000a022a340 0K
read
/write /usr/lib/libc.a[aio_64.o]
90000000035f300 85K
read
/
exec
/usr/lib/libodm.a[shr_64.o]
9001000a00d5c08 35K
read
/write /usr/lib/libodm.a[shr_64.o]
9000000003ef080 83K
read
/
exec
/usr/lib/libperfstat.a[shr_64.o]
9001000a01d0818 9K
read
/write /usr/lib/libperfstat.a[shr_64.o]
900000000358000 0K
read
/
exec
/usr/lib/libdl.a[shr_64.o]
9001000a0219000 0K
read
/write /usr/lib/libdl.a[shr_64.o]
9000000007ff100 8588K
read
/
exec
/oracle/product/10.2.0/lib/libjox10.a[shr.o]
8001000a0000038 585K
read
/write /oracle/product/10.2.0/lib/libjox10.a[shr.o]
9000000004a0000 228K
read
/
exec
/usr/lib/libpthreads.a[shr_xpg5_64.o]
9001000a0144000 558K
read
/write /usr/lib/libpthreads.a[shr_xpg5_64.o]
900000000045500 2966K
read
/
exec
/usr/lib/libc.a[shr_64.o]
9001000a0000788 844K
read
/write /usr/lib/libc.a[shr_64.o]
Total 110843K
*******************
End
of
process map dump ****************
===================================================
PROCESS STATE
-------------
Process
global
information:
process: 7000000cf48bf98, call: 7000000b40ebdf0, xact: 7000000cdece7a8, curses: 7000000cf666540, usrses: 7000000cf666540
----------------------------------------
SO: 7000000cf48bf98, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=43, calls cur/
top
: 7000000b40ebdf0/7000000b40ebdf0, flag: (0) -
int
error: 0, call error: 0, sess error: 0, txn error 0
(post info)
last
post received: 115 0 4
last
post received-location: kslpsr
last
process
to
post me: 7000000cf479c38 1 6
last
post sent: 0 0 251
last
post sent-location: kxfpqr: QC
last
process posted
by
me: 7000000cf48a7f8 10 0
(latch info) wait_event=0 bits=0
Process
Group
:
DEFAULT
, pseudo proc: 7000000cf50bd50
O/S info:
user
: oracle, term: UNKNOWN, ospid: 377144
OSD pid info: Unix process pid: 377144, image: oracle@etl_a (P001)
SO: 7000000cf666540, type: 4, owner: 7000000cf48bf98, flag: INIT/-/-/0x00
(session) sid: 262 trans: 7000000cdece7a8, creator: 7000000cf48bf98, flag: (c0100041) USR/- BSY/-/-/DEL/-/-
DID: 0001-0028-000FFB1E, short-term DID: 0001-002B-00151DA1
txn branch: 7000000cdf412c8
oct: 0, prv: 0, sql: 0, psql: 0,
user
: 26/ETL
O/S info:
user
: bo, term: pts/0, ospid: 377144, machine: etl_a
program: oracle@etl_a (P001)
last
wait
for
'SQL*Net message from dblink'
blocking sess=0x0 seq=45 wait_time=476 seconds since wait started=0
driver id=0, #bytes=1, =0
Dumping Session Wait History
for
'SQL*Net message from dblink'
count
=1 wait_time=476
driver id=0, #bytes=1, =0
for
'SQL*Net message to dblink'
count
=1 wait_time=1
driver id=0, #bytes=1, =0
for
'log file sync'
count
=1 wait_time=7483
buffer#=ca9, =0, =0
for
'log file sync'
count
=1 wait_time=358
buffer#=ca9, =0, =0
for
'PX Deq: Execution Msg'
count
=1 wait_time=15527
sleeptime/senderid=1001ffff, passes=1, =0
for
'log file sync'
count
=1 wait_time=11855
buffer#=c57, =0, =0
for
'PX Deq: Execution Msg'
count
=1 wait_time=277142
sleeptime/senderid=1001ffff, passes=1, =0
for
'direct path write'
count
=1 wait_time=25
file number=5,
first
dba=103ca, block cnt=3
for
'direct path write'
count
=1 wait_time=6
file number=4,
first
dba=1abe1, block cnt=8
for
'row cache lock'
count
=1 wait_time=3307
cache id=5, mode=0, request=5
temporary
object counter: 0
----------------------------------------
Virtual Thread:
kgskvt: 7000000c0e86b58, sess: 7000000cf666540, vc: 0, proc: 7000000cf48bf98
consumer
group
cur: OTHER_GROUPS (upd? 0), mapped: DEFAULT_CONSUMER_GROUP, orig:
vt_state: 0x200, vt_flags: 0x30, blkrun: 0
is_assigned: 1, in_sched: 0 (0)
vt_active: 0 (pending: 0)
used quanta: 0 (cg: 0)
cpu start
time
: 0, quantum status: 0x0
quantum checks
to
skip: 0,
check
thresh: 0
idle
time
: 0, active
time
: 0 (cg: 0)
cpu yields: 0 (cg: 0), waits: 0 (cg: 0), wait
time
: 0 (cg: 0)
queued
time
outs: 0,
time
: 0 (cur 0, cg 0)
calls aborted: 0, num est
exec
limit hit: 0
undo
current
: 0k
max
: 0k
----------------------------------------
UOL used : 0 locks(used=0,
free
=2)
KGX Atomic Operation Log 7000000bdcf0568
Mutex 0(0, 0) idn 0 oper NONE
Cursor
Parent uid 262 efd 9 whr 11 slp 0
oper=NONE pt1=7000000c69d0878 pt2=7000000bef37090 pt3=7000000b47119c8
pt4=0 u41=1 stt=0
KGX Atomic Operation Log 7000000bdcf05b0
Mutex 7000000bdc2d028(0, 2) idn 0 oper NONE
Cursor
Stat uid 262 efd 12 whr 1 slp 0
oper=NONE pt1=7000000bdc2cef8 pt2=0 pt3=0
pt4=0 u41=0 stt=0
KGX Atomic Operation Log 7000000bdcf05f8
Mutex 0(0, 0) idn 0 oper NONE
Library Cache uid 262 efd 0 whr 0 slp 0
----------------------------------------
SO: 7000000cf7e3ec8, type: 5, owner: 7000000cf666540, flag: INIT/-/-/0x00
(enqueue) DX-00000010-00000000 DID: 0001-002B-00151DA9
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x7000000cf8bd0b8, mode: X, lock_flag: 0x0
own: 0x7000000cf666540, sess: 0x7000000cf666540, proc: 0x7000000cf48bf98, prv: 0x7000000cf8bd0c8
|
SO: 7000000cf7e3ec8, type: 5, owner: 7000000cf666540, flag: INIT/-/-/0x00
(enqueue) DX-00000010-00000000 DID: 0001-002B-00151DA9
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x7000000cf8bd0b8, mode: X, lock_flag: 0x0
own: 0x7000000cf666540, sess: 0x7000000cf666540, proc: 0x7000000cf48bf98, prv: 0x7000000cf8bd0c8
即DX(Distributed transaction entry)锁;可以猜测该并行会话在等待远程事务被清理,但等待超时,故报错。 查询metalink可以发现Bug 5223587与该错误一致,该Bug的特征为: 1. 出现ORA-600[4400],且调用栈为ktcddt ,说明涉及到了分布式事务 2. trace显示当时无打开的游标 3. 可以看到当时的等待为'SQL*Net message from dblink'事件 Metalink文档[ID
本文转自maclean_007 51CTO博客,原文链接:http://blog.51cto.com/maclean/1277478