A 10.2.0.4 database on HP-UX hit the Oracle internal error ORA-00600 [17175]. The relevant log information follows:
Wed Dec 1 01:57:55 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc:
ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []
ORA-00601: cleanup lock conflict
Wed Dec 1 01:57:57 2010
Trace dumping is performing id=[cdmp_20101201015757]
Wed Dec 1 01:58:05 2010
LGWR: terminating instance due to error 472
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms1_3291.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms2_3293.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms3_3295.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms0_3289.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lmon_3283.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lmd0_3287.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Shutting down instance (abort)
License high water mark = 421

/u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name: HP-UX
Node name: XGP2_db1
Release: B.11.31
Version: U
Machine: ia64
Instance name: xgp21
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 3250, image: oracle@XGP2_db1 (PMON)

*** SERVICE NAME:(SYS$BACKGROUND) 2010-12-01 01:57:55.933
*** SESSION ID:(333.1) 2010-12-01 01:57:55.933
*** 2010-12-01 01:57:55.933
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []
ORA-00601: cleanup lock conflict
ksedst <- ksedmp <- ksfdmp <- kgeriv <- kgesiv <- kgesic1 <- kghcln <- kslilcr
<- $cold_ksl_cleanup <- ksepop <- kgepop <- kgesev <- ksesec0 <- $cold_kslges
<- ksl_get_child_latch <- kslgpl <- es <- ksfglt <- kghext_numa <- ksmasgn
<- kghnospc <- $cold_kghalo <- ksmdacnk <- ksmdget <- ksosp_alloc <- ksoreq_submit
<- ksbsrv <- kmmssv <- kmmlsa <- kmmlod <- ksucln <- ksbrdp <- opirip
<- $cold_opidrv <- sou2o <- $cold_opimai_real <- main <- main_opd_entry

PROCESS STATE
-------------
Process global information:
process: c00000018d000078, call: c00000018d252238, xact: 0000000000000000, curses: c00000018d2508a8, usrses: c00000018d2508a8
----------------------------------------
SO: c00000018d000078, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=2, calls cur/top: c00000018d252238/c00000018d252238, flag: (e) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 48
last post received-location: ksoreq_reply
last process to post me: c00000018d037978 1 64
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: c00000018d001058 1 6
(latch info) wait_event=0 bits=90
holding (efd=5) c00000020001d500 Parent+children shared pool level=7
Location from where latch is held: kghfrunp: alloc: clatch nowait:
Context saved from call: 0
state=busy, wlstate=free
holding (efd=5) c00000020000b5f8 OS process allocation level=4
Location from where latch is held: ksoreq_submit:
Context saved from call: 13835058076152957304
state=busy, wlstate=free
Process Group: DEFAULT, pseudo proc: c0000004dd263230
O/S info: user: oracle, term: UNKNOWN, ospid: 3250
OSD pid info: Unix process pid: 3250, image: oracle@XGP2_db1 (PMON)
SO: c0000004df4d5f28, type: 19, owner: c00000018d000078, flag: INIT/-/-/0x00
GES MSG BUFFERS: st=emp chunk=0x0000000000000000 hdr=0x0000000000000000 lnk=0x0000000000000000 flags=0x0 inc=4
outq=0 sndq=0 opid=2 prmb=0x0
mbg[i]=(2 19) mbg[b]=(0 0) mbg[r]=(0 0)
fmq[i]=(4 1) fmq[b]=(0 0) fmq[r]=(0 0)
mop[s]=20 mop[q]=1 pendq=0 zmbq=0
nonksxp_recvs=0
------------process 0xc0000004df4d5f28--------------------
proc version : 0
Local node : 0
pid : 3250
lkp_node : 0
svr_mode : 0
proc state : KJP_NORMAL
Last drm hb acked : 0
Total accesses : 181
Imm. accesses : 180
Locks on ASTQ : 0
Locks Pending AST : 0
Granted locks : 0
AST_Q:
PENDING_Q:
GRANTED_Q:
----------------------------------------
SO: c00000018d2f3610, type: 11, owner: c00000018d000078, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: c00000018d000078,
event: 1, last message event: 1,
last message waited event: 1, messages read: 0
channel: (c0000004dd29fdb0) scumnt mount lock
scope: 1, event: 19, last mesage event: 0,
publishers/subscribers: 0/19,
messages published: 0
SO: c00000018d2508a8, type: 4, owner: c00000018d000078, flag: INIT/-/-/0x00
(session) sid: 333 trans: 0000000000000000, creator: c00000018d000078, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0002-00000003, short-term DID: 0000-0000-00000000
txn branch: 0000000000000000
oct: 0, prv: 0, sql: 0000000000000000, psql: 0000000000000000, user: 0/SYS
service name: SYS$BACKGROUND
last wait for 'latch: shared pool' blocking sess=0x0000000000000000 seq=342 wait_time=175677 seconds since wait started=0
address=c0000002000fff60, number=d6, tries=7
Dumping Session Wait History
for 'latch: shared pool' count=1 wait_time=175677
address=c0000002000fff60, number=d6, tries=7
for 'latch: shared pool' count=1 wait_time=97554
address=c0000002000fff60, number=d6, tries=6
for 'latch: shared pool' count=1 wait_time=78023
address=c0000002000fff60, number=d6, tries=5
for 'latch: shared pool' count=1 wait_time=38978
address=c0000002000fff60, number=d6, tries=4
for 'latch: shared pool' count=1 wait_time=38942
address=c0000002000fff60, number=d6, tries=3
for 'latch: shared pool' count=1 wait_time=19435
address=c0000002000fff60, number=d6, tries=2
for 'latch: shared pool' count=1 wait_time=12655
address=c0000002000fff60, number=d6, tries=1
for 'latch: shared pool' count=1 wait_time=8
address=c0000002000fff60, number=d6, tries=0
for 'os thread startup' count=1 wait_time=144253
=0, =0, =0
for 'os thread startup' count=1 wait_time=141360
=0, =0, =0
SO: c00000018d2f3500, type: 11, owner: c00000018d000078, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: c00000018d000078,
event: 2, last message event: 40,
last message waited event: 40, messages read: 1
channel: (c0000004dd29bbd8) system events broadcast channel
scope: 2, event: 224634, last mesage event: 40,
publishers/subscribers: 0/161,
messages published: 1
SO: c00000018d252238, type: 3, owner: c00000018d000078, flag: INIT/-/-/0x00
(call) sess: cur c00000018d2508a8, rec 0, usr c00000018d2508a8; depth: 0
----------------------------------------
SO: c00000018d2594b0, type: 5, owner: c00000018d252238, flag: INIT/-/-/0x00
(enqueue) PR-00000000-00000000 DID: 0001-0002-00000003
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0xc0000004df401718, mode: X, lock_flag: 0x0
own: 0xc00000018d2508a8, sess: 0xc00000018d2508a8, proc: 0xc00000018d000078, prv: 0xc0000004df401728
----------------------------------------
SO: c00000018d30b710, type: 16, owner: c00000018d000078, flag: INIT/-/-/0x00
(osp req holder)
CHILD REQUESTS:
(osp req) type=2(BACKGROUND) flags=0x20001(STATIC/-) state=1(INITED) err=0
pg=0 arg1=0 arg2=(null) reply=(null) pname=S018
pid=0 parent=c00000018d30b710 fulfill=0000000000000000
----------------------------------------
SO: c0000004dbff09c0, type: 192, owner: c0000004dbff09c0, flag: -/-/-/0x00

Searching Metalink for documents on the internal error 600 [17175] turns up a large amount of information about it:
Keywords: ora-00600 [17175]

1. Bug 6250251: ORA-00600 17175 DURING KGI CLEANUP - DUMP - ORADEBUG
--ora-600 followed by ora-601 and instance crash with ORA-17175.
--Also, setting of heap check event triggers this problem. In this case
--it is event="10235 trace name context forever, level 27"

2. Bug 4216668 - Dump from INSERT / MERGE on internal columns (Doc ID 4216668.8)
--INSERT or MERGE commands might core dump if operating on object types and internal columns are involved.

3. Bug 7590297: ORA-600 [17175] [255] ORA-601: CLEANUP LOCK CONFLICT CRASHED THE DATABASE

4. SR 3-2296150050
--The error has occurred when Oracle was cleaning shared pool latch/heap information about the process which died in middle.
--There is no data corruption associated with this error.
--This is evident from the function kghcln in the trace stack at which it failed.
--This problem is usually the symptom of some earlier problem with the latch.
--Either after a process has died, or a process has signaled an error while holding a shared pool latch, and the index to the shared pool latch is invalid.
--There was a Bug 7590297 raised for this issue which could not be progressed due to unavailability of information.
--From few earlier known issues - This can be due to PMON may sometimes signal ORA-601 while trying to start up additional shared servers or dispatchers.
--There the workaround suggested was to Start the instance with max # of shared servers.
--Can you reproduce the problem? If the instance has been restarted the issue may not persist as it is related to memory.
--If the issue persists then we have to perform the following to monitoring the instance to investigate further:
--1. Set the following event in parameter file:
--event="10257 trace name context forever, level 10"
--event="601 trace name SYSTEMSTATE level 10"
--The first event will cause PMON to dump info about shared server startup.
--The second event will cause PMON to do a system state dump when the 601 occurs.
--2. You should also have the track of this in intervals and save the historical results from:
--SQL> select e.total_waits, e.total_timeouts, e.time_waited
--       from v$session_event e, v$session s, v$bgprocess b
--      where b.name='PMON' and s.paddr=b.paddr
--        and e.sid=s.sid and e.event='process startup';

5. SR 3-2123025401
--=== ODM Solution / Action Plan ===
--Disabled NUMA for resolution

6. SR 7314313.994

Analysis:
Bug 6250251 and bug 4216668 are not applicable to this case.
Bug 7590297 is applicable to this case, as the call stack, error message are the same with this case. But this patch is suspended as requested info is not available.
SR 3-2296150050: same error message, same DB version, similar call stack; closed without solution.
SR 3-2123025401: same error message, same DB version, similar call stack. The issue happened twice in that SR and solved by disabling NUMA
SR 7314313.994: same error message, same DB version, similar call stack; closed without solution.

ERROR: ORA-600 [17175] [a]
VERSIONS: versions 9.2 to 10.1
DESCRIPTION: This error occurs when we are cleaning up a shared pool latch (either after a process has died, or a process has signaled an error while holding a shared pool latch), and the index to the shared pool latch is invalid.
ARGUMENTS: Arg [a] index of the latch recovery structure - usually 255
FUNCTIONALITY: Generic Heap Manager
IMPACT: INSTANCE HANG
        PROCESS FAILURE
        INSTANCE FAILURE

Below is the action plan provided by Oracle GCS. GCS believes that the vast majority of ORA-00600 [17xxx] errors are caused by memory-related problems, which are often resolved by restarting the instance. It also suggests setting shared_servers=max_shared_servers and observing further:
From the uploaded files it looks like you were reported with ORA-00600 [17175] errors and crashed the instance. What is the current status after the restart of the database. Are you still reported with the same errors and crashing the instance?

Mostly the ORA-00600 [17xxx] errors are memory related and might have got resolved after the database restart.

Further looking at the uploaded trace file the failing functions and the error closely matches Bug 6958493 and is closed as duplicate of Base Bug 6962340 which is closed as could not able to reproduce the error.

Also a similar issue is reported in Bug 3104250 which is fixed in 10g, but that doesn't mean you cannot get this error for a new reason and that the same workaround would fix it.

We need to implement the workaround and set: shared_servers=max_shared_servers if the error reproduces again. If this is still repeated issue then we can file a new bug with development for the same.

ACTION PLAN
===========
1. Monitor the alertlog for the ORA-00600 [17175] errors for the next few days and if the database still crashes then please set shared_servers=max_shared_servers and see if the problem resolves or not.
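
The alert log monitoring step in the action plan could be scripted. The following is a minimal Python sketch and not part of the original SR. It assumes the 10g alert log layout seen in this case (a timestamp line such as "Wed Dec 1 01:57:55 2010" followed by message lines) and an English locale for day and month names. The function name find_ora600_17175 is hypothetical.

```python
import re
from datetime import datetime

# Timestamp layout of 10g alert log header lines, e.g. "Wed Dec 1 01:57:55 2010".
# Assumption: day and month names are in English.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

# Matches the ORA-00600 line whose first argument is [17175].
ERROR_RE = re.compile(r"ORA-00600: internal error code, arguments: \[17175\]")


def find_ora600_17175(lines):
    """Return (timestamp, message) pairs for each ORA-00600 [17175] line.

    `lines` is any iterable of alert log lines. The timestamp is the most
    recent timestamp line seen before the error, or None if there was none.
    """
    hits = []
    last_ts = None
    for raw in lines:
        line = raw.strip()
        try:
            # A line that parses as a timestamp starts a new alert log entry.
            last_ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # Not a timestamp line; treat it as message text.
        if ERROR_RE.search(line):
            hits.append((last_ts, line))
    return hits
```

To use it, open the instance's alert log (the path depends on background_dump_dest), pass the file object to find_ora600_17175, and compare the number of hits between runs over the next few days.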
Reposted from maclean_007's 51CTO blog. Original link: http://blog.51cto.com/maclean/1277677