一ORACLE 10.2.0.5.0 标准版的数据库的告警日志出现ORA-00600错误,具体错误信息如下所示
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_s003_20333.trc:
ORA-00600: internal error code, arguments: [kcblasm_1], [103], [], [], [], [], [], []
Wed Oct 21 09:22:02 CST 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_s003_20333.trc:
ORA-00600: internal error code, arguments: [kcblasm_1], [103], [], [], [], [], [], []
Wed Oct 21 09:22:05 CST 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_s006_21620.trc:
ORA-00600: internal error code, arguments: [kcblasm_1], [103], [], [], [], [], [], []
Wed Oct 21 09:22:10 CST 2015
在trc文件里面能看到有不少下面错误信息以及对应的SQL语句(都是同一SQL语句触发此类ORA-00600错误)
WARNING:Could not increase the asynch I/O limit to 736 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 736 for SQL direct I/O. It is set to 128
如果将该SQL语句在SQL Development里面执行,告警日志里面就会出现上述ORA-00600错误。同事查了下资料,怀疑是Bug 9949948 : PROCESS SPIN UNDER KSFDRWAT0 IF AIO-MAX-NR TOO LOW 导致, ORACLE Metalink里面描述如下:
Bug 9949948 Linux: Process spin under ksfdrwat0 if OS Async IO not configured high enough
This note gives a brief overview of bug 9949948.The content was last updated on: 28-OCT-2011
Click here for details of each of the sections below.
Affects:
Product (Component) Oracle Server (Rdbms) Range of versions believed to be affected Versions >= 10.2.0.4 but BELOW 11.1 Versions confirmed as being affected Platforms affected
- Linux X86-64bit
- Linux 32bit
It is believed to be a regression in default behaviour thus:
Regression introduced in 10.2.0.5
Fixed:
This issue is fixed in
Symptoms: |
Related To: |
|
Description
This problem is introduced in 10.2.0.5 It only affects platforms where Oracle has to reserve async IO slots, such as Linux platforms. If the OS async IO layer is underconfigured and an Oracle process cannot get sufficient AIO slots then rather than reverting to using non AIO call the process may go into an infinite spin under ksfdrwat0. Rediscovery notes: The spin will be preceded by messages in the trace file of the form: WARNING:io_submit failed due to kernel limitations MAXAIO for process=0 pending aio=0 WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=65518 WARNING:1 Oracle process running out of OS kernelI/O resources aiolimit=0 Notice specifically that the value for aiolimit is reported as "0" for this bug. The process then spins in ksfdrwat0 typically with a stack showing skgfqio () ksfdgo () ksfdwtio () ksfdwat1 () ksfdrwat0 () <<< Spin point ksfdblock () kcflwi () kcflci () kcblci () kcblcio () kcblgt () kcbldrget () It will show repeated waits for "i/o slave wait", which can be misleading as that is normally considered an idle wait event. Workaround Raise the OS AIO limits such that the number of concurrent slot requirements never exceeds the OS limit. ie: Increase AIO-MAX-NR OR Disable async IO (Set DISK_ASYNCH_IO=FALSE) See Note:1313555.1 for additional notes on this issue.
Please note: The above is a summary description only. Actual symptoms can vary. Matching to any symptoms here does not confirm that you are encountering this problem. For questions about this bug please consult Oracle Support. |
References
Bug:9949948 (This link will only work for PUBLISHED bugs)
Note:245840.1 Information on the sections in this article
这篇博客WARNING:Could not increase the asynch I/O limit to xxx for SQL direct I/O已有较详细描述。 但是我查了一下资料,觉得更有可能是Bug 9800302 : 10.2.0.5 GETTING ORA-00600 [KCBLASM_1] 。关于Bug 9800302的Meta Link上文档描述如下所示:
PPLIES TO:
Oracle Database - Enterprise Edition - Version 10.2.0.5 to 10.2.0.5 [Release 10.2]Information in this document applies to any platform.
***Checked for relevance on 10-Jul-2015***
SYMPTOMS
Alert log reports:
The call stack in the generated trace file looks similar to:
CHANGES
This problem is hit only in 10.2.0.5, up to PSU 10.2.0.5.4 in which problem is fixed.
Problem is affecting 10.2.0.5 to 10.2.0.5.3.
CAUSE
This is is a known and common problem hit in 10.2.0.5, investigated and corrected in unpublished Bug 7612454.
The problem was introduced in 10.2.0.5, by the fix of Bug:7523755.
This has been corrected in PSU 10.2.0.5.4 and 11.2.
Please refer to:
Note 7612454.8 - Bug 7612454 - More "direct path read" operations / OERI:kcblasm_1
SOLUTION
The quickest way to solve the problem is to apply PSU 10.2.0.5.4, Patch 12419392 or later.
All alternative solutions for this problem are listed below:
- Upgrade the database to 11.2.
- OR -
- Apply 10.2.0.5.4 Patch Set Update (Patch 12419392) or later PSUs where bug is fixed.
The available PSUs are mentioned in "10.2.0.5 Patch Set Updates - List of Fixes in each PSU" (Document 1337394.1)
- OR -
- Apply interim Patch 7612454 on top of 10.2.0.5 (10.2.0.5.0-10.2.0.5.3):
- For UNIX / Linux platforms apply Patch 7612454 available for download on MOS.
- For Windows platforms apply Patch 3 or higher.
Please check Document 342443.1 for latest patches available for Windows on top of 10.2.0.5.
但是毕竟ORACLE 10.2.0.5版本存在这两个bug,所以导致出现这个问题的可能性都存在,于是我采用排除法,先将Oracle参数 DISK_ASYNC_IO=FALSE(Bug 9949948可以通过禁用异步IO解决), 然后执行导致异常出现的SQL语句,结果告警日志里面依然出现了ORA-00600: internal error code, arguments: [kcblasm_1], [103], [], [], [], [], [], []。看来可以排除是Bug 9949948引起这个错误,但是让我觉得纳闷的是这个SQL语句来自一个报表,而且运行很久了,以前也没有出现这个问题,最近不知道什么触发了这个 Bug。于是我先验证一下
在SQL语句执行前执行 ALTER SESSION SET “_hash_join_enabled”=FALSE,执行该语句不会出现ORA-00600错误。另外,将该SQL语句优化改写也能避免出现此类 ORA-00600错误。当然打上补丁p7612454解决此类问题才是最行之有效的解决方法。经过测试验证,发现应用补丁p7612454后,该SQL 语句不会再报ORA-00600: internal error code, arguments: [kcblasm_1], [103], [], [], [], [], [], []错误了。