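This case involves an 11.1.0.7 RAC system on Linux x86-64 that uses NetApp NFS as its shared storage. One node hit the internal error ORA-00600: internal error code, arguments: [kccchb_3], and the instance terminated unexpectedly. The alert log excerpt is the first listing below.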
The relevant Metalink document is Note 303986.1, "RAC instance using NFS via Netapps report Ora-600 [kccchb_3]". Its text is the second listing below.
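The note explains that the ORA-00600 [kccchb_3] internal error can be worked around by setting the initialization parameter filesystemio_options to DIRECTIO. A better choice may be SETALL, which enables direct I/O and asynchronous I/O together.
Incidentally, enabling asynchronous I/O on Linux takes more than setting filesystemio_options. All of the following must hold:
1. filesystemio_options is set to ASYNCH or SETALL
2. The Linux kernel parameter fs.aio-max-nr is set to a reasonable value, for example 3145728, together with fs.file-max = 6815744
3. The initialization parameter disk_asynch_io is set to TRUE
4. On 10gR2 the oracle binary may not have been linked with the async_on option, in which case it has to be relinked manually with make (The reason behind this behavior is that the LIBAIO_0.1 Linux OS library is not attached for io_getevents.)
The relink session and the verification steps are the last listing below.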
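The checklist above can be sketched as a small shell script. This is only an illustration under stated assumptions: the function names `check_min`, `read_sysctl` and `gen_async_sql` are made up for this example, the thresholds are the example values from the text, and the generated SQL assumes an spfile-based RAC instance (hence `SCOPE=SPFILE SID='*'`). Both parameters are static, so the instances need a restart before the change takes effect.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, thresholds come from the text.

# check_min NAME CURRENT MINIMUM: report whether CURRENT reaches MINIMUM.
check_min() {
    if [ "$2" -ge "$3" ]; then
        echo "OK $1=$2"
    else
        echo "LOW $1=$2 (want >= $3)"
    fi
}

# read_sysctl NAME: print the current value from /proc/sys, or nothing
# if the file is not readable (dots in NAME become path separators).
read_sysctl() {
    f="/proc/sys/$(echo "$1" | tr . /)"
    if [ -r "$f" ]; then
        cat "$f"
    fi
}

# gen_async_sql: print the statements that request direct plus
# asynchronous I/O. Assumes an spfile; a restart is required afterwards.
gen_async_sql() {
    echo "ALTER SYSTEM SET filesystemio_options=SETALL SCOPE=SPFILE SID='*';"
    echo "ALTER SYSTEM SET disk_asynch_io=TRUE SCOPE=SPFILE SID='*';"
}

# Compare the two kernel parameters named in the text with their example values.
for pair in "fs.aio-max-nr 3145728" "fs.file-max 6815744"; do
    set -- $pair
    cur=$(read_sysctl "$1")
    if [ -n "$cur" ]; then
        check_min "$1" "$cur" "$2"
    fi
done

gen_async_sql
```

The printed statements can be reviewed and then run from a SYSDBA session. A kernel parameter reported as LOW would normally be raised in /etc/sysctl.conf and loaded with `sysctl -p`.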
Alert log:
Mon Dec 27 00:03:13 2010
Error: Controlfile sequence number in file header is different from the one in memory
Please check that the correct mount options are used if controlfile is located on NFS
Errors in file /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/trace/TSMOTEBS1_ckpt_15907.trc (incident=11353):
ORA-00600: internal error code, arguments: [kccchb_3], [555541], [555540], [555540], [], [], [], [], [], [], [], []
Incident details in: /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/incident/incdir_11353/TSMOTEBS1_ckpt_15907_i11353.trc
Errors in file /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/trace/TSMOTEBS1_ckpt_15907.trc:
ORA-00600: internal error code, arguments: [kccchb_3], [555541], [555540], [555540], [], [], [], [], [], [], [], []
CKPT (ospid: 15907): terminating the instance due to error 469
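When scanning a long alert log for this error, it can help to pull out only the populated ORA-00600 arguments. The following is a minimal sketch; `ora600_args` is a hypothetical helper name, and the source does not document what each argument means, so the function only extracts them without interpreting them.

```shell
#!/bin/sh
# Sketch only: ora600_args is a made-up helper, not an Oracle utility.

# ora600_args: read alert-log text on stdin. For every line that
# contains ORA-00600, print its non-empty bracketed arguments on one
# line, separated by single spaces. Empty "[]" slots are skipped.
ora600_args() {
    awk '/ORA-00600/ {
        out = ""
        rest = $0
        while (match(rest, /\[[^]]+\]/)) {
            arg = substr(rest, RSTART + 1, RLENGTH - 2)
            out = (out == "" ? arg : out " " arg)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out
    }'
}

# Example with the error line from the alert log above.
echo 'ORA-00600: internal error code, arguments: [kccchb_3], [555541], [555540], [555540], [], [], [], [], [], [], [], []' | ora600_args
```

For the line above the sketch prints `kccchb_3 555541 555540 555540`. The two distinct numbers differ by one, which matches the message that the controlfile sequence number in the file header differs from the one in memory.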
Note 303986.1:
Symptoms
RAC instance using NFS via Netapps occasionally reports:
ORA-00600 [kccchb_3], [689], [0], [],[],[],[]
ORA-00600 [kclchkblk_3], [7], [738515663], [14], [], [], [], []

Cause
This is most probably a Netapp issue wherein 'noac' doesn't work as expected.

NetApp recommended mount options for Oracle9i RAC on Solaris:
rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio,noac

NetApp recommended mount options for Linux SLES9, RHEL4, RHEL3 QU3 and later:
rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=ORACLE_BLOCK_SIZE,
wsize=ORACLE_BLOCK_SIZE,actimeo=0

NetApp recommended mount options for Linux RHAS2.1, RHEL3 pre QU3:
rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=ORACLE_BLOCK_SIZE,
wsize=ORACLE_BLOCK_SIZE,noac

And for AS 2.1 in /etc/modules.conf add:
options nfs nfs_uncached_io=1

Solution
Workaround
set filesystemio_options=DIRECTIO for all instances (to bypass any buffer cache)
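Since the alert log itself asks to check the mount options, a quick sanity check of an option string against the Linux recommendations quoted in the note might look like the sketch below. `missing_nfs_opts` is a hypothetical helper. It checks a comma-separated option string as it would be written in /etc/fstab, and it only covers a subset of the quoted options (hard, tcp, vers=3, and attribute caching disabled through either actimeo=0 or noac). The 8192 in the example stands in for ORACLE_BLOCK_SIZE and is an assumption.

```shell
#!/bin/sh
# Sketch only: missing_nfs_opts is a made-up helper; the option list is
# a subset of the Linux recommendations quoted in the note.

# missing_nfs_opts OPTS: print the checked options that are absent from
# the comma-separated string OPTS. Empty output means none are missing.
missing_nfs_opts() {
    opts=",$1,"
    missing=""
    for want in hard tcp vers=3; do
        case "$opts" in
            *",$want,"*) ;;
            *) missing="$missing $want" ;;
        esac
    done
    # Attribute caching has to be disabled with actimeo=0 or noac.
    case "$opts" in
        *",actimeo=0,"* | *",noac,"*) ;;
        *) missing="$missing actimeo=0|noac" ;;
    esac
    echo "${missing# }"
}

# Example: rsize/wsize of 8192 assume an 8 KB database block size.
missing_nfs_opts "rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=8192,wsize=8192,actimeo=0"
missing_nfs_opts "rw,bg,soft,tcp,vers=3"
```

The first call prints an empty line because nothing is missing. The second prints `hard actimeo=0|noac`.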
Relinking with async_on on 10gR2:
SQL> shutdown immediate
[maclean@rh8 ~]$ cd $ORACLE_HOME/rdbms/lib
[maclean@rh8 lib]$ ln -s /usr/lib/libaio.so skgaio.o
[maclean@rh8 lib]$ make PL_ORALIBS=-laio -f ins_rdbms.mk async_on
rm -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaioi.o
cp /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaio.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaioi.o
chmod 755 /u01/oracle/product/10.2.0/db_1/bin
- Linking Oracle
rm -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle
gcc -o /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle -L/u01/oracle/product/10.2.0/db_1/rdbms/lib/ -L/u01/oracle/product/10.2.0/db_1/lib/ -L/u01/oracle/product/10.2.0/db_1/lib/stubs/ -L/usr/lib -lirc -Wl,-E `test -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/sskgpsmti.o && echo /u01/oracle/product/10.2.0/db_1/rdbms/lib/sskgpsmti.o` /u01/oracle/product/10.2.0/db_1/rdbms/lib/opimai.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/ssoraed.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/ttcsoi.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/defopt.o -Wl,
--whole-archive -lperfsrv10 -Wl,--no-whole-archive /u01/oracle/product/10.2.0/db_1/lib/nautab.o /u01/oracle/product/10.2.0/db_1/lib/naeet.o /u01/oracle/product/10.2.0/db_1/lib/naect.o /u01/oracle/product/10.2.0/db_1/lib/naedhs.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/config.o -lserver10 -lodm10 -lnnet10 -lskgxp10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lhasgen10 -lcore10 -lskgxn2 -locr10 -locrb10 -locrutl10 -lhasgen10 -lcore10 -lskgxn2 -lclient10 -lvsn10 -lcommon10 -lgeneric10 -lknlopt `if /usr/bin/ar tv /u01/oracle/product/10.2.0/db_1/rdbms/lib/libknlopt.a | grep xsyeolap.o > /dev/null 2>&1 ; then echo "-loraolap10" ; fi` -lslax10 -lpls10 -lplp10 -lserver10 -lclient10 -lvsn10 -lcommon10 -lgeneric10 -lknlopt -lslax10 -lpls10 -lplp10 -ljox10 -lserver10 -lclsra10 -ldbcfg10 -locijdbcst10 -lwwg `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lmm -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `if /usr/bin/ar tv /u01/oracle/product/10.2.0/db_1/rdbms/lib/libknlopt.a | grep "kxmnsd.o" > /dev/null 2>&1 ; then echo " " ; else echo "-lordsdo10"; fi` -lctxc10 -lctx10 -lzx10 -lgx10 -lctx10 -lzx10 -lgx10 -lordimt10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lsnls10 -lunls10 -lsnls10 -lnls10 
-lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -laio `cat /u01/oracle/product/10.2.0/db_1/lib/sysliblist` -Wl,-rpath,/u01/oracle/product/10.2.0/db_1/lib -lm `cat /u01/oracle/product/10.2.0/db_1/lib/sysliblist` -ldl -lm -L/u01/oracle/product/10.2.0/db_1/lib
/u01/oracle/product/10.2.0/db_1/lib//libcore10.a(lcd.o): In function `lcdprm':
lcd.c:(.text+0x8e8): warning: the `gets' function is dangerous and should not be used.
mv -f /u01/oracle/product/10.2.0/db_1/bin/oracle /u01/oracle/product/10.2.0/db_1/bin/oracleO
mv /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle /u01/oracle/product/10.2.0/db_1/bin/oracle
chmod 6751 /u01/oracle/product/10.2.0/db_1/bin/oracle
SQL> startup;
/* The following check verifies whether asynchronous I/O is working */
cat /proc/slabinfo | grep kio
[maclean@rh8 ~]$ cat /proc/slabinfo | grep kio
kioctx 56 72 320 12 1 : tunables 54 27 8 : slabdata 6 6 0
kiocb 15 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
/* The kioctx value is greater than zero here, which indicates that asynchronous I/O is in use */
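The /proc/slabinfo check above can be wrapped in a small function for scripting. This is a sketch under assumptions: `aio_in_use` is a hypothetical helper name, it treats the second column of the kioctx row (the number of active objects) as the indicator in the same way as the comment above, and on some systems /proc/slabinfo is readable only by root or does not list a separate kioctx cache.

```shell
#!/bin/sh
# Sketch only: aio_in_use is a made-up helper name.

# aio_in_use: read /proc/slabinfo-style text on stdin and print "yes"
# when the kioctx row shows more than zero active objects (column 2),
# otherwise print "no" (also when no kioctx row is present).
aio_in_use() {
    awk '$1 == "kioctx" { found = ($2 > 0) }
         END { print (found ? "yes" : "no") }'
}

# Typical call on a live system (may require root):
#   aio_in_use < /proc/slabinfo
```

Fed the two slabinfo lines shown above, the function prints `yes`.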
On 11g there is no need to go to the trouble of relinking with async_on, and the following query shows whether asynchronous I/O is in use:
SQL> col name for a60
SQL> SELECT name, asynch_io
  FROM v$datafile f, v$iostat_file i
  WHERE f.file# = i.file_no
  AND filetype_name = 'Data File';

NAME                                                         ASYNCH_IO
------------------------------------------------------------ ---------
/standby/oradata/PROD/datafile/o1_mf_system_6q9dwgwh_.dbf    ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_sysaux_6q9dwgyp_.dbf    ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_undotbs1_6q9dwh0r_.dbf  ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_users_6q9dwh2x_.dbf     ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_example_6q9dzhh1_.dbf   ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_enc_6q9fdzcx_.dbf       ASYNC_ON

6 rows selected.

/* ASYNCH_IO shows ASYNC_ON for every data file, which indicates that asynchronous I/O is in use */
Reposted from maclean_007's 51CTO blog. Original link: http://blog.51cto.com/maclean/1277681