环境 :
Oracle 10.2.0.4
CentOS 5.x 64bit
2 node RAC, raw 设备
通过光纤连HP存储.
存储出现故障并修复后, 其中一台(node2)的instance无法启动, 报错如下
1. $ORACLE_BASE/admin/$SID/bdump/alter_$sid.log
2. $CRS_HOME/log/node2
Error: KGXGN polling error (15)
Mon Oct 15 11:39:11 2012
Errors in file /opt/oracle/admin/skycac/bdump/skycac2_lmon_5763.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Mon Oct 15 11:39:12 2012
System state dump is made for local instance
Mon Oct 15 11:39:12 2012
Errors in file /opt/oracle/admin/skycac/bdump/skycac2_diag_5759.trc:
ORA-29702: error occurred in Cluster Group Service operation
Mon Oct 15 11:39:12 2012
Trace dumping is performing id=[cdmp_20121015113912]
Mon Oct 15 11:39:12 2012
Instance terminated by LMON, pid = 5763
2. $CRS_HOME/log/node2
2012-10-15 14:30:26.364
[crsd(14402)]CRS-1205:Auto-start failed for the CRS resource . Details in db-192-168-xxx-xxx.
3. crs_stat -t结果中除了instance_node2 OFFLINE 其他都正常.
解决办法 :
1. cd $ORACLE_HOME/rdbms/lib
2. rename the original library (if exists)
mv libskgxp10.so libskgxp10.so.old
3. Relink to configure UDP for IPC
make -f ins_rdbms.mk rac_on
make -f ins_rdbms.mk ipc_udp
make -f ins_rdbms.mk ioracle
4. Check whether the library exists
ls -l $ORACLE_HOME/lib/libskgxp10.so
5. start instance
srvctl start instance -d $dbname -i $instance_name
【参考】