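Environment:
Oracle 10.2.0.4
CentOS 5.x 64-bit
2-node RAC on raw devices
Connected to HP storage over Fibre Channel.
After a storage failure was repaired, the instance on one node (node2) could not be started. The errors were as follows.
1. $ORACLE_BASE/admin/$SID/bdump/alert_$SID.log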
Error: KGXGN polling error (15)
Mon Oct 15 11:39:11 2012
Errors in file /opt/oracle/admin/skycac/bdump/skycac2_lmon_5763.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Mon Oct 15 11:39:12 2012
System state dump is made for local instance
Mon Oct 15 11:39:12 2012
Errors in file /opt/oracle/admin/skycac/bdump/skycac2_diag_5759.trc:
ORA-29702: error occurred in Cluster Group Service operation
Mon Oct 15 11:39:12 2012
Trace dumping is performing id=[cdmp_20121015113912]
Mon Oct 15 11:39:12 2012
Instance terminated by LMON, pid = 5763
2. $CRS_HOME/log/node2
2012-10-15 14:30:26.364 [crsd(14402)]CRS-1205:Auto-start failed for the CRS resource . Details in db-192-168-xxx-xxx.
3. In the crs_stat -t output, every resource is normal except instance_node2, which is OFFLINE.
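The three symptoms above can be collected with a couple of small helpers. This is only a rough sketch: the function names ora_error_codes and offline_resources are made up for illustration, and the second one assumes that crs_stat -t prints its columns in the order Name, Type, Target, State, Host.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any Oracle tool.

# Print each distinct ORA- error code that appears in a log file.
# $1: path to an alert log or trace file
ora_error_codes() {
    grep -o 'ORA-[0-9][0-9]*' "$1" | sort -u
}

# Read `crs_stat -t` output on stdin and print the names of resources
# whose State is OFFLINE.
# Assumption: columns are Name, Type, Target, State, Host, so State is field 4.
offline_resources() {
    awk '$4 == "OFFLINE" { print $1 }'
}
```

Possible usage on the affected node:
ora_error_codes "$ORACLE_BASE/admin/$SID/bdump/alert_$SID.log"
crs_stat -t | offline_resources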
Solution:
1. cd $ORACLE_HOME/rdbms/lib
2. Rename the original library (if it exists)
   mv libskgxp10.so libskgxp10.so.old
3. Relink to configure UDP for IPC
   make -f ins_rdbms.mk rac_on
   make -f ins_rdbms.mk ipc_udp
   make -f ins_rdbms.mk ioracle
4. Check whether the library exists
   ls -l $ORACLE_HOME/lib/libskgxp10.so
5. Start the instance
   srvctl start instance -d $dbname -i $instance_name
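The steps above could be wrapped in a script along the following lines. Treat it as a sketch under assumptions rather than a tested procedure: the names run, relink_udp and DRY_RUN are invented here, make -C (GNU make) stands in for the cd in step 1, and it assumes that nothing is running out of this ORACLE_HOME on the node while the relink takes place. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the relink procedure. DRY_RUN defaults to 1, so commands are
# only printed; set DRY_RUN=0 to actually execute them.

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: database name, $2: instance name (both supplied by the caller)
relink_udp() {
    db_name=$1
    instance_name=$2
    lib_dir="${ORACLE_HOME:?ORACLE_HOME must be set}/rdbms/lib"

    # Step 2: keep a copy of the original library, if there is one.
    if [ -e "$lib_dir/libskgxp10.so" ]; then
        run mv "$lib_dir/libskgxp10.so" "$lib_dir/libskgxp10.so.old" || return 1
    fi

    # Step 3: relink with RAC on and UDP as the IPC protocol.
    # `make -C dir` changes into dir first, like step 1's cd.
    for target in rac_on ipc_udp ioracle; do
        run make -C "$lib_dir" -f ins_rdbms.mk "$target" || return 1
    done

    # Step 4: confirm the library is present.
    run ls -l "$ORACLE_HOME/lib/libskgxp10.so" || return 1

    # Step 5: start the instance through srvctl.
    run srvctl start instance -d "$db_name" -i "$instance_name"
}
```

Calling relink_udp "$dbname" "$instance_name" prints the command sequence for review; running it again with DRY_RUN=0 executes it and stops at the first failing step.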
[References]