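Recently I relocated an Oracle 10g RAC that runs in virtual machines. After the move, the VIP cluster resource would not start and reported "CRS-0233: Resource or relatives are currently involved with another operation". Why? Because after moving house your address has changed, and you have to start using your new home address...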
1. Environment
Oracle 10g RAC + SUSE 10
Note: after relocating the RAC virtual machines, we normally choose "copied" when adding the virtual machine (I have not tried "moved", so I don't know whether it causes problems).
Because "copied" was chosen, the virtual machine generates a new UUID (a UUID is an identifier generated on a machine that is meant to be unique among the machines in the same virtual environment).
The MAC addresses and the network interface names change as well (on first boot the original eth0 and eth1 are unavailable), so this usually has to be corrected.
Different Linux distributions handle the new network interfaces differently. On Oracle Linux and Red Hat you can simply delete the old interfaces and rename the new ones back to the original names.
On SUSE Linux it takes a little more work; see: http://blog.csdn.net/robinson_0612/article/details/8131771

2. CRS-1006 / CRS-0215 / CRS-0233 errors
# After fixing the network cards, both nodes were rebooted
# The VIP resource is OFFLINE
oracle@bo2dbp:~> ./crs_stat.sh |grep bo2dbp
Resource name                              Target     State
--------------                             ------     -----
ora.bo2dbp.ASM1.asm                        ONLINE     ONLINE on bo2dbp
ora.bo2dbp.LISTENER_BO2DBP.lsnr            ONLINE     OFFLINE
ora.bo2dbp.LISTENER_ORA10G_BO2DBP.lsnr     ONLINE     OFFLINE
ora.bo2dbp.gsd                             ONLINE     ONLINE on bo2dbp
ora.bo2dbp.ons                             ONLINE     OFFLINE
ora.bo2dbp.vip                             ONLINE     OFFLINE
ora.ora10g.db                              ONLINE     ONLINE on bo2dbp
ora.ora10g.ora10g1.inst                    ONLINE     ONLINE on bo2dbp

# Try to start ons manually
oracle@bo2dbp:~> crs_start ora.bo2dbp.ons
Attempting to start `ora.bo2dbp.ons` on member `bo2dbp`
Start of `ora.bo2dbp.ons` on member `bo2dbp` failed.
CRS-1006: No more members to consider
CRS-0215: Could not start resource 'ora.bo2dbp.ons'.

# Starting it through onsctl fails as well
oracle@bo2dbp:~> onsctl start
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = bo2dbp.2gotrade.com, port = 6200}
Adding remote host bo2dbp.2gotrade.com:6200
onscfg[1]
   {node = bo2dbs.2gotrade.com, port = 6200}
Adding remote host bo2dbs.2gotrade.com:6200
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = bo2dbp.2gotrade.com, port = 6200}
Adding remote host bo2dbp.2gotrade.com:6200
onscfg[1]
   {node = bo2dbs.2gotrade.com, port = 6200}
Adding remote host bo2dbs.2gotrade.com:6200
onsctl: ons failed to start

# Try to start the vip manually; this returns CRS-0233
oracle@bo2dbp:~> crs_start ora.bo2dbp.vip
CRS-0233: Resource or relatives are currently involved with another operation.
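When the resource list is long, it can help to pull out only the resources whose Target is ONLINE but whose State is OFFLINE. The following is a minimal sketch, assuming the three-column layout printed by the crs_stat.sh wrapper above; `list_stuck_resources` is a hypothetical helper name, not an Oracle tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oracle Clusterware).
# Reads a table in the layout shown above (name, Target, State) on stdin
# and prints the names of resources whose Target is ONLINE but whose
# State is OFFLINE. Header and separator rows never match the condition.
list_stuck_resources() {
    awk '$2 == "ONLINE" && $3 == "OFFLINE" { print $1 }'
}

# A few rows copied from the output above, used as sample input.
sample='Resource name                  Target     State
--------------                 ------     -----
ora.bo2dbp.gsd                 ONLINE     ONLINE on bo2dbp
ora.bo2dbp.ons                 ONLINE     OFFLINE
ora.bo2dbp.vip                 ONLINE     OFFLINE'

printf '%s\n' "$sample" | list_stuck_resources
```

On a live node the same function could be fed directly, for example `./crs_stat.sh | list_stuck_resources`, assuming the wrapper prints the same columns.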
3. Fault analysis
# Check the IP addresses on node bo2dbp
oracle@bo2dbp:~> ifconfig          # the current interface names on this system are eth2 and eth5
eth2      Link encap:Ethernet  HWaddr 00:0C:29:4A:66:28
          inet addr:192.168.7.51  Bcast:192.168.7.255  Mask:255.255.255.0
eth5      Link encap:Ethernet  HWaddr 00:0C:29:4A:66:32
          inet addr:10.10.7.51  Bcast:10.10.7.255  Mask:255.255.255.0

# Check the cluster network layer configuration on node bo2dbp; iflist shows the same names as the actual NICs
oracle@bo2dbp:~> oifcfg iflist
eth2  192.168.7.0
eth5  10.10.7.0
oracle@bo2dbp:~> oifcfg getif -global     # here the interface names do not match the actual NIC names
eth3  192.168.7.0  global  public
eth4  10.10.7.0  global  cluster_interconnect

# Check the IP addresses on node bo2dbs
oracle@bo2dbs:~> ifconfig
eth5      Link encap:Ethernet  HWaddr 00:0C:29:27:43:EB
          inet addr:10.10.7.52  Bcast:10.10.7.255  Mask:255.255.255.0
eth6      Link encap:Ethernet  HWaddr 00:0C:29:27:43:E1
          inet addr:192.168.7.52  Bcast:192.168.7.255  Mask:255.255.255.0

# Check the cluster network layer configuration on node bo2dbs; iflist shows the same names as the actual NICs
oracle@bo2dbs:~> oifcfg iflist
eth6  192.168.7.0
eth5  10.10.7.0
oracle@bo2dbs:~> oifcfg getif -global     # same mismatch here; these must be the original interface names
eth3  192.168.7.0  global  public
eth4  10.10.7.0  global  cluster_interconnect

# From the above: the interface names differ between the nodes, and the cluster network layer
# still holds the original interface names, so it has to be updated
# To unify the interface names, they are renamed to bond1 and bond2 below

4. Fixing the fault
# Rename the NICs consistently; for the method see: http://blog.csdn.net/robinson_0612/article/details/8131771
# Result after renaming
oracle@bo2dbp:~> oifcfg iflist
bond1  192.168.7.0
bond2  10.10.7.0
oracle@bo2dbs:~> oifcfg iflist
bond1  192.168.7.0
bond2  10.10.7.0

# In the queries below, the cluster-level public and cluster_interconnect entries still show the old configuration
# They should be changed to match; we leave them alone for now to see what error appears
oracle@bo2dbp:~> oifcfg getif -global
eth3  192.168.7.0  global  public
eth4  10.10.7.0  global  cluster_interconnect
oracle@bo2dbs:~> oifcfg getif -global
eth3  192.168.7.0  global  public
eth4  10.10.7.0  global  cluster_interconnect

# Restart CRS
oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/crsctl start crs
root's password:
Attempting to start CRS stack
The CRS stack will be started shortly

# The query below shows that the CRS background processes are healthy
oracle@bo2dbp:~> crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

# The resource status is still the same as before
oracle@bo2dbp:~> ./crs_stat.sh |grep bo2dbp
Resource name                              Target     State
--------------                             ------     -----
ora.bo2dbp.ASM1.asm                        ONLINE     ONLINE on bo2dbp
ora.bo2dbp.LISTENER_BO2DBP.lsnr            ONLINE     OFFLINE
ora.bo2dbp.LISTENER_ORA10G_BO2DBP.lsnr     ONLINE     OFFLINE
ora.bo2dbp.gsd                             ONLINE     ONLINE on bo2dbp
ora.bo2dbp.ons                             ONLINE     OFFLINE
ora.bo2dbp.vip                             ONLINE     OFFLINE
ora.ora10g.db                              ONLINE     ONLINE on bo2dbp
ora.ora10g.ora10g1.inst                    ONLINE     ONLINE on bo2dbp

# Stop all resources
oracle@bo2dbp:~> crs_stop -all

# Use oifcfg to change the cluster-level network configuration
oracle@bo2dbp:~> oifcfg delif -global
oracle@bo2dbp:~> oifcfg getif -global
oracle@bo2dbp:~> oifcfg setif -global bond1/192.168.7.0:public
oracle@bo2dbp:~> oifcfg setif -global bond2/10.10.7.0:cluster_interconnect
oracle@bo2dbp:~> oifcfg getif -global
bond1  192.168.7.0  global  public
bond2  10.10.7.0  global  cluster_interconnect

# After a reboot the resource status is still the same as before
# Look at the vip log first and deal with the vip problem
bo2dbp:/u01/oracle/crs/log/bo2dbp/racg # tail -50 ora.bo2dbp.vip.log
2012-12-28 11:25:13.783: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/crs
2012-12-28 11:25:13.783: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: clsrcexecut: cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip start bo2dbp
2012-12-28 11:25:13.783: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: clsrcexecut: rc = 1, time = 3.220s
2012-12-28 11:25:16.979: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/crs
2012-12-28 11:25:16.979: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: clsrcexecut: cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip check bo2dbp
2012-12-28 11:25:16.979: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: clsrcexecut: rc = 1, time = 3.190s
2012-12-28 11:25:16.979: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: end for resource = ora.bo2dbp.vip, action = start, status = 1, time = 6.430s
2012-12-28 11:25:23.807: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: eth3: error fetching interface information: Device not found
# The line above says eth3 was not found, but the interface we want to use is bond1
checkIf: interface eth3 is down
Invalid parameters, or failed to bring up VIP (host=bo2dbp)
2012-12-28 11:25:23.807: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/crs
2012-12-28 11:25:23.807: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: clsrcexecut: cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip start bo2dbp
2012-12-28 11:25:23.807: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: clsrcexecut: rc = 1, time = 3.220s
2012-12-28 11:25:27.018: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/crs
2012-12-28 11:25:27.018: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: clsrcexecut: cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip check bo2dbp
2012-12-28 11:25:27.018: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: clsrcexecut: rc = 1, time = 3.210s
2012-12-28 11:25:27.018: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: end for resource = ora.bo2dbp.vip, action = start, status = 1, time = 6.450s
2012-12-28 11:25:33.822: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: eth3: error fetching interface information: Device not found
# The "eth3 not found" error appears again
checkIf: interface eth3 is down
Invalid parameters, or failed to bring up VIP (host=bo2dbp)
2012-12-28 11:25:33.822: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/crs
2012-12-28 11:25:33.822: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: clsrcexecut: cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip start bo2dbp
2012-12-28 11:25:33.822: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: clsrcexecut: rc = 1, time = 3.210s
2012-12-28 11:25:37.063: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/crs
2012-12-28 11:25:37.063: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: clsrcexecut: cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip check bo2dbp
2012-12-28 11:25:37.063: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: clsrcexecut: rc = 1, time = 3.240s
2012-12-28 11:25:37.063: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: end for resource = ora.bo2dbp.vip, action = start, status = 1, time = 6.490s

# The log shows it is still the interface name problem
# "action = start, status = 1, time = 6.490s" is the failed start: Target is ONLINE while the actual State stays OFFLINE
# The cluster network layer has already been changed, so eth3 must still be recorded in the OCR without having been updated; update it next
bo2dbp:/u01/oracle/crs/bin # ./srvctl modify nodeapps -n bo2dbp -A 192.168.7.61/255.255.255.0/bond1

# Update the second node in the same way
bo2dbs:~ # /u01/oracle/crs/bin/srvctl modify nodeapps -n bo2dbs -A 192.168.7.62/255.255.255.0/bond1

# The vip now starts successfully
oracle@bo2dbp:~> crs_start ora.bo2dbp.vip
Attempting to start `ora.bo2dbp.vip` on member `bo2dbp`
Start of `ora.bo2dbp.vip` on member `bo2dbp` succeeded.

# Next, look at the ons log
# (the message "ons is not running ..." is split across two log records in this output)
oracle@bo2dbp:/u01/oracle/crs/log/bo2dbp/racg> tail -20 ora.bo2dbp.ons.log
............
onscfg[0]
   {node = bo2dbp.2gotrade.com, port = 6200}
Adding remote host bo2dbp.2gotrade.com:6200
onscfg[1]
   {node = bo2dbs.2gotrade.com, port = 6200}
Adding remote host bo2dbs.2gotrade.com:6200
ons is n
2012-12-28 11:00:49.345: [ RACG][2554102256] [19071][2554102256][ora.bo2dbp.ons]: ot running ...
2012-12-28 11:00:49.345: [ RACG][2554102256] [19071][2554102256][ora.bo2dbp.ons]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/crs
2012-12-28 11:00:49.345: [ RACG][2554102256] [19071][2554102256][ora.bo2dbp.ons]: clsrcexecut: cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/oracle/crs/bin/onsctl ping
2012-12-28 11:00:49.345: [ RACG][2554102256] [19071][2554102256][ora.bo2dbp.ons]: clsrcexecut: rc = 1, time = 0.210s
2012-12-28 11:00:49.346: [ RACG][2554102256] [19071][2554102256][ora.bo2dbp.ons]: end for resource = ora.bo2dbp.ons, action = start, status = 1, time = 7.560s
2012-12-28 11:00:55.661: [ RACG][368746992] [19812][368746992][ora.bo2dbp.ons]: onsctl: shutting down ons daemon ...
CONNECT: Connection refused
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = bo2dbp.2gotrade.com, port = 6200}
Adding remote host bo2dbp.2gotrade.com:6200
onscfg[1]
   {node = bo2dbs.2gotrade.com,
2012-12-28 11:00:55.661: [ RACG][368746992] [19812][368746992][ora.bo2dbp.ons]: port = 6200}
...............

# I remembered a similar ons error from before, hit during installation, which was caused by a missing local loopback entry.
# The same problem came back here: when /etc/hosts was copied from the original configuration, the loopback line was accidentally commented out. Embarrassing...
# For details see http://blog.csdn.net/robinson_0612/article/details/6303583
# With the loopback entry restored, starting ons succeeds
oracle@bo2dbp:~> crs_start ora.bo2dbp.ons
Attempting to start `ora.bo2dbp.ons` on member `bo2dbp`
Start of `ora.bo2dbp.ons` on member `bo2dbp` succeeded.

#Author : Robinson
#Blog : http://blog.csdn.net/robinson_0612

5. Summary
a. After migrating the virtual machines of a RAC environment, first update the paths of all disks in each virtual machine's configuration file (local disk, ASM disks, OCR, voting disk).
b. Adding the virtual machine with the "copied" option (not sure about "moved") changes the network cards, mainly to keep the MAC addresses unique.
c. The network has to be reconfigured. To keep the original interface names, rename the interfaces or edit the relevant configuration files from the command line.
d. If the network configuration was changed through an X Window tool, check whether the hosts file was modified as a side effect.
e. If new interface names or new IP addresses are used, reconfigure the cluster network layer.
f. The new interface names or IP addresses also have to be updated in the OCR.
g. Last point: the logs are the starting point for analysing and solving a problem, and the fastest way to pinpoint it.
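The mismatch between what the operating system reports and what the cluster has registered can also be checked mechanically. The following is a rough sketch, assuming the two-column `oifcfg iflist` and four-column `oifcfg getif -global` layouts shown earlier, saved to text files; `find_stale_interfaces` is a hypothetical helper name, not an Oracle tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oracle Clusterware).
# $1: file holding `oifcfg iflist` output (interface, subnet)
# $2: file holding `oifcfg getif -global` output (interface, subnet, scope, type)
# Prints every registered interface that the OS does not report,
# as "interface subnet type".
find_stale_interfaces() {
    awk 'FILENAME == ARGV[1] { if (NF) present[$1] = 1; next }
         NF && !($1 in present) { print $1, $2, $4 }' "$1" "$2"
}

# Sample data copied from the situation described in the article:
# the OS already uses bond1/bond2, the cluster still lists eth3/eth4.
iflist=$(mktemp)
getif=$(mktemp)
printf '%s\n' 'bond1  192.168.7.0' 'bond2  10.10.7.0' > "$iflist"
printf '%s\n' 'eth3  192.168.7.0  global  public' \
              'eth4  10.10.7.0  global  cluster_interconnect' > "$getif"

find_stale_interfaces "$iflist" "$getif"
rm -f "$iflist" "$getif"
```

On a live node the two files could be produced with `oifcfg iflist > iflist.txt` and `oifcfg getif -global > getif.txt`. This only covers the oifcfg layer; the interface name stored for the VIP is changed separately with `srvctl modify nodeapps`, as done in section 4.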