Oracle 11g RAC Add-Node Failures: CRS Resource Startup Failure

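Overview:

System environment:

Operating system: Red Hat Enterprise Linux 5.5

Cluster software: Grid Infrastructure (GI) 11g

Database software: Oracle 11.2.0.1

Cause of the failure:

   The new node (node3) was cloned from an existing node (node2), and its hostname was not changed before it was added to the cluster. This led to the following failures: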
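A hostname check before adding a cloned machine would catch this kind of mistake early. The following is only a minimal sketch and not part of the original procedure: the function name `hostname_problems` and the node lists passed to it are hypothetical, and on a real host the first argument would come from `socket.gethostname()`.

```python
import socket


def hostname_problems(local_hostname, new_node, existing_nodes):
    """Return a list of human-readable problems with the local hostname.

    local_hostname: hostname reported by the machine being added
    new_node:       name the machine is supposed to have in the cluster
    existing_nodes: names of nodes that are already cluster members
    """
    problems = []
    # Compare short names only, so "node3.example.com" is treated as "node3".
    short_name = local_hostname.split(".")[0]
    if short_name in existing_nodes:
        problems.append(f"hostname {short_name!r} is already used by a cluster member")
    if short_name != new_node:
        problems.append(f"hostname is {short_name!r}, expected {new_node!r}")
    return problems


# A clone of node2 that still carries the old hostname is reported twice.
print(hostname_problems("node2", "node3", ["node1", "node2"]))
# On the machine itself, pass socket.gethostname() as the first argument.
print(hostname_problems(socket.gethostname(), "node3", ["node1", "node2"]))
```

An empty list means the hostname is unique in the cluster and matches the intended node name.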

1. The CRS services start normally on the new node

[root@node3 ~]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online
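The four "is online" lines above are what a healthy stack looks like. As a rough sketch, assuming the text of `crsctl check crs` has already been captured into a string, the check can be automated as follows. The function name `services_not_online` is hypothetical, and the list of expected services is taken from the output shown above.

```python
import re

# Service names as they appear in the output shown above.
EXPECTED_SERVICES = (
    "Oracle High Availability Services",
    "Cluster Ready Services",
    "Cluster Synchronization Services",
    "Event Manager",
)

SAMPLE_OUTPUT = """\
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
"""


def services_not_online(check_output):
    """Return the expected services that are not reported as online."""
    online = set()
    for line in check_output.splitlines():
        match = re.match(r"CRS-\d+:\s+(.+?) is online\s*$", line.strip())
        if match:
            online.add(match.group(1))
    # Any other message for a service simply means it is not in the set.
    return [name for name in EXPECTED_SERVICES if name not in online]


print(services_not_online(SAMPLE_OUTPUT))
```

The sketch only recognizes the "is online" wording, so it does not need to know what the failure messages look like.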

2. The listener resource fails to start

[root@node3 ~]# crs_stat -t

Name           Type           Target    State     Host        

------------------------------------------------------------

ora.DG1.dg     ora....up.type ONLINE    ONLINE    node1      

ora....ER.lsnr ora....er.type ONLINE    ONLINE    node1      

ora....N1.lsnr ora....er.type ONLINE    ONLINE    node1      

ora....VOTE.dg ora....up.type ONLINE    ONLINE    node1      

ora.RCY.dg     ora....up.type ONLINE    ONLINE    node1      

ora.asm        ora.asm.type   ONLINE    ONLINE    node1      

ora.eons       ora.eons.type  ONLINE    ONLINE    node1      

ora.gsd        ora.gsd.type   ONLINE    ONLINE    node1      

ora....network ora....rk.type ONLINE    ONLINE    node1      

ora....SM1.asm application    ONLINE    ONLINE    node1      

ora....E1.lsnr application    ONLINE    ONLINE    node1      

ora.node1.gsd  application    ONLINE    ONLINE    node1      

ora.node1.ons  application    ONLINE    ONLINE    node1      

ora.node1.vip  ora....t1.type ONLINE    ONLINE    node1      

ora....SM2.asm application    ONLINE    ONLINE    node2      

ora....E2.lsnr application    ONLINE    ONLINE    node2      

ora.node2.gsd  application    ONLINE    ONLINE    node2      

ora.node2.ons  application    ONLINE    ONLINE    node2      

ora.node2.vip  ora....t1.type ONLINE    ONLINE    node2      

ora....SM3.asm application    ONLINE    ONLINE    node3      

ora....E3.lsnr application    OFFLINE   OFFLINE        

ora.node3.gsd  application    ONLINE    ONLINE    node3      

ora.node3.ons  application    ONLINE    ONLINE    node3      

ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    node2      

ora.ons        ora.ons.type   ONLINE    ONLINE    node1      

ora.pmydb.db   ora....se.type OFFLINE   OFFLINE              

ora....taf.svc ora....ce.type OFFLINE   OFFLINE              

ora.prod.db    ora....se.type OFFLINE   OFFLINE              

ora....ry.acfs ora....fs.type ONLINE    ONLINE    node1      

ora.scan1.vip  ora....ip.type ONLINE    ONLINE    node1  

[root@node3 ~]# crs_stat |grep lsn

NAME=ora.LISTENER.lsnr

NAME=ora.LISTENER_SCAN1.lsnr

NAME=ora.node1.LISTENER_NODE1.lsnr

NAME=ora.node2.LISTENER_NODE2.lsnr

NAME=ora.node3.LISTENER_NODE3.lsnr

The ora.node3.LISTENER_NODE3.lsnr resource failed to start.
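Spotting the OFFLINE rows by eye in a long `crs_stat -t` table is error-prone. The sketch below assumes the table has the column layout shown above and has already been captured as text. It uses a shortened copy of that table as sample data, and the function name `offline_resources` is hypothetical.

```python
SAMPLE_TABLE = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM3.asm application    ONLINE    ONLINE    node3
ora....E3.lsnr application    OFFLINE   OFFLINE
ora.node3.gsd  application    ONLINE    ONLINE    node3
ora.node3.ons  application    ONLINE    ONLINE    node3
ora.prod.db    ora....se.type OFFLINE   OFFLINE
"""


def offline_resources(table_text):
    """Return the names of resources whose State column is not ONLINE."""
    offline = []
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows have 4 fields (no host) or 5 fields (with host);
        # the header and the dashed separator are skipped.
        if len(fields) not in (4, 5) or not fields[0].startswith("ora."):
            continue
        name, _resource_type, _target, state = fields[:4]
        if state != "ONLINE":
            offline.append(name)
    return offline


print(offline_resources(SAMPLE_TABLE))
```

Because `crs_stat -t` truncates long names, the result still has to be matched against the full names from `crs_stat`, as done with `grep lsn` above.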

3. Fixing the listener resource failure

Manual start:

[root@node3 ~]# crs_start  ora.node3.LISTENER_NODE3.lsnr -f

CRS-2527: Unable to start 'ora.LISTENER.lsnr' because it has a 'hard' dependency on 'ora.cluster_vip_net1.type'

CRS-2525: All instances of the resource 'ora.node1.vip' are already running; relocate is not allowed because the force option was not specified

CRS-2525: All instances of the resource 'ora.node2.vip' are already running; relocate is not allowed because the force option was not specified

CRS-0222: Resource 'ora.node3.LISTENER_NODE3.lsnr' has dependency error.

The messages above show that the listener cannot start because the VIP it depends on (the node3 VIP resource) is not available.

[root@node3 ~]# crs_stat |grep vip

NAME=ora.node1.vip

TYPE=ora.cluster_vip_net1.type

NAME=ora.node2.vip

TYPE=ora.cluster_vip_net1.type

NAME=ora.scan1.vip

TYPE=ora.scan_vip.type

The node3 VIP resource is missing!
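The same comparison, every node listener against the registered node VIPs, can be sketched in a few lines. This is an illustration only: it assumes the resource names follow the `ora.<node>.LISTENER_<NODE>.lsnr` and `ora.<node>.vip` patterns seen in the output above, and the function name `nodes_without_vip` is hypothetical.

```python
import re

# NAME= and TYPE= lines copied from the two crs_stat listings above.
SAMPLE_CRS_STAT = """\
NAME=ora.LISTENER.lsnr
NAME=ora.LISTENER_SCAN1.lsnr
NAME=ora.node1.LISTENER_NODE1.lsnr
NAME=ora.node2.LISTENER_NODE2.lsnr
NAME=ora.node3.LISTENER_NODE3.lsnr
NAME=ora.node1.vip
TYPE=ora.cluster_vip_net1.type
NAME=ora.node2.vip
TYPE=ora.cluster_vip_net1.type
NAME=ora.scan1.vip
TYPE=ora.scan_vip.type
"""


def nodes_without_vip(crs_stat_text):
    """Return nodes that have a node listener resource but no VIP resource."""
    names = re.findall(r"^NAME=(\S+)$", crs_stat_text, flags=re.MULTILINE)
    listener_nodes = []
    vip_owners = set()
    for name in names:
        listener = re.fullmatch(r"ora\.([^.]+)\.LISTENER_[^.]+\.lsnr", name)
        if listener:
            listener_nodes.append(listener.group(1))
        vip = re.fullmatch(r"ora\.([^.]+)\.vip", name)
        if vip:
            vip_owners.add(vip.group(1))
    return [node for node in listener_nodes if node not in vip_owners]


print(nodes_without_vip(SAMPLE_CRS_STAT))
```

The SCAN listener and SCAN VIP do not match the node listener pattern, so they do not affect the result.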

Add the VIP resource:

[root@node3 ~]# srvctl add -h

The SRVCTL add command adds the configuration and the Oracle Clusterware application to the OCR for the cluster database, named instances, named services, or for the named nodes.

Usage: srvctl add vip -n <node_name> -k <network_number> -A <name|ip>/<netmask>/[if1[|if2...]] [-v]

[root@node1 ~]# cat /etc/hosts

# Do not remove the following line, or various programs

# that require network functionality will fail.

127.0.0.1      localhost

192.168.8.41   node1

192.168.8.43   node1-vip

10.1.1.1       node1-priv

192.168.8.42   node2

192.168.8.44   node2-vip

10.1.1.2       node2-priv

192.168.8.45   scan_ip

192.168.8.46   node3

192.168.8.47   node3-vip

10.1.1.3       node3-priv
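The `-A` argument of `srvctl add vip` is built from the `<node>-vip` entry in this hosts file plus a netmask and an interface. A minimal sketch of that lookup follows; the helper names `parse_hosts` and `vip_argument` are hypothetical, and the netmask and interface are assumptions that must match the actual public network.

```python
SAMPLE_HOSTS = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1      localhost
192.168.8.41   node1
192.168.8.43   node1-vip
192.168.8.46   node3
192.168.8.47   node3-vip
10.1.1.3       node3-priv
"""


def parse_hosts(hosts_text):
    """Map every host name in an /etc/hosts style text to its IP address."""
    table = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip_address, *names = line.split()
        for name in names:
            table[name] = ip_address
    return table


def vip_argument(hosts_text, node, netmask, interface):
    """Build the <ip>/<netmask>/<interface> string for the node's VIP."""
    ip_address = parse_hosts(hosts_text)[f"{node}-vip"]
    return f"{ip_address}/{netmask}/{interface}"


print(vip_argument(SAMPLE_HOSTS, "node3", "255.255.255.0", "eth0"))
```

A `KeyError` from `vip_argument` means the hosts file has no `<node>-vip` entry, which is worth fixing before running `srvctl`.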

[root@node3 ~]#  srvctl add vip -n node3 -A 192.168.8.47/255.255.255.0/eth0 -k 0

PRCN-2049 : The network attributes specified (network number: 0, subnet: 192.168.8.0, adapters: eth0) conflict with an already registered network (network number: 1, subnet: 192.168.8.0, adapters: eth0)

[root@node3 ~]#  srvctl add vip -n node3 -A 192.168.8.47/255.255.255.0/eth0 -k 1

[root@node3 ~]#

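The node3 VIP was added successfully once the already registered network number (1) was used. Network number 0 was rejected because subnet 192.168.8.0 on eth0 is already registered as network 1.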
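The PRCN-2049 message comes down to a subnet comparison: the VIP address and netmask resolve to a subnet that is already registered under a different network number. The sketch below reproduces that arithmetic with Python's standard `ipaddress` module. The function names and the `REGISTERED_NETWORKS` mapping are hypothetical, with the one known entry taken from the error text above.

```python
import ipaddress

# Network number -> subnet, as reported in the PRCN-2049 message above.
REGISTERED_NETWORKS = {1: "192.168.8.0"}


def subnet_of(address, netmask):
    """Return the subnet (network address) for an IP address and netmask."""
    interface = ipaddress.ip_interface(f"{address}/{netmask}")
    return str(interface.network.network_address)


def matching_network_number(registered, address, netmask):
    """Return the registered network number whose subnet contains the VIP."""
    target = subnet_of(address, netmask)
    for number, subnet in registered.items():
        if subnet == target:
            return number
    return None  # no registered network covers this address


print(subnet_of("192.168.8.47", "255.255.255.0"))
print(matching_network_number(REGISTERED_NETWORKS, "192.168.8.47", "255.255.255.0"))
```

For the node3 VIP this yields network number 1, which is the value that `-k` accepted in the transcript above.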

[root@node3 ~]# crs_stat |grep vip

NAME=ora.node1.vip

TYPE=ora.cluster_vip_net1.type

NAME=ora.node2.vip

TYPE=ora.cluster_vip_net1.type

NAME=ora.node3.vip

TYPE=ora.cluster_vip_net1.type

NAME=ora.scan1.vip

TYPE=ora.scan_vip.type

[root@node3 ~]#

Start the ora.node3.LISTENER_NODE3.lsnr resource:

[root@node3 ~]# crs_stat |grep lsn

NAME=ora.LISTENER.lsnr

NAME=ora.LISTENER_SCAN1.lsnr

NAME=ora.node1.LISTENER_NODE1.lsnr

NAME=ora.node2.LISTENER_NODE2.lsnr

NAME=ora.node3.LISTENER_NODE3.lsnr


[root@node3 ~]# crs_start -f ora.node3.LISTENER_NODE3.lsnr

Attempting to start `ora.node3.vip` on member `node3`

Start of `ora.node3.vip` on member `node3` succeeded.

Attempting to start `ora.LISTENER.lsnr` on member `node3`

Start of `ora.LISTENER.lsnr` on member `node3` succeeded.

[root@node3 ~]#

Grid log on node3:

[root@node3 node3]# pwd

/u01/11.2.0/grid/log/node3

[root@node3 node3]# tail -f alertnode3.log

[gpnpd(7901)]CRS-2332:Error pushing GPnP profile to "mdns:service:gpnp._tcp.local.://node1:12802/agent=gpnpd,cname=node-cluster,host=node1,pid=3431/gpnpd h:node1 c:node-cluster".

2014-04-17 14:47:42.399

[ctssd(8044)]CRS-2408:The clock on host node3 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

2014-04-17 14:48:14.444

[ctssd(8044)]CRS-2408:The clock on host node3 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

2014-04-17 14:48:18.952

[cssd(7954)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 node2 node3 .

[ctssd(8044)]CRS-2408:The clock on host node3 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

2014-04-17 14:49:34.501

[ctssd(8044)]CRS-2408:The clock on host node3 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

2014-04-17 14:49:58.515

[ctssd(8044)]CRS-2408:The clock on host node3 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

2014-04-17 14:50:22.552

[ctssd(8044)]CRS-2408:The clock on host node3 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

2014-04-17 14:50:46.595

[ctssd(8044)]CRS-2408:The clock on host node3 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
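Most of this alert log is repeated CRS-2408 clock messages; the useful line is CRS-1601, which confirms that node3 joined the cluster. A small sketch for summarizing such a log follows. It assumes the message layout shown above, uses a shortened excerpt as sample data, and the function names are hypothetical.

```python
import re
from collections import Counter

SAMPLE_LOG = """\
[gpnpd(7901)]CRS-2332:Error pushing GPnP profile to "mdns:service:gpnp._tcp.local.://node1:12802/agent=gpnpd,cname=node-cluster,host=node1,pid=3431/gpnpd h:node1 c:node-cluster".
2014-04-17 14:47:42.399
[ctssd(8044)]CRS-2408:The clock on host node3 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2014-04-17 14:48:18.952
[cssd(7954)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 node2 node3 .
[ctssd(8044)]CRS-2408:The clock on host node3 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
"""


def message_counts(log_text):
    """Count how often each CRS message code appears in the log."""
    return Counter(re.findall(r"\](CRS-\d+):", log_text))


def active_nodes(log_text):
    """Return the node list from the last CRS-1601 reconfiguration message."""
    matches = re.findall(
        r"CRS-1601:CSSD Reconfiguration complete\. Active nodes are (.*?)\s*\.\s*$",
        log_text,
        flags=re.MULTILINE,
    )
    return matches[-1].split() if matches else []


print(dict(message_counts(SAMPLE_LOG)))
print(active_nodes(SAMPLE_LOG))
```

Counting the codes makes the single CRS-2332 GPnP error stand out from the routine clock updates.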


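4. Adding the Oracle node

The CRS resources now start successfully, but adding node3 as an Oracle node from node1 fails with the following error:

[Screenshot of the error: wKioL1NPlOKx6eb6AAKzT7Swcho980.jpg]

    Restarting the node's CRS service did not help. Adding node3 as a new Oracle node from node2 finally succeeded. Why node1 could not add it remains unexplained. One possibility is that the CRS node was added from node1 and an error occurred during that step (while root.sh was running on node3), so that the node could not properly recognize the new node node3.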


node3 recorded the following log entries:

[root@node3 node3]# tail -f alertnode3.log

[gpnpd(7901)]CRS-2332:Error pushing GPnP profile to "mdns:service:gpnp._tcp.local.://node1:12802/agent=gpnpd,cname=node-cluster,host=node1,pid=3431/gpnpd h:node1 c:node-cluster".

2014-04-17 14:47:42.399











This article is reposted from the 51CTO blog of 客居天涯. Original link: http://blog.51cto.com/tiany/1397335. To repost it, please contact the original author.