[RAC] PMON: terminating the instance due to error 481

Introduction:
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.2.0 and later [Release: 11.2 and later]
Information in this document applies to any platform.
Symptoms
On an 11.2.0.2+ cluster with an instance already running on one node, starting the instance on the other node(s) fails with:
PMON (ospid: 487580): terminating the instance due to error 481
If ASM is used, the +ASMn alert log shows:
Sat Oct 01 19:19:38 2011
MMNL started with pid=21, OS id=6488362
lmon registered with NM - instance number 2 (internal mem no 1)
Sat Oct 01 19:21:37 2011
PMON (ospid: 4915562): terminating the instance due to error 481
Sat Oct 01 19:21:37 2011
System state dump requested by (instance=2, sid=4915562 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_diag_4915388.trc
Dumping diagnostic data in directory=[cdmp_20111001192138], requested by (instance=2, sid=4915562 (PMON)), summary=[abnormal instance termination].
Sat Oct 01 19:21:38 2011
License high water mark = 1
Instance terminated by PMON, pid = 4915562
+ASMn_diag_xxx.trc trace shows:
*** 2011-10-01 19:19:37.526
Reconfiguration starts [incarn=0]

*** 2011-10-01 19:19:37.526
I'm the voting node
Group reconfiguration cleanup
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
...... << repeated messages
If ASM is not used, the DB instance can fail with the same error:
Mon Jul 04 16:22:50 2011
Starting ORACLE instance (normal)
...
Mon Jul 04 16:22:54 2011
MMNL started with pid=24, OS id=667660
starting up 1 shared server(s) ...
lmon registered with NM - instance number 2 (internal mem no 1)
Mon Jul 04 16:26:15 2011
PMON (ospid: 487580): terminating the instance due to error 481


lmon trace shows:
*** 2011-07-04 16:22:59.852
=====================================================
kjxgmpoll: CGS state (0 1) start 0x4e11785e cur 0x4e117863 rcfgtm 5 sec
...
*** 2011-07-04 16:26:14.248
=====================================================
kjxgmpoll: CGS state (0 1) start 0x4e11785e cur 0x4e117926 rcfgtm 200 sec


dia0 trace shows:
*** 2011-07-04 16:22:53.414
Reconfiguration starts [incarn=0]
*** 2011-07-04 16:22:53.414
I'm the voting node
Group reconfiguration cleanup
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

...<< repeated message

Changes
This can happen during patching or after a node reboot.
Cause
The problem is caused by HAIP not being ONLINE on either the running node or the problem node(s).
Essentially, an ASM or DB instance cannot start up if it uses a different cluster_interconnect than the already-running instance.
With HAIP ONLINE, all instances (DB and ASM) should use an HAIP address in the 169.254.x.x range.
If HAIP is OFFLINE on any node, the ASM and DB instances on that node will use the native private network address instead, which causes a communication problem with the instances using HAIP.
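
To confirm which interconnect address each running instance has actually registered, you can query GV$CLUSTER_INTERCONNECTS from any open instance. A minimal sketch, assuming the environment points at the local ASM instance (connect as SYSDBA instead for a database instance):

$ sqlplus -s / as sysasm <<'EOF'
-- One row per instance; with HAIP working, every IP_ADDRESS
-- should fall in the 169.254.x.x range.
set linesize 120
column name format a12
column ip_address format a18
select inst_id, name, ip_address, source
from gv$cluster_interconnects
order by inst_id;
EOF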

Use the following command to verify HAIP status, as the grid user:
$ crsctl stat res -t -init

Check the status of the resource ora.cluster_interconnect.haip.
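
On the problem node the resource will show OFFLINE, similar to the following (illustrative output; racnode2 is a hypothetical node name):

ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE

whereas on a healthy node it shows ONLINE:

ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       racnode2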
In this example, HAIP is OFFLINE on the running node 1, so +ASM1 is using 10.1.1.1 as its cluster_interconnect. On node 2, HAIP is ONLINE, so +ASM2 is using the HAIP address 169.254.239.144. The mismatch causes a communication problem between the two instances, and +ASM2 cannot start up.
alert_+ASM1.log shows:

Cluster communication is configured to use the following interface(s) for this instance
10.1.1.1

alert_+ASM2.log shows:
Cluster communication is configured to use the following interface(s) for this instance
169.254.239.144
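
To pull these lines out of each alert log quickly, a grep sketch (the path follows the ADR layout shown in the excerpts above; adjust for your ORACLE_BASE and instance name):

$ grep -A1 "Cluster communication is configured" \
      /u01/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log | tail -2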
Solution
The solution is to bring HAIP ONLINE on all nodes before starting any ASM or DB instance, either by restarting the HAIP resource or by restarting the GI stack.
In this example, +ASM1 was started first while HAIP was OFFLINE:
1. Try to start HAIP manually on node 1
As the grid user:
$ crsctl start res ora.cluster_interconnect.haip -init
To verify:
$ crsctl stat res -t -init
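
Alternatively, query just that one resource rather than the full list (the -init flag limits crsctl to the local node's lower-stack resources):

$ crsctl stat res ora.cluster_interconnect.haip -init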
2. If this succeeds, restart the ora.asm resource (note: this will bring down all dependent diskgroup and database resources):
As the root user:
# crsctl stop res ora.crsd -init
# crsctl stop res ora.asm -init -f
# crsctl start res ora.asm -init
# crsctl start res ora.crsd -init
Start up any dependent resources as necessary, for example as sketched below.
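
A srvctl sketch, as the grid/oracle user (the diskgroup DATA, database orcl, and node racnode1 are hypothetical names; substitute your own):

$ srvctl start diskgroup -g DATA -n racnode1
$ srvctl start database -d orcl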
3. If the above does not help, try restarting the GI stack on node 1 and check whether HAIP comes ONLINE afterwards.
As root user:
# crsctl stop crs
# crsctl start crs

Check $GRID_HOME/log/<nodename>/agent/ohasd/orarootagent_root/orarootagent_root.log for any HAIP errors.
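
For example (a sketch; substitute the local node name for <nodename>):

$ grep -i haip \
      $GRID_HOME/log/<nodename>/agent/ohasd/orarootagent_root/orarootagent_root.log | tail -20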
4. Once HAIP is ONLINE on node 1, proceed to start ASM on the remaining cluster nodes and ensure HAIP is ONLINE on all nodes.
$ crsctl start res ora.asm -init
ASM and DB instances should be able to start on all nodes after the above.
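
As a final check, every instance should now report a 169.254.x.x interconnect address. A minimal re-run of the query from the Cause section (sketch):

$ sqlplus -s / as sysasm <<'EOF'
select inst_id, ip_address from gv$cluster_interconnects order by inst_id;
EOF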