The CRS log files contain no entries at all for the failure.
Manually starting the stack fails with:
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
The OS messages log shows only:
Feb 11 23:23:32 RACDB01 grid: [ID 702911 user.error]
exec /u01/app/grid/product/11.2.0/grid/perl/bin/perl -I/u01/app/grid/product/11.2.0/grid/perl/lib /u01/app/grid/product/11.2.0/grid/bin/crswrapexece.pl
The startup also appeared to hang for quite a while.
According to the documentation, the possible causes are:
1. Wrong permissions/ownership on /u01/app/grid/product/11.2.0/grid/perl/bin/perl
2. A problem with the /tmp/.oracle/npohasd socket pipe; this appears to be an 11.2.0.1 bug
3. A wrong system runlevel; it must be 3 or 5
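For the runlevel cause, it is cheap to rule the problem in or out before touching anything else. A minimal sketch of the check; the `crs_runlevel_ok` helper is a name made up here for illustration, and on a live host the current level would come from the `runlevel` command rather than the demo loop:

```shell
# Helper: decide whether a given runlevel allows the CRS stack to autostart.
crs_runlevel_ok() {
    case "$1" in
        3|5) return 0 ;;   # OHASD init entries run only at runlevels 3 and 5
        *)   return 1 ;;
    esac
}

# On a live host you would use:  current=$(runlevel | awk '{print $NF}')
for current in 1 3 5; do
    if crs_runlevel_ok "$current"; then
        echo "runlevel $current: OK for CRS autostart"
    else
        echo "runlevel $current: CRS will not autostart"
    fi
done
```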
1. The /u01/app/grid/product/11.2.0/grid/perl/bin/perl problem
Kill the hung ohasd.bin "reboot" process, then follow:
HAS Does Not Start After Server Reboot. CRS-4124 And CRS-4000 Errors (Doc ID 1624661.1)
CAUSE
The ownership of the perl executable in GRID_HOME has been changed for some reason. It should be owned by the Grid Infrastructure owner, which in this case is "grid", not "oracle":
ls -l /u01/app/grid/product/11.2.0/grid/perl/bin/perl
-rwx------. 1 oracle oinstall 1424555 Jul 21 2011 perl
SOLUTION
Change the owner of perl back to the Grid Infrastructure owner, here the grid user:
chown grid /u01/app/grid/product/11.2.0/grid/perl/bin/perl
ls -l /u01/app/grid/product/11.2.0/grid/perl/bin/perl
-rwx------. 1 grid oinstall 1424555 Jul 21 2011 perl
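The ownership check above can be scripted for every node. This is a sketch against a temporary stand-in file; in a real run `target` would point at $GRID_HOME/perl/bin/perl and `expected_owner` would be the grid user:

```shell
# Stand-in target; a real check would use $GRID_HOME/perl/bin/perl
target=$(mktemp)
expected_owner=$(id -un)                 # for Grid Infrastructure: "grid"

actual_owner=$(stat -c '%U' "$target")   # GNU stat; BSD/AIX would use: stat -f %Su
if [ "$actual_owner" = "$expected_owner" ]; then
    echo "ownership OK: $actual_owner"
else
    echo "ownership WRONG: $actual_owner (expected $expected_owner)"
    # fix as root:  chown "$expected_owner" "$target"
fi
rm -f "$target"
```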
2. The /tmp/.oracle/npohasd problem
Kill the hung ohasd.bin "reboot" process, then follow:
Cluster failed to start due to problem with socket pipe npohasd (Doc ID 1612325.1)
CRS does not start after a server reboot, and manually starting CRS/HAS fails with CRS-4124 and CRS-4000:
# /u01/app/grid/product/11.2.0/grid/bin/crsctl start has
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
There is no update in the alert_<hostname>.log or the other CRS log files. The OS system log shows:
Feb 11 23:23:32 RACDB01 grid: [ID 702911 user.error] exec /u01/app/grid/product/11.2.0/grid/perl/bin/perl -I/u01/app/grid/product/11.2.0/grid/perl/lib /u01/app/grid/product/11.2.0/grid/bin/crswrapexece.pl
/u01/app/grid/product/11.2.0/grid/crs/install/s_crsconfig_racdb01_env.txt /u01/app/grid/product/11.2.0/grid/bin/ohasd.bin "reboot"
CAUSE
Permission issue.
After relinking the binaries and restarting the server, init.ohasd came up fine, but ohasd and the other daemons would not start and no sockets were created.
When the OS starts S96ohasd, it waits for init.ohasd to write to the pipe.
What happened here is that init.ohasd started, then all the socket files were removed manually; when ohasd was started again, it waited indefinitely because those socket files were gone.
SOLUTION
WORKAROUND:
-----------
Clear all socket files under /var/tmp/.oracle or /tmp/.oracle, if any, then open two terminals on the node where the stack is not coming up.
1) On Terminal 1, issue as the root user:
crsctl start crs
2) Simultaneously, on Terminal 2 of the same node, issue the command below as the root user once the npohasd socket has been created:
/bin/dd if=/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1
3) Back on Terminal 1, the CRS stack should start coming up; monitor it with:
ps -ef | grep d.bin
4) Once the entire CRS stack is up, press CTRL+C to stop the dd command running on Terminal 2.
Check and validate all resources are online using
crsctl stat res -t
crsctl stat res -t -init
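The dd step works because npohasd is a named pipe (FIFO): init.ohasd blocks in its open-for-write until some process opens the pipe for reading, and dd supplies that one-shot reader. The behaviour can be reproduced with a throwaway FIFO; all paths below are stand-ins, not the real /tmp/.oracle socket files:

```shell
tmp=$(mktemp -d)
mkfifo "$tmp/npohasd"            # stand-in for /tmp/.oracle/npohasd

# "init.ohasd": the redirection's open-for-write blocks until a reader appears
( echo started > "$tmp/npohasd"; touch "$tmp/writer-done" ) &
writer=$!

sleep 1
if [ -e "$tmp/writer-done" ]; then before=done; else before=blocked; fi
echo "before dd: writer is $before"

# the workaround: dd acts as a one-shot reader and releases the blocked writer
dd if="$tmp/npohasd" of=/dev/null bs=1024 count=1 2>/dev/null
wait "$writer"
if [ -e "$tmp/writer-done" ]; then after=done; else after=blocked; fi
echo "after dd: writer is $after"
rm -rf "$tmp"
```

This also explains why removing the socket files by hand wedges the stack: the writer side sits in a blocking open with no reader ever arriving.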