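What follows is a simulated reproduction, because logs from the customer's core database cannot be taken off site.
The troubleshooting process and the reasoning, however, are almost identical.
I. Fault Description
Environment: an 11g RAC primary with a single-instance ADG (Active Data Guard) standby. On the standby side, the HAS (Oracle High Availability Services) stack would not start.
II. Symptoms
I checked the various cluster logs, and none of them contained any output.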
[root@roidb2 bin]# pwd
/u01/app/11.2.0/grid/bin
[root@roidb2 bin]# ./crsctl start has
[root@roidb2 bin]# --no output: neither an error nor a success message
[root@roidb2 bin]#
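What now? This was the first time I had run into this problem. I asked the customer, who said the equipment had been relocated on Friday. Had a disk been damaged, or had permissions been changed? I checked along those lines and found nothing. I stepped back, reorganized my thinking, and decided to trace the command with strace in the hope that it would turn up something unexpected.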
[root@roidb2 bin]# strace ./crsctl start has
execve("./crsctl", ["./crsctl", "start", "has"], [/* 28 vars */]) = -1 ENOEXEC (Exec format error) --exec format error
dup(2) = 3
fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff367e29000
lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
write(3, "strace: exec: Exec format error\n", 32strace: exec: Exec format error
) = 32
close(3) = 0
munmap(0x7ff367e29000, 4096) = 0
exit_group(1) = ?
[root@roidb2 bin]#
Why this error? Is something wrong with the file itself? Let's keep digging.
[root@roidb2 bin]# ls -l crsctl
-rwxr-xr-x 1 root root 0 Dec 11 20:54 crsctl
[root@roidb2 bin]# file crsctl
crsctl: empty --the file is empty!
[root@roidb2 bin]#
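This also explains why the earlier run was completely silent. When execve fails with ENOEXEC, a POSIX shell falls back to interpreting the file as a shell script, and an empty script runs nothing and exits with status 0. strace calls execve itself and has no such fallback, so it reports the error. A minimal sketch that reproduces the symptom with a throwaway file (the temporary directory and the name fake_crsctl are made up for illustration, and the directory is assumed not to be mounted noexec):

```shell
#!/bin/sh
# Create an empty file with the same mode as the broken crsctl (rwxr-xr-x).
workdir=$(mktemp -d)
: > "$workdir/fake_crsctl"
chmod 755 "$workdir/fake_crsctl"

# Run it from the shell, capturing anything it prints and its exit status.
output=$("$workdir/fake_crsctl" start has 2>&1)
status=$?

# The shell's ENOEXEC fallback treats the empty file as an empty script.
echo "output=[$output] status=$status"
rm -rf "$workdir"
```

The silence, in other words, does not mean that anything was started, so checking the file with ls -l and file is a cheap first step whenever a command returns nothing at all.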
What now? We know that crsctl is a shell script, so how about copying it over from another node?
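If one file was truncated to zero bytes, others may have been as well, so it seems worth scanning the directory before fixing a single file. A minimal sketch, assuming GNU find as shipped on Linux (the helper name list_empty_files is hypothetical, and the path in the example is the Grid home from this environment):

```shell
#!/bin/sh
# list_empty_files DIR: print every zero-byte regular file directly under DIR.
list_empty_files() {
    find "$1" -maxdepth 1 -type f -size 0 | sort
}

# Example, run as root on the affected node:
# list_empty_files /u01/app/11.2.0/grid/bin
```

An empty file is not necessarily damaged, so any hit is best compared against the same path on a healthy node before it is replaced.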
III. Resolution
--copy the file over from the other node
<roidb1:+ASM1:/home/grid>$scp /u01/app/11.2.0/grid/bin/crsctl root@192.168.1.212:/u01/app/11.2.0/grid/bin/
root@192.168.1.212's password:
crsctl 100% 8574 8.4KB/s 00:00
<roidb1:+ASM1:/home/grid>$
[root@roidb2 bin]# file crsctl
crsctl: POSIX shell script text executable
[root@roidb2 bin]# ./crsctl start has
CRS-4123: Oracle High Availability Services has been started.
[root@roidb2 bin]#
--done
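One detail is worth checking after a copy like this: the wrapper hardcodes the node name. The listing that follows, taken on roidb2, still shows MY_HOST=roidb1 from the source node. In this script MY_HOST is only used to build ENV_FILE in the ohasd branch, so it appears harmless for crsctl itself, but it is better to know about it than to be surprised later. A minimal sketch of such a check (the helper names wrapper_host and check_wrapper_host are hypothetical):

```shell
#!/bin/sh
# wrapper_host FILE: print the value assigned to MY_HOST in a wrapper script.
wrapper_host() {
    sed -n 's/^[[:space:]]*MY_HOST=//p' "$1" | head -n 1
}

# check_wrapper_host FILE EXPECTED: warn when MY_HOST differs from EXPECTED.
check_wrapper_host() {
    found=$(wrapper_host "$1")
    if [ "$found" = "$2" ]; then
        echo "OK: MY_HOST=$found"
    else
        echo "WARNING: MY_HOST=$found, expected $2"
        return 1
    fi
}

# Example on the standby node (hostname -s prints the short host name):
# check_wrapper_host /u01/app/11.2.0/grid/bin/crsctl "$(hostname -s)"
```

Running cksum on the file on both nodes is a simple way to confirm that the transfer itself was clean.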
--a look at how Oracle writes its own scripts
[root@roidb2 bin]# cat crsctl
#!/bin/sh
#
# Copyright (c) 2001, 2013, Oracle and/or its affiliates. All rights reserved.
# Notes:
# - This script should only use clsecho.bin directly and not clsecho(which is
#   this same script).
# - FIXME: crswrap should process hostname locally as well just like init.ohasd.
### Main ###
ORA_CRS_HOME=/u01/app/11.2.0/grid
MY_HOST=roidb1
ORACLE_USER=grid
ORACLE_HOME=/u01/app/11.2.0/grid
CRF_HOME=/u01/app/11.2.0/grid
export ORA_CRS_HOME ORACLE_HOME CRF_HOME
#limits
CRS_LIMIT_CORE=unlimited
CRS_LIMIT_MEMLOCK=unlimited
CRS_LIMIT_OPENFILE=65536
CRS_LIMIT_STACK=2048
#export the limit variables
export CRS_LIMIT_CORE CRS_LIMIT_MEMLOCK CRS_LIMIT_OPENFILE CRS_LIMIT_STACK
#listener
CRS_LSNR_STACK=10240
export CRS_LSNR_STACK
# Unset env var ORACLE_BASE before spawning any processes.
unset ORACLE_BASE
[ -z "$PERL" ] && PERL="/u01/app/11.2.0/grid/perl/bin/perl -I${ORA_CRS_HOME}/perl/lib"
LOGMSG="/bin/logger -puser.err"
CLSECHO="/u01/app/11.2.0/grid/bin/clsecho.bin"
PLATFORM=`/bin/uname`
case $PLATFORM in
    Linux)
        ORACLUSTER_LIB=/etc/ORCLcluster/lib
        LD_LIBRARY_PATH=/u01/app/11.2.0/grid/lib:$ORACLUSTER_LIB
        export LD_LIBRARY_PATH
        # forcibly eliminate LD_ASSUME_KERNEL to ensure NPTL where available
        LD_ASSUME_KERNEL=
        export LD_ASSUME_KERNEL
        LOGGER="/usr/bin/logger"
        if [ ! -f "$LOGGER" ];then
            LOGGER="/bin/logger"
        fi
        LOGMSG="$LOGGER -puser.err"
        ;;
    HP-UX) MACH_HARDWARE=`/bin/uname -m`
        if [ "$MACH_HARDWARE" = "ia64" ]; then
            SO_EXT=so
            NMAPIDIR_64=/opt/nmapi/nmapi2/lib/hpux64
            NMAPIDIR_32=/opt/nmapi/nmapi2/lib/hpux32
        else
            SO_EXT=sl
            NMAPIDIR_64=/opt/nmapi/nmapi2/lib/pa20_64
            NMAPIDIR_32=/opt/nmapi/nmapi2/lib
        fi
        case $0 in
            */lsnodes|lsnodes)
                if [ ! -f $NMAPIDIR_64/libnmapi2.so -a ! -f $NMAPIDIR_32/libnmapi2.so ]; then
                    /bin/echo "No vendor clusterware installed."
                    exit 1
                fi
                ;;
        esac
        LD_LIBRARY_PATH=/u01/app/11.2.0/grid/lib:$NMAPIDIR_64:/usr/lib:$LD_LIBRARY_PATH
        SHLIB_PATH=/u01/app/11.2.0/grid/lib32:$NMAPIDIR_32:$SHLIB_PATH
        export LD_LIBRARY_PATH
        export SHLIB_PATH
        ;;
    SunOS) ARCH_NAME=`/bin/uname -p`
        if [ "${ARCH_NAME}" = "sparc" ]; then
            LD_LIBRARY_PATH_64=/u01/app/11.2.0/grid/lib:/opt/ORCLcluster/lib:/usr/lib/sparcv9:/usr/ucblib/sparcv9:$LD_LIBRARY_PATH_64
        else
            LD_LIBRARY_PATH_64=/u01/app/11.2.0/grid/lib:/opt/ORCLcluster/lib:/usr/lib/amd64:/usr/ucblib/amd64:$LD_LIBRARY_PATH_64
        fi
        LD_LIBRARY_PATH=/u01/app/11.2.0/grid/lib:/opt/ORCLcluster/lib:/usr/lib:/usr/ucblib:$LD_LIBRARY_PATH
        export LD_LIBRARY_PATH_64
        export LD_LIBRARY_PATH
        GREP='/usr/bin/grep'
        /usr/bin/coreadm | $GREP 'process core dumps' | $GREP 'enabled' > /dev/null
        STATUS1=$?
        /usr/bin/coreadm | $GREP 'global core dumps' | $GREP 'enabled' > /dev/null
        STATUS2=$?
        if [ "$STATUS1" != "0" ] && [ "$STATUS2" != "0" ];
        then
            /usr/bin/coreadm -e global > /dev/null 2>&1
        fi
        /usr/bin/coreadm | $GREP 'process setid' | $GREP 'enabled' > /dev/null
        STATUS1=$?
        /usr/bin/coreadm | $GREP 'global setid' | $GREP 'enabled' > /dev/null
        STATUS2=$?
        if [ "$STATUS1" != "0" ] && [ "$STATUS2" != "0" ];
        then
            /usr/bin/coreadm -e global-setid > /dev/null 2>&1
        fi
        # Solaris allows partitioning of resources by Projects.
        # On Solaris, start crsd/ohasd using the default Project of
        # the owner of the Grid Home. See bugs 9442360 / 5629487.
        PROJECT=`/usr/bin/projects -d $ORACLE_USER`
        # If no project is set use the default root project
        if [ "$PROJECT" = "" ]; then
            PROJECT="user.root"
        fi
        ;;
    AIX) ORACLUSTER_LIB=/opt/ORCLcluster/lib
        LIBPATH=/u01/app/11.2.0/grid/lib:$ORACLUSTER_LIB:/usr/lib
        LD_LIBRARY_PATH=$LIBPATH:$LD_LIBRARY_PATH
        AIXTHREAD_SCOPE=S
        export LIBPATH
        export LD_LIBRARY_PATH
        export AIXTHREAD_SCOPE
        ;;
    *) /bin/echo "ERROR: Unknown Operating System"
        exit -1
        ;;
esac
# enable GIPCHA consistently along with root scripts
case $PLATFORM in
    Linux)
        GIPCD_PASSTHROUGH=false
        export GIPCD_PASSTHROUGH
        ;;
    HP-UX)
        GIPCD_PASSTHROUGH=false
        export GIPCD_PASSTHROUGH
        ;;
    SunOS)
        GIPCD_PASSTHROUGH=false
        export GIPCD_PASSTHROUGH
        ;;
    AIX)
        GIPCD_PASSTHROUGH=false
        export GIPCD_PASSTHROUGH
        ;;
    OSF1)
        ;;
esac
case $0 in
    *.bin)
        ORASYM=/u01/app/11.2.0/grid/bin/`basename $0 .bin`
        ;;
    *)
        ORASYM=$0.bin
        ;;
esac
export ORASYM
case $ORASYM in
    *ocrpatch*)
        if [ ! -x $ORASYM ]
        then
            /bin/echo "NOTE:"
            /bin/echo "The ocrpatch binary is not part of the software distribution;"
            /bin/echo "ocrpatch can only be obtained and used by Oracle Support."
            exit -1
        fi
        ;;
    *ocssd*)
        if [ "$PLATFORM" = "AIX" ]
        then
            UID=`id -u`
            if [ $UID -eq 0 ]; # do not want to do su in SIHA
            then
                SU='/bin/su'
                $SU $ORACLE_USER -c "/bin/sh -c 'ulimit -c unlimited; $ORASYM $@'"
                exit 0
            fi
        fi
        ;;
    *ohasd*)
        CRSWRAPEXECE="/u01/app/11.2.0/grid/bin/crswrapexece.pl"
        ENV_FILE="${ORA_CRS_HOME}/crs/install/s_crsconfig_${MY_HOST}_env.txt"
        export ENV_FILE
        if [ ! -f "$CRSWRAPEXECE" ]
        then
            $LOGMSG "$CRSWRAPEXECE script is not found"
            exit 1;
        fi
        # we attempt to set limits here and check if return code is 0
        # if not we generate an alert using clsecho
        # see init.ohasd.sbs for a full rationale
        #STACK_SIZE limit. The goal is to reduce thread usage across the grid
        #infrastructure bottom up from the ohasd wrapper (Bug 9154152).
        #Only the soft limit is set so that any process even unpriviledged can
        #reincrease it up to the administrator set hard limit
        ulimit -Ss 2048
        if [ "$?" != "0" ]
        then
            $CLSECHO -p has -f crs -l -m 6021 "Ss" "2048"
        fi
        case $PLATFORM in
            Linux)
                # MEMLOCK limit is for Bug 9136459
                ulimit -l unlimited
                if [ "$?" != "0" ]
                then
                    $CLSECHO -p has -f crs -l -m 6021 "l" "unlimited"
                fi
                ulimit -c unlimited
                if [ "$?" != "0" ]
                then
                    $CLSECHO -p has -f crs -l -m 6021 "c" "unlimited"
                fi
                ulimit -n 65536
                if [ "$?" != "0" ]
                then
                    $CLSECHO -p has -f crs -l -m 6021 "n" "65536"
                fi
                ;;
            *)
                ulimit -c unlimited
                if [ "$?" != "0" ]
                then
                    $CLSECHO -p has -f crs -l -m 6021 "c" "unlimited"
                fi
                ulimit -n 65536
                if [ "$?" != "0" ]
                then
                    $CLSECHO -p has -f crs -l -m 6021 "n" "65536"
                fi
                ;;
        esac
        $LOGMSG "exec $PERL /u01/app/11.2.0/grid/bin/crswrapexece.pl $ENV_FILE $ORASYM \"$@\""
        exec $PERL /u01/app/11.2.0/grid/bin/crswrapexece.pl $ENV_FILE $ORASYM "$@"
        # Reached here only if exec fails
        /bin/echo "Failed to execute \"exec $PERL /u01/app/11.2.0/grid/bin/crswrapexece.pl $ENV_FILE $ORASYM \"$@\""
        $LOGMSG "Failed to execute \"exec $PERL /u01/app/11.2.0/grid/bin/crswrapexece.pl $ENV_FILE $ORASYM \"$@\""
        exit 1;
        ;;
    *)
        if [ "$PLATFORM" = "AIX" ]
        then
            # Prevents the setting of RT_GRQ for non-ocssd and non-cssagent processes
            # RT_GRQ is turned on globally for all processes in the environment file
            # generated by s_crsconfig_lib.pm during install setup, for AIX platform.
            # This should prevent rdbms RT processes from inheriting this attribute
            # since crsd will not have RT_GRQ set.
            #
            # NOTE: cssdagent and monitor does not need a special case since they
            # do not use this wrapper script. So the '*)' case here does not
            # apply and they *will* inherit RT_GRQ attribute, as intended
            RT_GRQ=
            export RT_GRQ
        fi
        ;;
esac
# Solaris allows partitioning of resources by Projects.
# On Solaris, start crsd/ohasd using the default Project of
# the owner of the Grid Home. See bugs 9442360 / 5629487.
case $PLATFORM in
    SunOS)
        case $ORASYM in
            *ohasd*|*crsd*)
                exec /usr/bin/newtask -p $PROJECT $ORASYM "$@"
                ;;
            *)
                exec $ORASYM "$@"
                ;;
        esac
        ;;
    *)
        exec $ORASYM "$@"
        ;;
esac
[root@roidb2 bin]#
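The step in this wrapper that matters most for the fault above is the final dispatch: the script derives the name of the real binary from its own name ($0), stores it in ORASYM, and execs it. That is why file reports crsctl as a shell script, and why a healthy wrapper is only a few kilobytes: invoked as ./crsctl, it hands over to ./crsctl.bin. A stripped-down sketch of just the name-resolution step, with the Grid home passed as a parameter instead of being hardcoded (the function name resolve_orasym is hypothetical):

```shell
#!/bin/sh
# resolve_orasym NAME GRID_HOME: mimic how the wrapper maps its own name ($0)
# to the path it stores in ORASYM and later execs.
resolve_orasym() {
    case $1 in
        *.bin) echo "$2/bin/$(basename "$1" .bin)" ;;
        *)     echo "$1.bin" ;;
    esac
}

resolve_orasym ./crsctl /u01/app/11.2.0/grid       # prints ./crsctl.bin
resolve_orasym ./crsctl.bin /u01/app/11.2.0/grid   # prints /u01/app/11.2.0/grid/bin/crsctl
```

Because every wrapper in the directory follows this pattern, a truncated wrapper can be restored from another node as long as the matching .bin file is still intact.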
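Summary:
1. Always start and stop databases and hosts by following the proper procedure; never simply cut the power.
2. Before a relocation, take backups, and handle the hardware gently while moving and installing it.
This article is reposted from roidba's 51CTO blog. Original link: http://blog.51cto.com/roidba/2049554. To republish it, please contact the original author.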