sqlplus应当是DBA 1.0时代使用最为频繁的管理工具,经常有经验丰富的老DBA会提到自己敲过几万次的sqlplus:),但有的时候这个吃饭家伙也会不好用,偶尔还会出现Segmentation fault错误,亦或者彻底hang住。在这里我介绍几种应对sqlplus无法正常使用的应对方法: 1.出现Segmentation fault,这种情况下一般是sqlplus 2进制文件被损坏了,可以通过重新build一个sqlplus来解决问题
[oracle@rh2 bin]$ sqlplus
Segmentation fault

/* 使用$ORACLE_HOME/sqlplus/lib目录下的make文件,编译一个新的sqlplus */

[oracle@rh2 ~]$ make -f $ORACLE_HOME/sqlplus/lib/ins_sqlplus.mk  newsqlplus

Linking /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
rm -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
gcc -o /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus -m64
-L/s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/ -L/s01/oracle/product/11.2.0/dbhome_1/lib/
-L/s01/oracle/product/11.2.0/dbhome_1/lib/stubs/  /s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/s0afimai.o
-lsqlplus -lclntsh  `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11
-ln11 -lnl11 -lnro11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11
-ln11 -lnl11 -lnnz11 -lzt11 -lztkg11 -lztkg11 -lclient11 -lnnetd11  -lvsn11 -lcommon11 -lgeneric11 -lmm
-lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11
-lcore11 -lnls11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11
-lnl11 -lnro11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11
-lclient11 -lnnetd11  -lvsn11 -lcommon11 -lgeneric11   -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11
-lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lclient11 -lnnetd11  -lvsn11
-lcommon11 -lgeneric11 -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11
-lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11   `cat /s01/oracle/product/11.2.0/dbhome_1/lib/sysliblist`
-Wl,-rpath,/s01/oracle/product/11.2.0/dbhome_1/lib -lm -lpthread   `cat /s01/oracle/product/11.2.0/dbhome_1/lib/sysliblist`
-ldl -lm -lpthread  -L/s01/oracle/product/11.2.0/dbhome_1/lib
/bin/chmod 755 /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
rm -f /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus
mv -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus
/bin/chmod 751 /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus
rm -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/libsqlplus.so
rm -rf /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus

[oracle@rh2 ~]$ sqlplus  / as sysdba

SQL*Plus: Release Production on Wed May 11 21:38:21 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
2.出现sqlplus之后hang住的现象,hang的原因存在多种可能: 1)instance hanging数据库实例hang住,这种情况下sqlplus无法正常登陆到正hang的实例,而登陆到其他实例是可以的;若在10g以后版本中可以使用-prelim选项登陆实例,使用该选项登陆后无法执行普通的SQL语句,但可以使用oradebug内部调试工具,通过oradebug收集必要的hanganalyze信息后,可以进一步判断hang住的原因并决定下一步的操作。
[oracle@rh2 ~]$ sqlplus  / as sysdba
.............................we suspend here!!!

[oracle@rh2 ~]$ sqlplus  -prelim / as sysdba

SQL*Plus: Release Production on Wed May 11 21:46:27 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

SQL> oradebug setmypid;
Statement processed.

SQL> oradebug dump hanganalyze 4;
Statement processed.

SQL> oradebug dump systemstate 266;
Statement processed.

SQL> oradebug tracefile_name

/s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ora_23436.trc   -- where dump resides
将以上trc文件提交给Oracle Support或者资深的Oracle技术人员,以便他们分析出实例hang住的原因,通过调整参数或者修复bug可以避免再次出现类似的状况。 2)一执行sqlplus就出现挂起现象,甚至没有登陆任何数据库。一般这种情况是在读取sqlplus 2进制文件或其相关的共享库文件(.so文件)时遇到了问题,或者是在实际system call系统调用execve("sqlplus")时遇到了错误,一般我们可以使用系统跟踪工具strace(Linux)或truss(Unix)工具来分析这种挂起现象:
/* Unix */

truss -o sqlplus_hang.log sqlplus

/* Linux */

strace -o sqlplus_hang.log sqlplus

head -10 sqlplus_hang.log

execve("/s01/db_1/bin/sqlplus", ["sqlplus"], [/* 28 vars */]) = -1 ENOEXEC (Exec format error)
可以看到以上strace记录中发现了调用execve函数(execve() executes the program pointed to by filename)运行sqlplus程序时出现了ENOEXEC错误,该ENOEXEC错误代码说明我们正在执行一个格式无效的可执行文件,具体的解释如下:
This error indicates that a request has been made to execute a file which, although it has the
appropriate permissions, does not start with a valid magic number. A magic number is the first two
bytes in a file, used to determine what type of file it is.

You tried to execute a file that is not in a valid executable format. The most common format for
binary programs under linux is called ELF. Note that your shell will run ascii files that have the
executable bit set as a shell script (ie run it as shell commands).

You can reproduce this by doing
    $ dd if=/dev/random of=myfile bs=1k count=1 $ chmod +x myfile $ ./myfile zsh: exec format error: ./myfile

Note that there is a very slight possibility that you could create a valid program that does
something bad to your system!!
Note, you can have user defined ways of running programs using Linux's binfmt_misc. See
to be continued ............