[20170419]ora-28547.txt
--//测试环境,莫名奇妙使用toad登陆出现如下错误(使用ezconnect方式)。
ORA-28547: connection to server failed, probable Oracle Net admin error
$ oerr ora 28547
28547, 0000, "connection to server failed, probable Oracle Net admin error"
// *Cause: A failure occurred during initialization of a network connection
// from a client process to the Oracle server. The connection
// was completed but a disconnect occurred while trying to
// perform protocol-specific initialization, usually due to
// use of different network protocols by opposite sides
// of the connection. This usually is caused by incorrect
// Oracle Net administrative setup for database links or external
// procedure calls. The most frequent specific causes are:
// -- The connection uses a connect string which refers
// to a Heterogeneous Services agent instead of
// an Oracle server.
// -- The connection uses a connect string which includes
// an (HS=) specification.
// -- Hitting Ctrl+C before connection initialization is
// complete.
// -- Using DRCP on Windows with SQLNET.AUTHENTICATION_SERVICES
// set to NTS.
// -- Connection timeout due to SQLNET.INBOUND_CONNECT_TIMEOUT
// parameter when all shared servers are busy in
// shared server configuration.
// *Action: Check Oracle Net administration in the following ways:
// -- When using TNSNAMES.ORA or an Oracle Names server, make sure
// that the client connection to the Oracle server uses
// the correct service name or SID.
// -- Check LISTENER.ORA on the connection end point's host machine
// to assure that this service name or SID refers
// to the correct server.
// -- Confirm in TNSNAMES.ORA or the equivalent service definition
// that the connect string does not contain (HS=).
// -- Set SQLNET.AUTHENTICATION_SERVICES to NONE when using DRCP
// on Windows.
//
--//我改用sqlplus测试也一样。
sqlplus scott/book@192.168.100.78:1521/book
ERROR:
ORA-28547: connection to server failed, probable Oracle Net admin error
--//而改用tnsnames.ora 登陆,没有问题。重启监听问题依旧。
sqlplus scott/book@78
--//最后我选择重启数据库看看,在shutdown immediate时,我注意alert文件出现如下信息时:
Waiting for dispatcher 'D000' to shutdown
--//停留了很久,才想起来可能共享服务相关的进程可能出现问题,因为我的配置参数里面:
SYS@book> show parameter dispatchers
NAME TYPE VALUE
---------------- ------- --------------------------------------
dispatchers string (PROTOCOL=TCP) (SERVICE=book,bookXDB)
max_dispatchers integer
--//这样如果使用ezconnect连接模式,实际上使用优先使用共享服务模式。链接:http://blog.itpub.net/267265/viewspace-2124172/
$ rlsql scott/book@127.0.0.1:1521/book
SCOTT@127.0.0.1:1521/book> select saddr,sid,paddr,status,server from v$session where sid in (select sid from v$mystat);
SADDR SID PADDR STATUS SERVER
---------------- ---------- ---------------- -------- ---------
00000000854B75C0 222 000000008591A280 ACTIVE SHARED
SCOTT@127.0.0.1:1521/book> @ &r/spid
SID SERIAL# SPID PID P_SERIAL# C50
---------- ---------- ------ ------- ---------- --------------------------------------------------
222 47 37225 20 1 alter system kill session '222,47' immediate;
--//当然因为重启数据库,现在已经正常使用,如何模拟这个问题呢?我自己做了一个测试。
1.环境:
SCOTT@127.0.0.1:1521/book> @ &r/ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
$ ps -ef | grep ora_[sd]000
oracle 37223 1 0 Apr18 ? 00:00:00 ora_d000_book
oracle 37225 1 0 Apr18 ? 00:00:02 ora_s000_book
--//共享服务器使用进程ora_d000_book,然后再紧系ora_s000_book进程,注意看前面的spid=37225.通过lsnrctl services也可以看出来。
$ lsnrctl services
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 19-APR-2017 08:44:18
Copyright (c) 1991, 2013, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=1521)))
Services Summary...
Service "book" has 2 instance(s).
Instance "book", status UNKNOWN, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:0 refused:0
LOCAL SERVER
Instance "book", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:7 refused:0 state:ready
LOCAL SERVER
"D000" established:12 refused:0 current:1 max:1022 state:ready
DISPATCHER <machine: gxqyydg4, pid: 37223>
(ADDRESS=(PROTOCOL=tcp)(HOST=gxqyydg4)(PORT=47060))
Service "book123" has 1 instance(s).
Instance "book", status UNKNOWN, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:0 refused:0
LOCAL SERVER
Service "bookXDB" has 1 instance(s).
Instance "book", status READY, has 1 handler(s) for this service...
Handler(s):
"D000" established:12 refused:0 current:1 max:1022 state:ready
DISPATCHER <machine: gxqyydg4, pid: 37223>
(ADDRESS=(PROTOCOL=tcp)(HOST=gxqyydg4)(PORT=47060))
The command completed successfully
2.先停止进程ora_s000_book(进程号37225)看看。
$ kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL
5) SIGTRAP 6) SIGABRT 7) SIGBUS 8) SIGFPE
9) SIGKILL 10) SIGUSR1 11) SIGSEGV 12) SIGUSR2
13) SIGPIPE 14) SIGALRM 15) SIGTERM 16) SIGSTKFLT
17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU
25) SIGXFSZ 26) SIGVTALRM 27) SIGPROF 28) SIGWINCH
29) SIGIO 30) SIGPWR 31) SIGSYS 34) SIGRTMIN
35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3 38) SIGRTMIN+4
39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12
47) SIGRTMIN+13 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14
51) SIGRTMAX-13 52) SIGRTMAX-12 53) SIGRTMAX-11 54) SIGRTMAX-10
55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7 58) SIGRTMAX-6
59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX
--//linux SIGSTOP(19) 是给进程发stop信号。 SIGCONT(18) 是继续运行,注好像其他OS的信号数字不一样,使用kill -l 参考就明白了。
$ kill -19 37225
$ rlsql scott/book@127.0.0.1:1521/book
SQL*Plus: Release 11.2.0.4.0 Production on Wed Apr 19 08:47:52 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
ERROR:
ORA-28547: connection to server failed, probable Oracle Net admin error
--//OK,现在模拟出来问题了,至于为什么hang住,已经无法猜测。这个时候如果看监听日志:
$ tail -f listener.log
19-APR-2017 08:52:17 * (CONNECT_DATA=(SERVICE_NAME=book)(CID=(PROGRAM=sqlplus)(HOST=xxxx)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=7512)) * establish * book * 0
--//这也是我当时感觉奇怪的地方,无法定位问题。
$ kill -18 37225
--//再登陆就ok了。
--//这也是共享服务器模式的弊端,参考链接:http://blog.itpub.net/267265/viewspace-2124172/
3.继续测试,kill 进程ora_d000_book呢?
$ kill -9 37225
$ rlsql scott/book@127.0.0.1:1521/book
SQL*Plus: Release 11.2.0.4.0 Production on Wed Apr 19 08:56:17 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SCOTT@127.0.0.1:1521/book> select saddr,sid,paddr,status,server from v$session where sid in (select sid from v$mystat);
SADDR SID PADDR STATUS SERVER
---------------- ---------- ---------------- -------- ---------
00000000854BA6A0 221 000000008591A280 ACTIVE SHARED
--//要等上1小会,大约40秒后进入。
SCOTT@127.0.0.1:1521/book> @ &r/spid
SID SERIAL# SPID PID P_SERIAL# C50
---------- ---------- ------ ------- ---------- --------------------------------------------------
221 17 41328 20 2 alter system kill session '221,17' immediate;
$ ps -ef | grep ora_[sd]000
oracle 37223 1 0 Apr18 ? 00:00:00 ora_d000_book
oracle 41328 1 0 08:56 ? 00:00:00 ora_s000_book
--//可以发现生成新的ora_s000_book进程,进程号=41328.你还可以发现一个细节PADDR与前面一样的(kill -9 前)
4.如果先停止进程ora_d000_book(进程号37223)看看。
$ rlsql scott/book@127.0.0.1:1521/book
SQL*Plus: Release 11.2.0.4.0 Production on Wed Apr 19 09:02:39 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
--等2,3分钟依旧没有反映。
$ kill -18 37223
--//马上出现提示,而且监听log才出现
19-APR-2017 09:05:59 * (CONNECT_DATA=(SERVICE_NAME=book)(CID=(PROGRAM=sqlplus)(HOST=gxqyydg4)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=7678)) * establish * book * 0
19-APR-2017 09:06:01 * service_update * book * 0
--//这也说明sNNN相关进程hang住了。
SCOTT@127.0.0.1:1521/book> select saddr,sid,paddr,status,server from v$session where sid in (select sid from v$mystat);
SADDR SID PADDR STATUS SERVER
---------------- ---------- ---------------- -------- ---------
00000000854BA6A0 221 000000008591A280 ACTIVE SHARED
5.继续测试kill -9 ora_d000_book的情况:
$ ps -ef | grep ora_[sd]000
oracle 37223 1 0 Apr18 ? 00:00:00 ora_d000_book
oracle 41328 1 0 08:56 ? 00:00:00 ora_s000_book
$ rlsql scott/book@127.0.0.1:1521/book
SQL*Plus: Release 11.2.0.4.0 Production on Wed Apr 19 09:09:18 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
ERROR:
ORA-12541: TNS:no listener
--等上一小会,就ok了。
$ ps -ef | grep ora_[sd]000
oracle 41328 1 0 08:56 ? 00:00:00 ora_s000_book
oracle 41424 1 0 09:09 ? 00:00:00 ora_d000_book
--//另外alert里面也有相关记录:
Wed Apr 19 08:56:43 2017
found dead shared server 'S000', pid = (20, 1)
Wed Apr 19 09:03:18 2017
Incremental checkpoint up to RBA [0x2c2.10a1f.0], current log tail at RBA [0x2c2.11415.0]
Wed Apr 19 09:09:19 2017
found dead dispatcher 'D000', pid = (19, 1)
--//oracle后台也在监测,如果发现这些进程被kill,自动建立新的进程,当然需要等1小会....
--//从以上测试看,最怕进程阻塞,导致后续的语句以及登陆出现问题。
--//从另外一个方面说明,如果使用共享模式登陆,要密切注意相关进程的状态。max_dispatchers参数以及sql语句优化。