Overview
WARNING: inbound connection timed out (ORA-3136)
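I had never run into this error before. This morning a customer told me over MSN that one of his Oracle servers was very busy and that this error kept appearing in the alert log, causing connections to fail. Here is the excerpt from the alert file: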
......
Wed Feb 27 09:03:02 2008
Completed checkpoint up to RBA [0x184d.2.10], SCN: 1203810646
Wed Feb 27 09:03:27 2008
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 27 09:03:27 2008
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 27 09:03:27 2008
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 27 09:03:27 2008
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 27 09:03:27 2008
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 27 09:03:27 2008
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 27 09:03:27 2008
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 27 09:03:27 2008
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 27 09:04:30 2008
Incremental checkpoint up to RBA [0x184d.e5a6.0], current log tail at RBA [0x184d.43aaa.0]
Wed Feb 27 09:05:02 2008
......
I had not seen this WARNING before, but judging from the text it is a connect timeout. The error I used to see in that situation was ORA-12170, so this one puzzled me.
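To get a feel for how bursty these warnings are, the alert log can be tallied per timestamp. The following is only a rough sketch: it assumes the alert log puts each timestamp on its own line followed by the message on the next line, and the function name count_ora3136 and the shortened sample are made up for illustration.

```python
import re
from collections import Counter

# Alert log timestamps look like "Wed Feb 27 09:03:27 2008".
TIMESTAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$"
)
WARNING = "WARNING: inbound connection timed out (ORA-3136)"


def count_ora3136(lines):
    """Count ORA-3136 warnings under each timestamp line of an alert log."""
    counts = Counter()
    current = None  # most recent timestamp line seen so far
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            current = line
        elif line == WARNING and current is not None:
            counts[current] += 1
    return counts


# Shortened sample in the assumed one-timestamp-per-line layout.
sample = """\
Wed Feb 27 09:03:02 2008
Completed checkpoint up to RBA [0x184d.2.10], SCN: 1203810646
Wed Feb 27 09:03:27 2008
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 27 09:03:27 2008
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 27 09:04:30 2008
Incremental checkpoint up to RBA [0x184d.e5a6.0], current log tail at RBA [0x184d.43aaa.0]
""".splitlines()

for stamp, n in count_ora3136(sample).items():
    print(stamp, n)
```

For a real file, passing the open file object (for example open("alert_HS5.log"), a hypothetical name) instead of the sample list should work the same way.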
Initial analysis
1) Checked the parameter settings in listener.ora and sqlnet.ora; found nothing abnormal.
$ cat listener.ora
################
# Filename......: listener.ora
# Name..........:
# Date..........:
################
ADMIN_RESTRICTIONS_LISTENER = on
LISTENER =
  (ADDRESS_LIST =
    (ADDRESS =
      (PROTOCOL = IPC)
      (KEY = HS5.WORLD)
    )
    (ADDRESS=
      (PROTOCOL = IPC)
      (KEY = HS5)
    )
    (ADDRESS =
      (COMMUNITY = SAP.WORLD)
      (PROTOCOL = TCP)
      (HOST = GVSHS5DB)
      (PORT = 1527)
    )
  )
STARTUP_WAIT_TIME_LISTENER = 0
CONNECT_TIMEOUT_LISTENER = 10
TRACE_LEVEL_LISTENER = OFF
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = HS5)
      (ORACLE_HOME = /oracle/HS5/102_64)
    )
  )
$ cat sqlnet.ora
################
# Filename......: sqlnet.ora
################
AUTOMATIC_IPC = ON
TRACE_LEVEL_CLIENT = OFF
NAMES.DEFAULT_DOMAIN = WORLD
# 05.01.06 unsorported parameter now
#NAME.DEFAULT_ZONE = WORLD
# 05.01.06 set the default to 10
SQLNET.EXPIRE_TIME = 10
# 05.01.06 set to default
#TCP.NODELAY=YES
# 05.01.06 set to 32768
DEFAULT_SDU_SIZE=32768
$
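2) topas and vmstat showed that the system load was very high, with CPU at essentially 100% (output omitted).
So the initial conclusion was that the connection timeouts were caused by excessive system load.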
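Further analysis
Since I had not seen this WARNING before, I went straight to the Oracle documentation. It turns out this is a new setting introduced in 10gR2, controlled by SQLNET.INBOUND_CONNECT_TIMEOUT, with a default of 60 seconds.
The main possible causes of this WARNING are: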
1) The server gets a connection request from a malicious client that is not supposed to connect to the database, in which case the error thrown is the correct behavior. You can get the client address for which the error was thrown from the sqlnet log file.
2) The server receives a valid client connection request, but the client takes longer than the default 60 seconds to authenticate.
3) The DB server is so heavily loaded that it cannot finish the client logon within the specified timeout.
So how do you pin down what is causing this WARNING?
The default value of 60 seconds is good enough in most conditions for the database server to authenticate a client connection. If it is taking longer, it is worth checking all the points below before going for the workaround:
1. Check whether a local connection on the database server is successful and quick.
2. If local connections are quick, then check for underlying network delay with the help of your network administrator.
3. Check whether your database performance has degraded in any way.
4. Check the alert log for any critical errors, e.g. ORA-600 or ORA-7445, and get them resolved first.
These critical errors might have triggered the slowness of the database server.
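For points 1 and 2, one crude way to compare a local connection with a remote one is to time the TCP connect to the listener port from both machines. This is a minimal sketch, not an Oracle tool: the helper name time_tcp_connect is made up here, and it only measures the TCP handshake, not the Oracle logon itself, so it can hint at network delay but says nothing about authentication time.

```python
import socket
import time


def time_tcp_connect(host, port, timeout=10.0):
    """Return the seconds taken to open a TCP connection to host:port.

    Raises OSError (for example a timeout or connection refused)
    if the connection cannot be established.
    """
    start = time.monotonic()
    # create_connection resolves the host name and performs the TCP handshake.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.monotonic() - start
```

Running something like time_tcp_connect("GVSHS5DB", 1527) once on the database server and once on a client machine gives two numbers to compare; the host and port here are the ones from the listener.ora shown earlier.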
As a workaround that only avoids these warning messages, you can set the parameters SQLNET.INBOUND_CONNECT_TIMEOUT and INBOUND_CONNECT_TIMEOUT_ (with the listener name appended) to a value greater than 60, for example 120, so that the client has more time to provide its authentication information to the database. You may have to tune these parameter values further according to your setup.
To set these parameters:
1. In the server-side sqlnet.ora file, add SQLNET.INBOUND_CONNECT_TIMEOUT.
For example:
SQLNET.INBOUND_CONNECT_TIMEOUT = 120
2. In the listener.ora file, add INBOUND_CONNECT_TIMEOUT_ with the listener name appended.
For example, if the listener name is LISTENER:
INBOUND_CONNECT_TIMEOUT_LISTENER = 110
Note: From Oracle version 10.2.0.3 onwards, the default value of INBOUND_CONNECT_TIMEOUT_ is 60 seconds. For previous releases it is zero by default.
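To see which values are in effect from the two files, a small parser can pull out the simple NAME = value lines and fall back to the 60-second default quoted above. This is a sketch under assumptions: read_params and inbound_timeouts are made-up helper names, only one-line parameters are handled (the multi-line parenthesised entries are skipped), and the listener default of 60 applies only to 10.2.0.3 and later as the note says.

```python
import re

# Default quoted in the text (for the listener: 10.2.0.3 onwards).
DEFAULT_TIMEOUT = 60


def read_params(text):
    """Parse simple one-line NAME = value entries, ignoring # comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        m = re.match(r"^([A-Za-z_.][\w.]*)\s*=\s*(\S+)$", line)
        if m:
            params[m.group(1).upper()] = m.group(2)
    return params


def inbound_timeouts(sqlnet_text, listener_text, listener_name="LISTENER"):
    """Return (server_timeout, listener_timeout) in seconds."""
    sqlnet = read_params(sqlnet_text)
    listener = read_params(listener_text)
    server = int(sqlnet.get("SQLNET.INBOUND_CONNECT_TIMEOUT", DEFAULT_TIMEOUT))
    key = "INBOUND_CONNECT_TIMEOUT_" + listener_name.upper()
    lsnr = int(listener.get(key, DEFAULT_TIMEOUT))
    return server, lsnr


sqlnet_text = "SQLNET.EXPIRE_TIME = 10\nSQLNET.INBOUND_CONNECT_TIMEOUT = 120\n"
listener_text = "CONNECT_TIMEOUT_LISTENER = 10\nINBOUND_CONNECT_TIMEOUT_LISTENER = 110\n"
print(inbound_timeouts(sqlnet_text, listener_text))  # prints (120, 110)
```

With the customer's files shown earlier, neither parameter is set, so the function would report the defaults.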
How to check whether the inbound timeout is active for the listener and the database server
For example, with INBOUND_CONNECT_TIMEOUT_ = 4:
You can check whether the parameter is active simply by running telnet against the listener port.
$ telnet
For example:
$ telnet 192.168.12.13 1521
The telnet session should disconnect after 4 seconds which indicates that the inbound connection timeout for the listener is active.
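The telnet check can also be scripted, which makes the elapsed time easier to read off. This is a rough sketch using plain sockets: the function name seconds_until_disconnect is made up, and it assumes the server simply closes an idle, unauthenticated connection when the timeout fires, which is what the telnet behaviour described above suggests.

```python
import socket
import time


def seconds_until_disconnect(host, port, max_wait=90.0):
    """Connect, send nothing, and time how long the server keeps the socket open.

    Returns the elapsed seconds once the server closes the connection,
    or None if it is still open after max_wait seconds.
    """
    with socket.create_connection((host, port), timeout=max_wait) as sock:
        start = time.monotonic()
        deadline = start + max_wait
        try:
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None
                sock.settimeout(remaining)
                if not sock.recv(4096):  # empty read: peer closed the connection
                    return time.monotonic() - start
        except socket.timeout:
            return None  # still connected when max_wait ran out
        except ConnectionResetError:
            return time.monotonic() - start  # peer reset instead of closing cleanly
```

Calling seconds_until_disconnect("192.168.12.13", 1521) against the listener from the example should return a value close to the configured timeout if the parameter is active, and None if the connection is never dropped within max_wait.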
To check whether the database server's sqlnet.inbound_connect_timeout is active:
For example, with sqlnet.inbound_connect_timeout = 5
a. For a dedicated server setup, enabling support-level sqlnet server tracing will show the timeout value, as below:
niotns: Enabling CTO, value=5000 (milliseconds) <== 5 seconds
niotns: Not enabling dead connection detection.
niotns: listener bequeathed shadow coming to life...
b. For a shared server setup:
$ telnet
For example:
$ telnet 192.168.12.13 51658
The telnet session should disconnect after 5 seconds, which indicates that sqlnet.inbound_connect_timeout is active.
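For the dedicated server case, the timeout can be pulled out of the trace text instead of being read by eye. This is a small sketch that assumes the trace contains the "Enabling CTO" line exactly as quoted above; the function name cto_seconds is made up, and real trace lines may carry extra prefixes, which is why the pattern is searched for rather than matched from the start of the line.

```python
import re

# Pattern taken from the trace line quoted in the text.
CTO_LINE = re.compile(r"niotns: Enabling CTO, value=(\d+) \(milliseconds\)")


def cto_seconds(trace_text):
    """Return the connect timeout in seconds found in a server trace, or None."""
    m = CTO_LINE.search(trace_text)
    return int(m.group(1)) / 1000 if m else None


trace = """\
niotns: Enabling CTO, value=5000 (milliseconds)
niotns: Not enabling dead connection detection.
niotns: listener bequeathed shadow coming to life...
"""
print(cto_seconds(trace))  # prints 5.0
```

A result of None would mean the line was not found in the trace text that was passed in.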