应用程序以及客户端工具(Toad、PL/SQL Developer等)出现突然连接不上数据库服务器的情况,监听日志listener.log里面出现了TSN-12518与TSN-12540错误,如下所示,
27-JAN-2015 10:10:19 * (CONNECT_DATA=(SERVICE_NAME=scm2)(CID=(PROGRAM=c:\windows\system32\inetsrv\w3wp.exe)(HOST=POS)(USER=SYSTEM)))
* (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.xxx.xxx)(PORT=1667)) * establish * scm2 * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12540: TNS:internal limit restriction exceeded
27-JAN-2015 10:10:19 * (CONNECT_DATA=(SERVICE_NAME=scm2)(CID=(PROGRAM=c:\windows\system32\inetsrv\w3wp.exe)(HOST=CEGWEB1)(USER=NETWO
RK?SERVICE))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.xxx.xxx)(PORT=1284)) * establish * scm2 * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12540: TNS:internal limit restriction exceeded
经过这几天的努力,终于解决了这个问题,所以修改、补充了这篇先前的文章,希望能对其他人有所借鉴。下面整理了一下解决问题的步骤。如下所示:
1:首先查看监听服务的进程是否正常,如下所示,监听服务完全正常
ps -ef | grep tnslsnr
oraescm 6669 5979 0 14:03 pts/5 00:00:00 grep tnslsnr
oraescm 29877 1 0 08:12 ? 00:00:11 /u01/app/oracle/product/10.2.0/db_1/bin/tnslsnr LISTENER -inherit
2:检查查看监听服务的状态(lsnrctl status)
3: 检查lsnrctl services后, 发现一个奇怪的情况Dispatcher没有拒绝监听的记录,如下所示
4:检查Dispatcher进程的是否繁忙,结果查询时一直查询不出结果,立即取消了当前查询
像平时查询时(如下所示)的速度是非常快的。如果”busy rate%“比例超过50%一般建议加SHARED SERVER OR DISPATCHER
SQL> select name,(busy/(busy+idle))*100 "busy rate%" from v$dispatcher;
NAME busy rate%
---- ----------
D000 .194393707
D001 .124595175
D002 .102238233
D003 .097416229
D004 .140182022
D005 .114123279
D006 .137454634
D007 .148511476
D008 .148901287
D009 .111103697
10 rows selected.
5:检查会话、进程数量。发现当前会话数为483,历史最大值为720. sessions最大值为1000 ,processes参数最大值为870, 应该可以排除processes与session参数值设置过小导致。
select * from v$resource_limit;
select count(1) from v$session;
SELECT SESSIONS_MAX,SESSIONS_WARNING,SESSIONS_CURRENT,SESSIONS_HIGHWATER
FROM v$license;
select count(1) from v$process;
select count(1) from v$session;
select busy/(busy+idle) from v$shared_server;
select name,busy/(busy+idle)*100 "Dispatcher Busy Rate" from v$dispatcher;
6:检查服务器CPU、RAM资源。如下所示,CPU利用率非常低。检查内存,发现物理内存只剩下204M大小,但是Swap还有13729M剩余。另外告警日志里面并没有发现任何错误信息。
在官方文档查看TSN-12540错误信息,如下所示:
TNS-12540: TNS:internal limit restriction exceeded
Cause: Too many TNS connections open simultaneously.
Action: Wait for connections to close and re-try.
http://docs.oracle.com/cd/B28359_01/network.111/b28316/troublestng.htm#CEGDGHBJ
TNS-12540/ORA-12540: TNS:internal limit restriction exceeded and TNS-00510: Internal limit restriction exceeded
Cause: An internal limit has been exceeded. Possible limits include:
Number of open connection that Oracle Net can process simultaneously
Number of memory buffers which can be used simultaneously
Number of processes a particular database instance is allowed
The first two are examples of hard limits. The third is an example of a limit which can be increased by setting PROCESSES parameter in the database initialization file to a larger value. In this case, a TNS-12500/ORA-12500 error is also returned. In some cases, these errors can be caused by the same conditions which cause TNS-12549/ORA-12549 and TNS-00519 errors.
Action: Perform these steps:
Wait for the open connections to close and retry. If the error persists, then check the sqlnet.log or listener.log file for detailed error stack information.
ORA-12540: TNS:internal limit restriction exceeded
Cause: Too many TNS connections open simultaneously.
Action: Wait for connections to close and re-try.
This error is not an indication of a network problem unless it occurs on all connections (in other words you have never been able to connect). If so it is a .ora configuration problem.
If you can make connections up to a certain point but then fails it is an indication of a resource limitation at the os level (this sounds like your scenario)
Typical problems are:
Out of system memory / swap
Out of process slots in the process table
Streams resources depleted
Physical connections allows by the kernal.
Out of File Handles
但是如下所示,检查了Linux系统的一些内核参数设置,也没有发现有设置不当的地方
[orxxm@xxxx ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 193217
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 4194302
cpu time (seconds, -t) unlimited
max user processes (-u) 16384
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
check to count the number of existing socket handles on your db server (as root):
[oraescm@eellnx18 bdump]$ lsof -i 4 -a | wc -l
473
[oraescm@eellnx18 bdump]$ cat /proc/sys/fs/file-max
2414114
最后,我们决定增加Dispatcher的数量,从8改为10,尝试解决问题,结果问题不再重现。问题解决了。
SQL>alter system set dispatchers='(protocol=TCP)(disp=10)(serv=SCM2)' scope=both;
SQL> show parameter dispatchers
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
dispatchers string (protocol=TCP)(disp=10)(serv=S
CM2)
max_dispatchers integer
SQL>
205-02-03 添加:
几天过去了,悲催的是今天下午又出现了TSN-12540错误,纠结啊,这个问题已经把我折腾惨了!