ORACLE 索引并行引起的direct path read temp和latch free等待导致进程数超过最大数

简介:     2016年10月27日下午,测试同事说测试数据库连接不上了,让我们DBA查看问题并解决一下。    操作系统:Red Hat Enterprise Linux Server release 6.
    2016年10月27日下午,测试同事说测试数据库连接不上了,让我们DBA查看问题并解决一下。
   操作系统:Red Hat Enterprise Linux Server release 6.6 (Santiago)
    数据库版本:
[oracle@se31 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Fri Oct 28 08:59:04 2016
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE 11.2.0.4.0 Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production
   刚登陆上数据库服务器,发现服务器的IO已经被耗光:

    查看数据库告警日志,发现最大进程数告警:
Thu Oct 27 14:24:51 2016
ORA-00020: ???讴???)
 ORA-20 errors will not be written to the alert log for
 the next minute. Please look at trace files to see all
 the ORA-20 errors.
Process P198 submission failed with error = 20
Thu Oct 27 14:25:58 2016
ORA-00020: maximum number of processes (300) exceeded
 ORA-20 errors will not be written to the alert log for
 the next minute. Please look at trace files to see all
 the ORA-20 errors.
    数据库的最大进程数设置确实是300,但是测试库是针对测试用的,最大进程数设置稍微小点容易暴露项目得问题,并且平时测试没有出现过类似情况。
SQL> show parameter processes
NAME     TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes     integer 1
db_writer_processes     integer 3
gcs_server_processes     integer 0
global_txn_processes     integer 1
job_queue_processes     integer 1000
log_archive_max_processes     integer 4
processes     integer 300
SQL> 
   先排除最大进程数,继续查看数据库中的非空闲等待事件:
SQL> select event,count(*) from v$session where wait_class<>'Idle' group by event;
EVENT     COUNT(*)
---------------------------------------------------------- ----------
direct path read temp    5
latch free   23
SQL*Net message to client    2
   然后,按照等待事件查看相关会话执行的sql语句:
SQL> select sid from v$session where event='&event_name';
Enter value for event_name: latch free
old   1: select sid from v$session where event='&event_name'
new   1: select sid from v$session where event='latch free'
SID
-----------
11
26
52
56
67
113
120
139
189
244
245
265
267
291
315
326
363
402
414
423
434
447
22 rows selected.
SQL> set linesize 1000
col username for a15
col osuser for a15
select sid,wait_class_id,user#,username,lockwait,status,osuser,sql_id,PREV_SQL_ID from v$session where 
wait_class_id= 1893977003 and sid=&sid;SQL> SQL> SQL>   2  
Enter value for sid: 11
old   2: wait_class_id= 1893977003 and sid=&sid
new   2: wait_class_id= 1893977003 and sid=11
SID       WAIT_CLASS_ID      USER# USERNAME            LOCKWAIT      STATUS OSUSER          SQL_ID                  PREV_SQL_ID
----------- -------------               ---------- ---------------          ----------------     ----------   ---------------     -------------              -------------
11        1893977003          156      BMI_NANCHONG                        ACTIVE    ASP.NET v4.0 c8m5td2tmpcxg
   查看  c8m5td2tmpcxg的sql语句:
SQL> select sql_fulltext from v$sqlarea where sql_id='c8m5td2tmpcxg';

select *
  from (select row_.*, rownum NumRow
          from (select b.table_par    as tablepar,
                       a.bill_no      as billcode,
                       b.item_date    as itemdate,
                       b.item_id      as itemid,
                       b.item_name    as name,
                       b.numbers,
                       b.price,
                       b.costs,
                       b.drug_spec    as specification s,
                       b.package_unit as itemunit,
                       
                       b.reject_money     as rejectmone y,
                       b.reject_num       as rejectnumber,
                       b.reject_reson     as rejectreso n,
                       a.admission_number as admiss ionnumber,
                       b.physician_name   as doctorna me,
                       b.deptname         as deptname,
                       a.patient_name     as patientnam e,
                       d.dosage_name      as dosagename,
                       
                       nvl(d.varchar01, zs.varchar01) as packages,
                       nvl(d.manu_name, zs.manu_name) as manuname,
                       a.PATIENT_ID as BillPatientI d,
                       a.ADMISSION_DATE as BillAdmi ssonDate,
                       a.DISCHARGE_DATE as BillDisc hargeDate,
                       b.hospital_remark as hospitalremark,
                       zdr.region_name as region_name
                  from dw_billdetail b
                 inner join dw_bill a
                    on b.pid = a.hisid
                   and b.table_par = a.table_par
                  left join dw_zd_drug d
                    on b. item_id = d.item_id
                  left join dw_zd_service zs
                    on b.item_id = zs.item_id
                  left join d w_zd_region zdr
                    on zdr.region_id = a.bmi_code
                 where 1 = 1
                   AND a.andit_manu_statu
                 s IN (:paramBillStatus0,
                             :paramBillStatus1,
                             :paramBillStatus2,
                             :paramBillStatus3)
                   and b.reject_money > 0
                   and a.hospital_id = :paramHosptialID4
                   and b.table_par >= :stabPar5
                   and b.table_par <= :etabPar6
                 order by b.pid, b.item_date) row_
         where rownum <= 50)
 where NumRow > 0
    查看c8m5td2tmpcxg的子游标及执行计划情况:
SQL> select child_number,executions,buffer_gets,is_bind_sensitive BS,
is_bind_aware BA,is_shareable SH,plan_hash_value
from v$sql
where sql_id='&sql_id';  2    3    4  
Enter value for sql_id: c8m5td2tmpcxg
old   4: where sql_id='&sql_id'
new   4: where sql_id='c8m5td2tmpcxg'
CHILD_NUMBER EXECUTIONS BUFFER_GETS B B S PLAN_HASH_VALUE
------------           ----------         -----------         - - -       ---------------
  0                       18           669286643    Y N Y      3571375093
  1                        0           3013153881   Y N Y      3571375093
  2                        1                       380   Y N Y      1661143992
  3                        0                    78051   Y N Y      1661143992
  5                        0            870616787    N N Y      3571375093
   经过查看只有0、1、3能看到执行计划,
select * from table(dbms_xplan.display_cursor('c8m5td2tmpcxg',0,'typical'));

select * from table(dbms_xplan.display_cursor('c8m5td2tmpcxg',1,'typical'))

select * from table(dbms_xplan.display_cursor('c8m5td2tmpcxg',2,'typical'))

其余的子游标对应的执行计划已经查不到了:
SQL> c/2/3
  1* select * from table(dbms_xplan.display_cursor('c8m5td5tmpcxg',3,'typical'))
SQL> /
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID: c8m5td5tmpcxg, child number: 3 cannot be found
SQL> c/3/5
  1* select * from table(dbms_xplan.display_cursor('c8m5td5tmpcxg',5,'typical'))
SQL> /
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID: c8m5td5tmpcxg, child number: 5 cannot be found
  另外一个等待事件的会话发起的sql语句还是: c8m5td5tmpcxg
SQL> select sid from v$session where event='&event_name';
Enter value for event_name: direct path read temp
old   1: select sid from v$session where event='&event_name'
new   1: select sid from v$session where event='direct path read temp'
SID
-----------
28
113
139
215
315
326
355
414
8 rows selected.
SQL> select wait_time,wait_class_id,state from v$session_wait where sid=&sid;
Enter value for sid: 28
old   1: select wait_time,wait_class_id,state from v$session_wait where sid=&sid
new   1: select wait_time,wait_class_id,state from v$session_wait where sid=28
 WAIT_TIME WAIT_CLASS_ID STATE
---------- ------------- -------------------
1    1740759767 WAITED KNOWN TIME
SQL> set linesize 1000
col username for a15
col osuser for a15
select sid,wait_class_id,user#,username,lockwait,status,osuser,sql_id,PREV_SQL_ID from v$session where 
wait_class_id= 1740759767 and sid=&sid;SQL> SQL> SQL>   2  
Enter value for sid: 28
old   2: wait_class_id= 1740759767 and sid=&sid
new   2: wait_class_id= 1740759767 and sid=28
SID WAIT_CLASS_ID      USER# USERNAME     LOCKWAIT      STATUS OSUSER SQL_ID        PREV_SQL_ID
----------- ------------- ---------- --------------- ---------------- ---------- --------------- ------------- -------------
28    1740759767 156 BMI_NANCHONG      ACTIVE ASP.NET v4.0 c8m5td2tmpcxg
   令人奇怪的是,这条sql语句上并没有并行相关的Hints,但是其执行计划里却体现出了并行执行QN,查看方案下的表并没有发现有开启并行的表:
SQL> select table_name,degree from user_tables where table_name in('DW_BILL','DW_BILLDETAIL','DW_ZD_DRUG','DW_ZD_SERVICE','DW_ZD_REGION');
TABLE_NAME                        DEGREE
---------------------------------------- --------------------
DW_BILL                             1
DW_BILLDETAIL                  1
DW_ZD_DRUG                    1
DW_ZD_REGION                 1
DW_ZD_SERVICE                1
    经过与开发、测试同事了解,应用ASP.NET端的部署没有开启并行设置,然后查看了这几张表下的索引,果然索引下均有并行。

    看来是查询执行时,走了索引,触发了并行,耗尽了数据库的进程数,导致数据库对新连接无响应;接下来的处理措施是:查杀消耗CPU高的会话,取消掉索引的并行:
   按等待事件类型查杀会话:
SQL> select 'alter system kill session ',''''||a.sid||','||a.serial#||''';'
from v$session a  where event='latch free';  2  
'ALTERSYSTEMKILLSESSION'   ''''||A.SID||','||A.SERIAL#||''';'
-------------------------- ------------------------------------------------------------------------------------
alter system kill session  '11,85';
alter system kill session  '26,12795';
alter system kill session  '52,43';
alter system kill session  '56,3';
alter system kill session  '67,9473';
alter system kill session  '100,5';
alter system kill session  '120,37';
alter system kill session  '189,67';
alter system kill session  '244,19499';
alter system kill session  '245,9039';
alter system kill session  '265,13217';
alter system kill session  '267,237';
alter system kill session  '291,53';
alter system kill session  '363,8251';
alter system kill session  '402,39799';
alter system kill session  '423,2479';
alter system kill session  '434,1';
alter system kill session  '447,7877';
alter system kill session  '452,33';
19 rows selected.
SQL> select event,count(*) from v$session where wait_class<>'Idle' group by event;
EVENT                                                  COUNT(*)
---------------------------------------------------------- ----------
direct path read temp                              8
SQL*Net message to client                       1
SQL> select 'alter system kill session ',''''||a.sid||','||a.serial#||''';'
from v$session a  where event='&event_name';  2  
Enter value for event_name: direct path read temp
old   2: from v$session a  where event='&event_name'
new   2: from v$session a  where event='direct path read temp'
'ALTERSYSTEMKILLSESSION'   ''''||A.SID||','||A.SERIAL#||''';'
-------------------------- ------------------------------------------------------------------------------------
alter system kill session  '28,5109';
alter system kill session  '113,3';
alter system kill session  '139,15';
alter system kill session  '215,3';
alter system kill session  '315,5';
alter system kill session  '326,6017';
alter system kill session  '355,33';
alter system kill session  '414,3';
8 rows selected.
    杀完 latch free、direct path read temp等待会话,数据库服务器的CPU恢复正常:

   接下来取消索引的并行:

   由于修改sql相关对象的并行度,相当于其相关对象发生了DDL变更,会引起sql引擎对在相关对象上执行的sql语句重新解析,做完索引的并行度取消,
sql语句恢复正常,服务器没有发生并行使用过多的进程的情况:
    c8m5td2tmpcxg果然被重新硬解析,原来的5个子游标变成了现在的3个:
   SQL> select child_number,executions,buffer_gets,is_bind_sensitive BS,
is_bind_aware BA,is_shareable SH,plan_hash_value
from v$sql
where sql_id='&sql_id';  2    3    4  
Enter value for sql_id: c8m5td2tmpcxg
old   4: where sql_id='&sql_id'
new   4: where sql_id='c8m5td2tmpcxg'
CHILD_NUMBER EXECUTIONS BUFFER_GETS B B S PLAN_HASH_VALUE
------------ ---------- ----------- - - - ---------------
  1      0  3013153881 Y N Y      3571375093
  2      6      238503 Y N Y      3286212919
  3      0       78051 Y N Y      1661143992
   sql语句的执行计划恢复正常,没有了并行:

    如果你有不同见解或更好的解决思路,欢迎留言讨论!




目录
相关文章
|
2月前
|
Linux Shell
【Linux】详解进程终止&&进程等待
【Linux】详解进程终止&&进程等待
|
2月前
|
Linux
【Linux】—— 进程等待 wait&&waitpid
【Linux】—— 进程等待 wait&&waitpid
【Linux】—— 进程等待 wait&&waitpid
|
2月前
|
Linux
进程等待(wait和wait函数)【Linux】
进程等待(wait和wait函数)【Linux】
|
4月前
|
存储 Linux C语言
Linux进程等待
Linux进程等待
|
4月前
|
调度
多进程并行执行
多进程并行执行
35 0
|
5月前
|
存储 Linux
Linux中的进程等待(超详细)
Linux中的进程等待(超详细)
131 1
|
6月前
|
机器学习/深度学习 缓存 Java
Python 线程,进程,多线程,多进程以及并行执行for循环笔记
Python 线程,进程,多线程,多进程以及并行执行for循环笔记
442 0
Python 线程,进程,多线程,多进程以及并行执行for循环笔记
|
4天前
|
存储 Linux Shell
Linux:进程等待 & 进程替换
Linux:进程等待 & 进程替换
29 9
|
20天前
|
存储 Linux
【linux进程控制(二)】进程等待--父进程是如何等待子进程死亡的?
【linux进程控制(二)】进程等待--父进程是如何等待子进程死亡的?