原创 转载请注明出处
2010年6月13日 POMS生产故障分析及解决方法
一、问题描述和分析
2010年6月13日早上9点左右我们发现了生产库出现性能问题,随即我们查看其服务器性能发现CPU,I/O,内存利用率都基本正常。随后我们进行了初步分析,找到了几条问题SQL,但总结起来主要是如下两条:
1、
SELECT COUNT(CPL.CHANGE_ID)
FROM T_PRODUCT_FEE PF, T_POLICY_CHANGE PC, T_CONTRACT_PRODUCT_LOG CPL
WHERE PF.POLICY_ID = PC.POLICY_ID
AND PC.SERVICE_ID = 213
AND PC.CHANGE_STATUS = 3
AND PC.CHANGE_ID = CPL.CHANGE_ID
AND PF.PRODUCT_NUM = CPL.PRODUCT_NUM
AND PF.PRODUCT_ID = CPL.PRODUCT_ID
AND PF.POLICY_ID = CPL.POLICY_ID
AND PF.LIST_ID = :B1
2、
SELECT DECODE(SUM(PF.PERIOD_PREM), NULL, 0, SUM(PF.PERIOD_PREM))
FROM T_PRODUCT_FEE PF, T_CONTRACT_MASTER CM, T_PRODUCT_LIFE PL
WHERE PF.POLICY_ID = CM.POLICY_ID
AND CM.VALIDATE_DATE
AND CM.VALIDATE_DATE >= ADD_MONTHS(TO_DATE(:B1, 'yyyymm'), -26)
AND PF.CHARGE_TYPE = 1
AND PF.PRODUCT_ID = PL.PRODUCT_ID
AND (PL.PERIOD_TYPE 2 OR EXISTS
(SELECT 1
FROM T_LIFE_BASIC CC
WHERE CC.COVERAGE_YEAR 1
AND CC.PRODUCT_ID = PL.PRODUCT_ID))
AND PF.FEE_TYPE IN (41, 47, 48)
AND PF.POLICY_YEAR = 1
AND PKG_LIFE_CALC_BASIC_COMM.F_IS_EFFECT(PF.LIST_ID) = '0'
AND CM.PREM_STATUS NOT IN (2, 3, 4)
AND CM.NB_AGENCY_ID IN
(SELECT AGENCY_ID
FROM T_AGENCY
START WITH AGENCY_ID = :B2
CONNECT BY PRIOR AGENCY_ID = PARENT_ID);
经过了和开发人员的仔细确认,确定这两个语句有一定的关联性,语句2正是通过方法
PKG_LIFE_CALC_BASIC_COMM.F_IS_EFFECT来调用的语句1,带入真实值后语句2的执行时间为15秒左右,如果多次调用势必会引起数据库性能问题,而事实就是如此,仔细分析语句2,语句2的执行计划如下:
Plan hash value: 1942344324
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:09.91 | 1301K|
|* 2 | FILTER | | 1 | | 0 |00:00:09.91 | 1301K|
|* 3 | HASH JOIN SEMI | | 1 | 1 | 0 |00:00:09.91 | 1301K|
| 4 | NESTED LOOPS | | 1 | 1 | 79 |00:00:10.64 | 1301K|
| 5 | NESTED LOOPS | | 1 | 1 | 79 |00:00:10.64 | 1301K|
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
| 6 | NESTED LOOPS | | 1 | 5 | 85 |00:00:10.86 | 1300K|
|* 7 | TABLE ACCESS FULL | T_PRODUCT_FEE | 1 | 22 | 19202 |00:00:09.35 | 1262K|
|* 8 | TABLE ACCESS BY INDEX ROWID| T_CONTRACT_MASTER | 19202 | 1 | 85 |00:00:00.42 |
|* 9 | INDEX UNIQUE SCAN | PK_T_CONTRACT_MASTER | 19202 | 1 | 19202 |00:00:00.18 | 19204 |
|* 10 | TABLE ACCESS BY INDEX ROWID | T_POLICY_FEE | 85 | 1 | 79 |00:00:00.01 | 25
|* 11 | INDEX UNIQUE SCAN | PK_T_POLICY_FEE | 85 | 1 | 85 |00:00:00.01 | 172 |
| 12 | TABLE ACCESS BY INDEX ROWID | T_PRODUCT_LIFE | 79 | 1 | 79 |00:00:00.01 | 81 |
|* 13 | INDEX UNIQUE SCAN | PK_T_PRODUCT_LIFE | 79 | 1 | 79 |00:00:00.01 | 2 |
| 14 | VIEW | VW_NSO_1 | 1 | 58 | 54 |00:00:00.01 | 23 |
|* 15 | FILTER | | 1 | | 54 |00:00:00.01 | 23 |
|* 16 | CONNECT BY WITH FILTERING | | 1 | | 54 |00:00:00.01 | 23 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
| 17 | TABLE ACCESS BY INDEX ROWID| T_AGENCY | 1 | | 1 |00:00:00.01 | 2 |
|* 18 | INDEX UNIQUE SCAN | PK_T_AGENCY | 1 | 1 | 1 |00:00:00.01 | 1 |
|* 19 | HASH JOIN | | 3 | | 53 |00:00:00.01 | 21 |
| 20 | CONNECT BY PUMP | | 3 | | 54 |00:00:00.01 | 0 |
| 21 | TABLE ACCESS FULL | T_AGENCY | 3 | 58 | 177 |00:00:00.01 | 21 |
| 22 | TABLE ACCESS FULL | T_AGENCY | 0 | 58 | 0 |00:00:00.01 | 0 |
|* 23 | INDEX RANGE SCAN | UNI_LIFE_BASIC__PRODUCT_INFO | 0 | 39 | 0 |00:00:00.01 | 0 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
2 - filter((TO_NUMBER("PL"."PERIOD_TYPE")2 OR IS NOT NULL))
3 - access("CM"."NB_AGENCY_ID"="$nso_col_1")
7 - filter(("PF"."POLICY_YEAR"=2 AND INTERNAL_FUNCTION("PF"."FEE_TYPE") AND TO_NUMBER("PF"."CHARG
"PKG_LIFE_CALC_BASIC_COMM"."F_IS_EFFECT"("PF"."LIST_ID")=0))
8 - filter(("CM"."NB_AGENCY_ID" IS NOT NULL AND "CM"."VALIDATE_DATE"
hh24:mi:ss') AND "CM"."PREM_STATUS"2 AND "CM"."VALIDATE_DATE">=TO_DATE('2009-03-01 00:00:00
ADD_MONTHS(INTERNAL_FUNCTION("CM"."VALIDATE_DATE"),12)
"CM"."PREM_STATUS"3 AND "CM"."PREM_STATUS"4))
9 - access("PF"."POLICY_ID"="CM"."POLICY_ID")
10 - filter((TO_NUMBER("PFE"."RECEIV_STATUS")=2 AND "PFE"."RECEIVED_TIME"
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
hh24:mi:ss') AND "PFE"."RECEIVED_TIME">=ADD_MONTHS(INTERNAL_FUNCTION("CM"."VALIDATE_DATE"),12
11 - access("PF"."FEE_ID"="PFE"."FEE_ID")
13 - access("PF"."PRODUCT_ID"="PL"."PRODUCT_ID")
15 - filter((TO_DATE('2009-04-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss')>TO_DATE('2009-03-01 00:00:00'
hh24:mi:ss') AND TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss')
hh24:mi:ss')))
16 - filter("AGENCY_ID"=121)
18 - access("AGENCY_ID"=121)
19 - access("PARENT_ID"=NULL)
23 - access("CC"."PRODUCT_ID"=:B1)
filter("CC"."COVERAGE_YEAR"1)
在进行第7步的时候花费了大量的时间,原因是,执行计划把PKG_LIFE_CALC_BASIC_COMM.F_IS_EFFECT这个筛选条件提到了语句的第一步执行,
这个时候返回的值很多,而每个返回的值都会去调用PKG_LIFE_CALC_BASIC_COMM.F_IS_EFFECT进行处理,所以语句1就会出现大量的重复操作,假设我这个时候返回了3W条那PKG_LIFE_CALC_BASIC_COMM.F_IS_EFFECT就会调用3W次,而语句1就会出现3W次,所以我们的问题转移到如何改变执行计划在最后一步来进行PKG_LIFE_CALC_BASIC_COMM.F_IS_EFFECT的处理。而这种等待既不会导致大量CPU计算,也不会导致大量I/O,因为进行的所有读均为逻辑读,也就是从内存进行读写,而ORACLE SGA的值是恒定的2.5G左右,所以在最初看来,服务器性能视乎还是比较正常的。处于这些原因,我们开始考虑使用视图的方式来进行改写。
经过初步改写语句执行计划如下:
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:09.32 | 1298K|
|* 2 | FILTER | | 1 | | 0 |00:00:09.32 | 1298K|
|* 3 | HASH JOIN SEMI | | 1 | 1 | 0 |00:00:09.32 | 1298K|
| 4 | NESTED LOOPS | | 1 | 1 | 79 |00:00:10.03 | 1298K|
| 5 | NESTED LOOPS | | 1 | 1 | 79 |00:00:10.03 | 1298K|
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
| 6 | NESTED LOOPS | | 1 | 5 | 85 |00:00:10.24 | 1297K|
|* 7 | TABLE ACCESS FULL | T_PRODUCT_FEE | 1 | 22 | 19155 |00:00:09.14 | 1259K|
|* 8 | TABLE ACCESS BY INDEX ROWID| T_CONTRACT_MASTER | 19155 | 1 | 85 |00:00:00.41 |
|* 9 | INDEX UNIQUE SCAN | PK_T_CONTRACT_MASTER | 19155 | 1 | 19155 |00:00:00.19 | 19157 |
|* 10 | TABLE ACCESS BY INDEX ROWID | T_POLICY_FEE | 85 | 1 | 79 |00:00:00.01 | 25
|* 11 | INDEX UNIQUE SCAN | PK_T_POLICY_FEE | 85 | 1 | 85 |00:00:00.01 | 172 |
| 12 | TABLE ACCESS BY INDEX ROWID | T_PRODUCT_LIFE | 79 | 1 | 79 |00:00:00.01 | 81 |
|* 13 | INDEX UNIQUE SCAN | PK_T_PRODUCT_LIFE | 79 | 1 | 79 |00:00:00.01 | 2 |
| 14 | VIEW | VW_NSO_1 | 1 | 58 | 54 |00:00:00.01 | 23 |
|* 15 | FILTER | | 1 | | 54 |00:00:00.01 | 23 |
|* 16 | CONNECT BY WITH FILTERING | | 1 | | 54 |00:00:00.01 | 23 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
| 17 | TABLE ACCESS BY INDEX ROWID| T_AGENCY | 1 | | 1 |00:00:00.01 | 2 |
|* 18 | INDEX UNIQUE SCAN | PK_T_AGENCY | 1 | 1 | 1 |00:00:00.01 | 1 |
|* 19 | HASH JOIN | | 3 | | 53 |00:00:00.01 | 21 |
| 20 | CONNECT BY PUMP | | 3 | | 54 |00:00:00.01 | 0 |
| 21 | TABLE ACCESS FULL | T_AGENCY | 3 | 58 | 177 |00:00:00.01 | 21 |
| 22 | TABLE ACCESS FULL | T_AGENCY | 0 | 58 | 0 |00:00:00.01 | 0 |
|* 23 | INDEX RANGE SCAN | UNI_LIFE_BASIC__PRODUCT_INFO | 0 | 39 | 0 |00:00:00.01 | 0 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
2 - filter((TO_NUMBER("PL"."PERIOD_TYPE")2 OR IS NOT NULL))
3 - access("CM"."NB_AGENCY_ID"="$nso_col_1")
7 - filter(("PF"."POLICY_YEAR"=2 AND INTERNAL_FUNCTION("PF"."FEE_TYPE") AND TO_NUMBER("PF"."CHARG
"PKG_LIFE_CALC_BASIC_COMM"."F_IS_EFFECT"("PF"."LIST_ID")=0))
8 - filter(("CM"."NB_AGENCY_ID" IS NOT NULL AND "CM"."VALIDATE_DATE"
hh24:mi:ss') AND "CM"."PREM_STATUS"2 AND "CM"."VALIDATE_DATE">=TO_DATE('2009-03-01 00:00:00
hh24:mi:ss') AND ADD_MONTHS(INTERNAL_FUNCTION("CM"."VALIDATE_DATE"),12)
hh24:mi:ss') AND "CM"."PREM_STATUS"3 AND "CM"."PREM_STATUS"4))
9 - access("PF"."POLICY_ID"="CM"."POLICY_ID")
10 - filter((TO_NUMBER("PFE"."RECEIV_STATUS")=2 AND "PFE"."RECEIVED_TIME"
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
'yyyy-mm-dd hh24:mi:ss') AND "PFE"."RECEIVED_TIME">=ADD_MONTHS(INTERNAL_FUNCTION("CM"."VALIDA
11 - access("PF"."FEE_ID"="PFE"."FEE_ID")
13 - access("PF"."PRODUCT_ID"="PL"."PRODUCT_ID")
15 - filter((TO_DATE('2009-04-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss')>TO_DATE('2009-03-01 00:00:00'
hh24:mi:ss') AND TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss')
hh24:mi:ss')))
16 - filter("AGENCY_ID"=121)
18 - access("AGENCY_ID"=121)
19 - access("PARENT_ID"=NULL)
23 - access("CC"."PRODUCT_ID"=:B1)
filter("CC"."COVERAGE_YEAR"1)
仔细分析后发现问题出在第7布,执行计划,并没有按照我们意图在进行所有的筛选条件过后再进行PKG_LIFE_CALC_BASIC_COMM.F_IS_EFFECT的处理,执行计划基本和以前一样。
但是为了临时解决问题,我们通过建立临时中间表的方法来解决这个问题,对子查询建立临时表,强制改变其访问路径。但是我们同时也考虑到,建立临时表毕竟需要对临时表进行频繁的INSERT和DELETE,所以最好的方式还是改写SQL本身,改变其访问路径。
所以经过我仔细分析后,希望通过改写语句如下:
select decode(sum(period_prem), null, 0, sum(period_prem)) from
(select decode(pkg_life_calc_basic_comm.f_is_effect(pf.list_id),'0',PF.list_id) list_id, decode(pkg_life_calc_basic_comm.f_is_effect(PF.list_id),'0',nvl(pf.period_prem, 0) )period_prem
from t_product_fee pf,
t_contract_master cm,
t_product_life pl,
t_policy_fee pfe
where pf.policy_id = cm.policy_id
and cm.validate_date
ADD_MONTHS(to_date('201005', 'yyyymm'), -13)
and cm.validate_date >=
ADD_MONTHS(to_date('201005', 'yyyymm'), -14)
and pf.fee_id = pfe.fee_id
and pfe.received_time >=
ADD_MONTHS(cm.validate_date, 12)
and pfe.received_time
ADD_MONTHS(to_date('201005', 'yyyymm'), 1)
and pfe.receiv_status = 2
and pf.charge_type = 1
and pf.product_id = pl.product_id
and (pl.period_type 2 or exists
(select 1
from t_life_basic cc
where cc.coverage_year 1
and cc.product_id = pl.product_id))
and pf.fee_type in (43, 47, 48)
and pf.policy_year = 2
and cm.prem_status not in (2, 3, 4)
and cm.nb_agency_id in
(select agency_id
from t_agency
start with agency_id = 121
Connect By Prior agency_id = parent_id));
如红色部分使用DECODE语句来改写WHERE语句,再次查看执行计划如下:
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 4085472284
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.29 | 20999 |
|* 2 | FILTER | | 1 | | 0 |00:00:00.29 | 20999 |
| 3 | NESTED LOOPS | | 1 | 5 | 2019 |00:00:00.25 | 20941 |
| 4 | NESTED LOOPS | | 1 | 5 | 2019 |00:00:00.21 | 18920 |
| 5 | NESTED LOOPS | | 1 | 101 | 2137 |00:00:00.17 | 12507 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
|* 6 | TABLE ACCESS FULL | T_CONTRACT_MASTER | 1 | 100 | 2749 |00:00:00.03 |
|* 7 | TABLE ACCESS BY INDEX ROWID| T_PRODUCT_FEE | 2749 | 1 | 2137 |00:00:00.10 | 80
|* 8 | INDEX RANGE SCAN | IDX_PRODUCT_FEE__POLICY_ID | 2749 | 2 | 7161 |00:00:00.03 |
|* 9 | TABLE ACCESS BY INDEX ROWID | T_POLICY_FEE | 2137 | 1 | 2019 |00:00:00.05 | 64
|* 10 | INDEX UNIQUE SCAN | PK_T_POLICY_FEE | 2137 | 1 | 2137 |00:00:00.02 | 4276 |
| 11 | TABLE ACCESS BY INDEX ROWID | T_PRODUCT_LIFE | 2019 | 1 | 2019 |00:00:00.03 | 2
|* 12 | INDEX UNIQUE SCAN | PK_T_PRODUCT_LIFE | 2019 | 1 | 2019 |00:00:00.01 | 2 |
|* 13 | INDEX RANGE SCAN | UNI_LIFE_BASIC__PRODUCT_INFO | 6 | 39 | 0 |00:00:00.01 |
|* 14 | FILTER | | 2 | | 0 |00:00:00.01 | 46 |
|* 15 | CONNECT BY WITH FILTERING | | 2 | | 108 |00:00:00.01 | 46 |
| 16 | TABLE ACCESS BY INDEX ROWID| T_AGENCY | 2 | | 2 |00:00:00.01 | 4 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
|* 17 | INDEX UNIQUE SCAN | PK_T_AGENCY | 2 | 1 | 2 |00:00:00.01 | 2 |
|* 18 | HASH JOIN | | 6 | | 106 |00:00:00.01 | 42 |
| 19 | CONNECT BY PUMP | | 6 | | 108 |00:00:00.01 | 0 |
| 20 | TABLE ACCESS FULL | T_AGENCY | 6 | 58 | 354 |00:00:00.01 | 42 |
| 21 | TABLE ACCESS FULL | T_AGENCY | 0 | 58 | 0 |00:00:00.01 | 0 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(((TO_NUMBER("PL"."PERIOD_TYPE")2 OR IS NOT NULL) AND IS NOT NULL))
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
6 - filter(("CM"."VALIDATE_DATE"
"CM"."PREM_STATUS"2 AND "CM"."VALIDATE_DATE">=TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd hh
ADD_MONTHS(INTERNAL_FUNCTION("CM"."VALIDATE_DATE"),12)
"CM"."PREM_STATUS"3 AND "CM"."PREM_STATUS"4))
7 - filter(("PF"."POLICY_YEAR"=2 AND INTERNAL_FUNCTION("PF"."FEE_TYPE") AND TO_NUMBER("PF"."CHARG
8 - access("PF"."POLICY_ID"="CM"."POLICY_ID")
9 - filter((TO_NUMBER("PFE"."RECEIV_STATUS")=2 AND "PFE"."RECEIVED_TIME"
'yyyy-mm-dd hh24:mi:ss') AND "PFE"."RECEIVED_TIME">=ADD_MONTHS(INTERNAL_FUNCTION("CM"."VALIDA
10 - access("PF"."FEE_ID"="PFE"."FEE_ID")
12 - access("PF"."PRODUCT_ID"="PL"."PRODUCT_ID")
13 - access("CC"."PRODUCT_ID"=:B1)
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
filter("CC"."COVERAGE_YEAR"1)
14 - filter("AGENCY_ID"=:B1)
15 - filter("AGENCY_ID"=121)
17 - access("AGENCY_ID"=121)
18 - access("PARENT_ID"=NULL)
此时已经第7步没有进行方法处理,这个处理会再所有的谓词过滤后执行。这个语句的执行时间为0.3秒左右,比起最开始的15秒得到了大量的提升。
二、解决方法
所以这个问题的解决方法如下:
1、改写SQL语句如下:
select /*+ gather_plan_statistics */
decode(sum(period_prem), null, 0, sum(period_prem)) from
(select decode(pkg_life_calc_basic_comm.f_is_effect(pf.list_id),'0',PF.list_id) list_id, decode(pkg_life_calc_basic_comm.f_is_effect(PF.list_id),'0',nvl(pf.period_prem, 0) )period_prem
from t_product_fee pf,
t_contract_master cm,
t_product_life pl,
t_policy_fee pfe
where pf.policy_id = cm.policy_id
and cm.validate_date
ADD_MONTHS(to_date('201005', 'yyyymm'), -13)
and cm.validate_date >=
ADD_MONTHS(to_date('201005', 'yyyymm'), -14)
and pf.fee_id = pfe.fee_id
and pfe.received_time >=
ADD_MONTHS(cm.validate_date, 12)
and pfe.received_time
ADD_MONTHS(to_date('201005', 'yyyymm'), 1)
and pfe.receiv_status = 2
and pf.charge_type = 1
and pf.product_id = pl.product_id
and (pl.period_type 2 or exists
(select 1
from t_life_basic cc
where cc.coverage_year 1
and cc.product_id = pl.product_id))
and pf.fee_type in (43, 47, 48)
and pf.policy_year = 2
and cm.prem_status not in (2, 3, 4)
and cm.nb_agency_id in
(select agency_id
from t_agency
start with agency_id = 121
Connect By Prior agency_id = parent_id));
2、考虑到我们的系统session_cached_cursors过小,同时引起了一些等待,所以尽可能更改参数session_cached_cursors加大每个会话能够缓存CURSOR的数量。