1,问题现象
DG环境,凌晨时刻,实例DOWN,需要进行分析,观察DB Alert日志,可以明确看到ORA-04021报错,以及打印LGWR TRACE日志。
2,问题分析和处理建议
经查询由于Bug引发系统down,Bug信息:
Issue matches with bug 11712267 and bug 16717701
Since two bugs are matching with the case,
You can try with option (1) . As per Bug 11712267
change the cursor_sharing to force on Active dataguard (ADG).
Monitor your environment for sometime.
If it crashes again then follow with the option (2)
Option (2):
As per bug description
LGWR can request DBINSTANCE lock in X mode without any timeout which can lead to a hang / deadlock.
Both fixes are already included in 11.2.0.4 but the fix is DISABLED by default.
== > To ENABLE the fix one has to set == > "_adg_parselock_timeout" > to the number of centi-seconds == > LGWR should wait
before backing off and retrying the request.
Value should be in centi seconds. == > I Don't think there is really any hard fast rule for a value - at default (0) it will not timeout.
A value representing a few seconds seems reasonable - if LGWR has been stuck for say 5 seconds waiting it seems reasonable
guess it is not going to get the lock.
The param just causes it to abort the current attempt and retry If you want to play safe can start with a higher value then decrease later.
A higher value will just mean more sessions blocked for longer in case of the deadlock situation.
500 Seems reasonable , but I have no data to base it on.
There should be a statistic "ADG parselock X get attempts" If it gets set too small that value would likely
increase a lot due to keep timing out and retrying.
This is a dynamic parameter
Follow option (1) .
change the cursor_sharing to force on ADG
If issue re-appears then follow option (2) as below
Please set "_adg_parselock_timeout" to 500 == >
SQL > alter system set "_adg_parselock_timeout"=500 scope=both sid='*';
处理建议:
第一修改 cursor_sharing 参数为Force,目的是什么,是为了让硬解析的SQL大部分转为软解析,降低shared pool内存资源,以及内存栓锁及CPU的争用。
第二修改_adg_parselock_timeout 为500,及LGWR获取内存栓锁默认是0,不允许延迟,设置为5s,允许性能极差或资源争用的情况下,延长LGWR的申请时间。