某省的生产库,收到告警短信oracle目录100%,登陆系统查看兼职被报错刷屏了。
Mon Oct 23 23:25:18 EAT 2017
Thread 1 advanced to log sequence 39420 (LGWR switch)
Current log# 7 seq# 39420 mem# 0: /vgxxx03/oradata/xxx/vg03_1_rd71.log
Current log# 7 seq# 39420 mem# 1: /vgxxx04/oradata/xxx/vg04_1_rd72.log
Tue Oct 24 01:02:04 EAT 2017
Errors in file /oracle/admin/esshlj/udump/xxx1_ora_11790.trc:
ORA-07445: 出现异常错误: 核心转储 [kadfih()+33] [SIGSEGV] [Address not mapped to object] [0x91A1A1A2A000F] [] []
Tue Oct 24 01:02:06 EAT 2017
Trace dumping is performing id=[cdmp_20171024010206]
Tue Oct 24 01:02:41 EAT 2017
Errors in file /oracle/admin/1_pmon_27125.trc:
ORA-07445: exception encountered: core dump [kadfih()+33] [SIGSEGV] [Address not mapped to object] [0x91A1A1A2A000F] [] []
Tue Oct 24 01:02:43 EAT 2017
Errors in file /oracle/admin/1_pmon_27125.trc:
ORA-00001: unique constraint (108.) violated
ORA-00001: unique constraint (108.) violated
ORA-00001: unique constraint (.) violated
ORA-00602: internal programming exception
ORA-00602: internal programming exception
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kadfih()+33] [SIGSEGV] [Address not mapped to object] [0x91A1A1A2A000F] [] []
Tue Oct 24 01:07:10 EAT 2017
Errors in file /oracle/admin/1_ora_16532.trc:
ORA-00600: 内部错误代码, 参数: [kghfrmrg:prv], [0xC00000073B406C28], [], [], [], [], [], []
Tue Oct 24 01:08:15 EAT 2017
Errors in file /oracle/admin/1_ora_16532.trc:
ORA-00600: 内部错误代码, 参数: [KGHFRE3], [0xC0000007433BACB0], [], [], [], [], [], []
Tue Oct 24 01:10:13 EAT 2017
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Error 22813 occured while spilling buffered messages
Tue Oct 24 01:10:18 EAT 2017
Errors in file /oracle/admin/1_q003_26244.trc:
ORA-00600: internal error code, arguments: [kpotcprc: uga depth exceeded], [16], [], [], [], [], [], []
Error 600 occured while spilling buffered messages
Tue Oct 24 01:10:20 EAT 2017
Errors in file /oracle/admin/1_q003_26244.trc:
ORA-00600: internal error code, arguments: [kpotcprc: uga depth exceeded], [16], [], [], [], [], [], []
Error 600 occured while spilling buffered messages
Tue Oct 24 01:10:23 EAT 2017
Errors in file /oracle/admin/1_q003_26244.trc:
ORA-00600: internal error code, arguments: [kpotcprc: uga depth exceeded], [16], [], [], [], [], [], []
..........................................................................................................................................
Tue Oct 24 15:00:26 EAT 2017
Errors in file /oracle/admin/1_q007_20218.trc:
ORA-00600: internal error code, arguments: [kgqbt_search_block198], [], [], [], [], [], [], []
Tue Oct 24 15:00:26 EAT 2017
Errors in file /oracle/admin/1_q005_20204.trc:
ORA-00600: internal error code, arguments: [kgqbt_search_block198], [], [], [], [], [], [], []
Tue Oct 24 15:00:27 EAT 2017
Errors in file /oracle/admin/1_q000_19435.trc:
ORA-07445: exception encountered: core dump [kocgpn2()+224] [SIGSEGV] [Address not mapped to object] [0x600000002] [] []
Tue Oct 24 15:00:28 EAT 2017
Errors in file /oracle/admin/1_q004_20225.trc:
ORA-00600: internal error code, arguments: [kgqbt_search_block198], [], [], [], [], [], [], []
Tue Oct 24 15:00:28 EAT 2017
Errors in file /oracle/admin/1_q007_20218.trc:
ORA-00600: internal error code, arguments: [kgqbt_search_block198], [], [], [], [], [], [], []
Tue Oct 24 15:00:29 EAT 2017
Errors in file /oracle/admin/1_q005_20234.trc:
ORA-00600: internal error code, arguments: [kgqbt_search_block198], [], [], [], [], [], [], []
Tue Oct 24 15:00:29 EAT 2017
Errors in file /oracle/admin/1_q004_20225.trc:
ORA-00600: internal error code, arguments: [kgqbt_search_block198], [], [], [], [], [], [], []
Tue Oct 24 15:00:30 EAT 2017
Errors in file /oracle/admin/1_q003_19467.trc:
ORA-07445: exception encountered: core dump [kocgpn2()+224] [SIGSEGV] [Address not mapped to object] [0x600000002] [] []
初判应该是stream引起的。节点二也有相关stream的cdmp文件产生
Tue Oct 24 04:56:50 EAT 2017
Trace dumping is performing id=[cdmp_20171024015310]
Streams process APPLY_xxx altered (1) on instance 1
Tue Oct 24 05:11:34 EAT 2017
Trace dumping is performing id=[cdmp_20171024015317]
Streams process APPLY_xxx altered (1) on instance 1
Streams process APPLY_xxx altered (1) on instance 1
Tue Oct 24 05:26:18 EAT 2017
Trace dumping is performing id=[cdmp_20171024015321]
随即给厂家人员打了电话,让他们看看stream是否正常,反应状态异常。kill其进程p000、p001,立马就停止apply进程否则报错!!!
SQL> show user
USER is "STRMADMIN"
SQL> execute dbms_apply_adm.stop_apply (apply_name => 'apply_esshlj');
PL/SQL procedure successfully completed.
SQL> select apply_name,queue_name,status from dba_apply;
APPLY_NAME QUEUE_NAME STATUS
------------------------------ ------------------------------ --------
APPLY_ESSHLJ ESSHLJ_QUEUE DISABLED ###这会儿状态为DISABLE###启动后状态为ENABLE。
随即又清理了一下缓存,该在 strmadmin下执行
declare
options dbms_aqadm.aq$_purge_options_t;
begin
options.delivery_mode := dbms_aqadm.buffered;
dbms_aqadm.purge_queue_table('ESSHLJ_QUEUE_TABLE', null, options);
end;
/
报错了
ERROR at line 1:
ORA-03113: end-of-file on communication channel
1节点alert的trc文件一直产生,一秒钟可生成5个左右文件,一个2M。oracle目录只有20G。编写脚本自动删除trc文件。
1,尝试了flush buffer_cache;
2,尝试了flush share_pool;
因为该报错的只是节点一,随即打算将stream切换至节点二,观察是否报错依旧,在目标端中心库上面执行了如下命令;
exec DBMS_AQADM.ALTER_QUEUE_TABLE@esshlj(queue_table => 'ESSHLJ_QUEUE_TABLE', primary_instance => 2, secondary_instance => 1);
观察p000和p001进程在1节点已经不在了。2节点stream状态正常,ORA-600报错没了,但是ORA-7445报错依旧。因为报错中一直有小工进程q00这样的进程,暂时修改aq_tm_processes参数为0,看看报错还有没有。
SQL> !ps -ef |grep q00
oracle 8955 8883 1 15:22:05 pts/5 0:00 /sbin/sh -c ps -ef |grep q00
oracle 8957 8955 0 15:22:05 pts/5 0:00 grep q00
oracle 27097 1 0 14:37:32 ? 0:00 ora_q006_esshlj1
oracle 8820 1 51 15:21:52 ? 0:00 ora_q003_esshlj1
oracle 8018 1 0 15:21:41 ? 0:00 ora_q008_esshlj1
oracle 6752 1 0 15:21:12 ? 0:00 ora_q007_esshlj1
oracle 7641 1 0 15:21:27 ? 0:00 ora_q004_esshlj1
oracle 8863 1 1 15:21:58 ? 0:00 ora_q000_esshlj1
oracle 8839 1 0 15:21:55 ? 0:00 ora_q009_esshlj1
oracle 8931 1 4 15:22:01 ? 0:00 ora_q001_esshlj1
oracle 8954 1 6 15:22:04 ? 0:00 ora_q002_esshlj1
SQL> alter system set aq_tm_processes = 0; #####修改参数为0#######
System altered.
SQL> !ps -ef |grep q00
oracle 11224 8883 0 15:23:08 pts/5 0:00 /sbin/sh -c ps -ef |grep q00
oracle 11226 11224 0 15:23:08 pts/5 0:00 grep q00
小工进程已经没有,但是报错依旧~头疼,在改回去吧。
SQL> alter system set aq_tm_processes = 1;
System altered.
SQL> !ps -ef |grep q00
oracle 12198 8883 0 15:23:35 pts/5 0:00 /sbin/sh -c ps -ef |grep q00
oracle 12203 12198 0 15:23:35 pts/5 0:00 grep q00
oracle 12135 1 20 15:23:35 ? 0:00 ora_q000_esshlj1
oracle 12199 1 3 15:23:35 ? 0:00 ora_q002_esshlj1
oracle 12167 1 7 15:23:35 ? 0:00 ora_q001_esshlj1
有同事大胆猜想既然不在1节点跑,是否可以将1节点的aq_tm_processes参数设置为0, 将2节点的aq_tm_processes参数设置为1,并且指定SID。
SQL> alter system set aq_tm_processes = 0 sid='esshlj1';
System altered.
随后观察1节点并没有产生烦人的7445告警
Tue Oct 24 15:36:08 EAT 2017
ALTER SYSTEM SET aq_tm_processes=0 SCOPE=BOTH SID='esshlj1';
Tue Oct 24 15:45:58 EAT 2017
Trace dumping is performing id=[cdmp_20171024152311]
Tue Oct 24 16:43:44 EAT 2017
Thread 1 advanced to log sequence 39430 (LGWR switch)
Current log# 1 seq# 39430 mem# 0: /vghlj/oradata/redo01.log
Current log# 1 seq# 39430 mem# 1: /vghlj/oradata/1_rd11.log