Oracle数据库一次ORA-00600[4194]事件处理

本文涉及的产品
日志服务 SLS,月写入数据量 50GB 1个月
简介: 一、问题背景1、20191120某客户托尔斯程序运行异常,无法正常工作2、查看程序日志显示ORA-00600、ORA-02002等数据库错误3、数据库服务器信息    ①系统:CentOS 6.8(64位)    ②数据库:Oracle RAC 11.

一、问题背景

1、20191120某客户托尔斯程序运行异常,无法正常工作
2、查看程序日志显示ORA-00600、ORA-02002等数据库错误
3、数据库服务器信息
    ①系统:CentOS 6.8(64位)
    ②数据库:Oracle RAC 11.2.0.4

二、问题排查&处理

1、查看数据库信息

    ①首先查看了Oracle RAC集群信息,数据库集群运行正常
    ②查看数据库日志,节点1运行正常,节点2 alert日志中有大量ORA600信息:

Wed Nov 20 11:28:30 2019
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_11752.trc  (incident=51957):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 946, block 733 to scn 2776710962
Recovery of Online Redo Log: Thread 2 Group 3 Seq 946 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_3.261.989703707
Block recovery completed at rba 946.735.16, scn 0.2776710963
Block recovery from logseq 946, block 733 to scn 2776711284
Recovery of Online Redo Log: Thread 2 Group 3 Seq 946 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_3.261.989703707
Block recovery completed at rba 946.849.16, scn 0.2776711285
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_11752.trc  (incident=51958):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-02002: error while writing to audit trail
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

    ③查看当前数据库集群中每个节点上会话信息,节点1上有程序连接进来的信息,节点2基本都是系统用户会话信息

2、根据客户反馈的信息,进行下一步处理

    ①客户反馈之前也遇到过目前的情况(ORA2002),之前是对system表空间进行手动resize,之后程序就会恢复正常,但是这次操作了之后也没有恢复正常,程序报错如下:
ter_error

    ②根据客户描述的信息检查数据库审计情况
        a、发现数据库审计参数AUDIT_TRAIL=DB,是数据库默认的配置
        b、接下来检查system表空间信息,当前system表空间是3G,使用率70%(客户自己resize后),并且system表空间是自动扩展,看来问题不是由于审计表空间不足引起的
    ③system充足的情况下,程序多次重启还是报ORA2002错误,看日志显示ORA600后紧接着就是ORA2002,为了及时恢复业务,客户也表示没有使用到数据库审计,决定关闭数据库审计功能

3、数据库修改了AUDIT_TRAIL=NONE,truncate了SYS.AUD$表数据,关闭了数据库集群,进行了重启,观察程序情况:

    ①程序在关闭数据库审计后启动,仍然报错,此时报错内容只剩下了ORA600
trs_error1
    ②查看当前数据库中session信息,和之前一样,节点1有程序连接信息,节点2上没有
    ③这时再查看节点2alert日志信息,ORA2002虽然没有了,但是ORA600仍然不断打印出来

Wed Nov 20 21:36:11 2019
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_19012.trc  (incident=100413):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/racecs/racecs2/incident/incdir_100413/racecs2_ora_19012_i100413.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Nov 20 21:36:16 2019
Dumping diagnostic data in directory=[cdmp_20191120213616], requested by (instance=2, osid=19012), summary=[incident=100413].
Block recovery from logseq 972, block 168 to scn 2776911378
Recovery of Online Redo Log: Thread 2 Group 7 Seq 972 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_7.log
Block recovery completed at rba 972.173.16, scn 0.2776911845
Block recovery from logseq 972, block 168 to scn 2776914575
Recovery of Online Redo Log: Thread 2 Group 7 Seq 972 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_7.log
Block recovery completed at rba 972.657.16, scn 0.2776914576
Wed Nov 20 21:38:25 2019
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_18696.trc  (incident=100333):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/racecs/racecs2/incident/incdir_100333/racecs2_ora_18696_i100333.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
...
Wed Nov 20 21:46:09 2019
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_18732.trc  (incident=100373):
ORA-00600: Śҿխϳպë, ӎ˽: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 972, block 1011 to scn 2776915928
Recovery of Online Redo Log: Thread 2 Group 7 Seq 972 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_7.log
Block recovery completed at rba 972.1016.16, scn 0.2776915929
Block recovery from logseq 972, block 1011 to scn 2776916012
Recovery of Online Redo Log: Thread 2 Group 7 Seq 972 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_7.log
Block recovery completed at rba 972.1031.16, scn 0.2776916013
Wed Nov 20 21:48:41 2019
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_21433.trc  (incident=100445):
ORA-00600: Śҿխϳպë, ӎ˽: [4194], [ody "SYS.DBMS_STANDARD"
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 972, block 1011 to scn 2776915928
Recovery of Online Redo Log: Thread 2 Group 7 Seq 972 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_7.log
Block recovery completed at rba 972.1016.16, scn 0.2776915929
Block recovery from logseq 972, block 1011 to scn 2776916447
Recovery of Online Redo Log: Thread 2 Group 7 Seq 972 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_7.log
Block recovery completed at rba 972.1073.16, scn 0.2776916448

4、根据报错内容在MOS查看相关问题,发现文档1428786.1和这次的现象类似,并且节点2服务器在15号多次发生了异常重启,引起数据库多次异常重启

Fri Nov 15 08:32:28 2019
Active Session History (ASH) performed an emergency flush. This may mean that ASH is undersized. If emergency flushes are a recurring issue, you may consider increasing ASH size by setting the value of _ASH_SIZE to a sufficiently large value. Currently, ASH size is 16777216 bytes. Both ASH size and the total number of emergency flushes since instance startup can be monitored by running the following query:
 select total_size,awr_flush_emergency_count from v$ash_info; 
Fri Nov 15 09:06:34 2019
IPC Send timeout detected. Receiver ospid 7546 [
Fri Nov 15 09:06:34 2019
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_7546.trc:
Fri Nov 15 09:29:15 2019
minact-scn: useg scan erroring out with error e:12751
Fri Nov 15 09:53:19 2019
NOTE: ASMB terminating
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_asmb_3794.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID: 
Session ID: 217 Serial number: 61
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_asmb_3794.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID: 
Session ID: 217 Serial number: 61
ASMB (ospid: 3794): terminating the instance due to error 15064
Termination issued to instance processes. Waiting for the processes to exit
Fri Nov 15 09:53:30 2019
Instance termination failed to kill one or more processes
Instance terminated by ASMB, pid = 3794
Fri Nov 15 10:16:47 2019
Starting ORACLE instance (normal)

5、根据MOS文档里面的描述,很可能就是由于实例2服务器异常重启导致数据库异常,MOS文档描述原因如下:

The following error is occurring in the alert.log right before the database crashes.
ORA-00600: internal error code, arguments: [4194], [#], [#], [], [], [], [], []
This error indicates that a mismatch has been detected between redo records and rollback (undo) records.

ARGUMENTS:

Arg [a] - Maximum Undo record number in Undo block
Arg [b] - Undo record number from Redo block

Since we are adding a new undo record to our undo block, we would expect that the new record number is equal to the maximum record number in the undo block plus one. Before Oracle can add a new undo record to the undo block it validates that this is correct. If this validation fails, then an ORA-600 [4194] will be triggered.

This issue generally occurs when there is a power outage or hardware failure that initially crashes the database. On startup, the database does the normal roll forward (redo) and then rollback (undo), this is where the error is generated on the rollback.

大致意思是:
在向undo块添加一个新的undo记录,会将新的记录号等于undo块中的最大记录号加1。在Oracle向undo块添加一个新的undo记录之前,它会验证这是正确的。如果验证失败,将触发ORA-600[4194]。
这个问题通常发生在出现断电或硬件故障时,这些故障最初会导致数据库崩溃。在启动时,数据库执行普通的前滚(重做),然后回滚(撤消),这是在回滚时产生错误的地方。
从这里在结合实例2在15号多次重启的现象,可以看出很可能就是因为实例2服务器异常重启导致

6、根据上面定位到的问题对数据库进行修复

*创建一个新的undo表空间,使用新的undo段号,它比当前使用的段号要高。这样,当一个事务执行清除块时,对撤消段的引用不存在,并继续清除块,这样就不会在产生上面发生的内容。
实例2进行操作
①、Create pfile='/tmp/initsid.ora' from spfile;
②、修改undo
a、查看当前undo表空间位置和大小
b、实例2创建新的undo,替换之前的undo
  create undo tablespace undotbs3 datafile '+DATA/racecs/datafile/undotbs3.dbf' size 1500m autoextend on;(注意根据之前undo值进行替换)
  
  alter system set undo_tablespace='undotbs3' scope=spfile sid='racecs2';
③、实例2数据库重启
  shutdown immediate
  startup

7、操作完成之后,程序连接不在报错

三、其他

1、在数据库恢复之后,隔了一天又反馈出现问题,发现实例2服务器再一次发生了重启,这次数据库日志中到没有出现ORA600信息,数据库运行正常
2、接下来应该重点排查实例2服务器异常重启原因

相关文章
|
11天前
|
消息中间件 资源调度 关系型数据库
如何在Flink on YARN环境中配置Debezium CDC 3.0,以实现实时捕获数据库变更事件并将其传输到Flink进行处理
本文介绍了如何在Flink on YARN环境中配置Debezium CDC 3.0,以实现实时捕获数据库变更事件并将其传输到Flink进行处理。主要内容包括安装Debezium、配置Kafka Connect、创建Flink任务以及启动任务的具体步骤,为构建实时数据管道提供了详细指导。
34 9
|
18天前
|
SQL Oracle 关系型数据库
Oracle数据库优化方法
【10月更文挑战第25天】Oracle数据库优化方法
26 7
|
18天前
|
Oracle 关系型数据库 数据库
oracle数据库技巧
【10月更文挑战第25天】oracle数据库技巧
22 6
|
18天前
|
存储 Oracle 关系型数据库
Oracle数据库优化策略
【10月更文挑战第25天】Oracle数据库优化策略
17 5
|
25天前
|
存储 Oracle 关系型数据库
数据库数据恢复—Oracle ASM磁盘组故障数据恢复案例
Oracle数据库数据恢复环境&故障: Oracle ASM磁盘组由4块磁盘组成。Oracle ASM磁盘组掉线 ,ASM实例不能mount。 Oracle数据库故障分析&恢复方案: 数据库数据恢复工程师对组成ASM磁盘组的磁盘进行分析。对ASM元数据进行分析发现ASM存储元数据损坏,导致磁盘组无法挂载。
|
7天前
|
SQL 关系型数据库 MySQL
12 PHP配置数据库MySQL
路老师分享了PHP操作MySQL数据库的方法,包括安装并连接MySQL服务器、选择数据库、执行SQL语句(如插入、更新、删除和查询),以及将结果集返回到数组。通过具体示例代码,详细介绍了每一步的操作流程,帮助读者快速入门PHP与MySQL的交互。
23 1
|
9天前
|
SQL 关系型数据库 MySQL
go语言数据库中mysql驱动安装
【11月更文挑战第2天】
24 4
|
1月前
|
存储 关系型数据库 MySQL
Mysql(4)—数据库索引
数据库索引是用于提高数据检索效率的数据结构,类似于书籍中的索引。它允许用户快速找到数据,而无需扫描整个表。MySQL中的索引可以显著提升查询速度,使数据库操作更加高效。索引的发展经历了从无索引、简单索引到B-树、哈希索引、位图索引、全文索引等多个阶段。
61 3
Mysql(4)—数据库索引
|
16天前
|
监控 关系型数据库 MySQL
数据库优化:MySQL索引策略与查询性能调优实战
【10月更文挑战第27天】本文深入探讨了MySQL的索引策略和查询性能调优技巧。通过介绍B-Tree索引、哈希索引和全文索引等不同类型,以及如何创建和维护索引,结合实战案例分析查询执行计划,帮助读者掌握提升查询性能的方法。定期优化索引和调整查询语句是提高数据库性能的关键。
82 1
|
18天前
|
关系型数据库 MySQL Linux
在 CentOS 7 中通过编译源码方式安装 MySQL 数据库的详细步骤,包括准备工作、下载源码、编译安装、配置 MySQL 服务、登录设置等。
本文介绍了在 CentOS 7 中通过编译源码方式安装 MySQL 数据库的详细步骤,包括准备工作、下载源码、编译安装、配置 MySQL 服务、登录设置等。同时,文章还对比了编译源码安装与使用 RPM 包安装的优缺点,帮助读者根据需求选择最合适的方法。通过具体案例,展示了编译源码安装的灵活性和定制性。
61 2

推荐镜像

更多