1 Oracle undo介绍
Undo记录存放的是数据的前映像,保存的是数据更改前的信息,用于事务回滚或者撤销对数据库所作的更改。在Oracle数据库中,undo除了用于事务回滚外,还用于数据的一致性读,这是oracle数据库读不阻塞写,写不阻塞读的原因,undo记录也是数据库崩溃i恢复必须的,它用户回滚数据库崩溃前未提交的事务,Oracle的闪回查询也是在undo的基础之上实现的。
undo记录存储undo段内,undo段存储在undo表空间内。在undo自动管理模式下,Oracle会自动数据库的负载和系统的恢复目标自动管理undo段。
在数据库由大量事务运行的过程中,活跃的undo段发生故障会导致什么样的后果,本文通过实验来模拟这个故障以及如何处理这个故障。
2 模拟一个事务
用普通用户登录数据库,关闭会话的自动提交,运行一条insert语句,模拟一个简单的事务,这个会话在后面称为用户会话。
--用户会话SQL> show autoc autocommit OFF SQL>insertinto test values(1,'zhangsan',1200);1 row created.
另开一个会话,以sysdba身份登录数据库,这个会话后面就称为管理会话。
--管理会话SQL>select XIDUSN,XIDSLOT,XIDSQN from v$transaction; XIDUSN XIDSLOT XIDSQN ---------- ---------- ----------624863
可以看到,数据库现在有一个活跃的事务,这个事务的USN是6,这个事务使用的undo段可以从dba_rollback_segs中查到。
SQL>--管理会话1*select SEGMENT_ID, SEGMENT_NAME,OWNER,TABLESPACE_NAME,STATUS from dba_rollback_segs where SEGMENT_ID=6SQL>/ SEGMENT_ID SEGMENT_NAME OWNER TABLESPACE_NAM STATUS ---------- ------------------------------ ------ -------------- ----------------6 _SYSSMU6_2371613632$ PUBLIC UNDOTBS1 ONLINE
3 清空undo文件,模拟undo文件故障
查询当前undo表空间内的数据文件
--管理会话SQL> l 1*select FILE_NAME,TABLESPACE_NAME from dba_data_files where TABLESPACE_NAME='UNDOTBS1'SQL>/ FILE_NAME TABLESPACE_NAM ---------------------------------------- --------------/opt/oracle/oradata/ORCL/undotbs01.dbf UNDOTBS1
另开一个会话,使用cp命令清空查到的数据文件
--管理会话 [root@ ~]# cp /dev/null /opt/oracle/oradata/ORCL/undotbs01.dbfcp: overwrite '/opt/oracle/oradata/ORCL/undotbs01.dbf'? y [root@ ~]# ls -l /opt/oracle/oradata/ORCL/undotbs01.dbf-rw-r-----1 oracle oinstall 0 Aug 2314:16 /opt/oracle/oradata/ORCL/undotbs01.dbf
查询数据库中undo段的状态
--管理会话SQL>select SEGMENT_ID, SEGMENT_NAME,OWNER,TABLESPACE_NAME,STATUS from dba_rollback_segs;SEGMENT_ID SEGMENT_NAME OWNER TABLESPACE_NAM STATUS ---------- ------------------------------ ------ -------------- ----------------0 SYSTEM SYS SYSTEM ONLINE 1 _SYSSMU1_1608542815$ PUBLIC UNDOTBS1 ONLINE 2 _SYSSMU2_811063828$ PUBLIC UNDOTBS1 ONLINE 3 _SYSSMU3_2526703091$ PUBLIC UNDOTBS1 ONLINE 4 _SYSSMU4_2622241892$ PUBLIC UNDOTBS1 ONLINE 5 _SYSSMU5_1456795995$ PUBLIC UNDOTBS1 ONLINE 6 _SYSSMU6_2371613632$ PUBLIC UNDOTBS1 ONLINE 7 _SYSSMU7_1726144780$ PUBLIC UNDOTBS1 ONLINE 8 _SYSSMU8_2971734591$ PUBLIC UNDOTBS1 ONLINE 9 _SYSSMU9_882645763$ PUBLIC UNDOTBS1 ONLINE 10 _SYSSMU10_4198647960$ PUBLIC UNDOTBS1 ONLINE 11 rows selected.
数据库中undo段的状态没有发生变化
在用户会话中提交事务
--用户会话SQL> commit;Commit complete.
用户事务能够提交。
在管理会话中手动执行checkpoint
--管理会话SQL>alter system checkpoint;alter system checkpoint *ERROR at line 1:ORA-03113: end-of-file on communication channel Process ID:33802Session ID:15 Serial number:45257
数据库崩溃了,这时,数据库的告警日志也产生了错误信息
2022-08-23T14:18:23.295031+08:00 Errors in file /opt/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_19487.trc: ORA-01595: error freeing extent (3) of rollback segment (1)) ORA-01115: IO error reading block from file (block # )ORA-01110: data file 4: '/opt/oracle/oradata/ORCL/undotbs01.dbf'ORA-27072: File I/O error --2022-08-23T14:18:23.447608+08:00 Errors in file /opt/oracle/diag/rdbms/orcl/orcl/trace/orcl_mz00_33878.trc: ORA-01110: data file 4: '/opt/oracle/oradata/ORCL/undotbs01.dbf'ORA-01565: error in identifying file '/opt/oracle/oradata/ORCL/undotbs01.dbf'ORA-27047: unable to read the header block of file --Checker run found 1 new persistent data failures 2022-08-23T14:20:35.582127+08:00 Read of datafile '/opt/oracle/oradata/ORCL/undotbs01.dbf' (fno 4) header failed with ORA-01210 --Completely zero block found during datafile header read Rereading datafile 4 header failed with ORA-01210 --2022-08-23T14:20:35.649701+08:00 Errors in file /opt/oracle/diag/rdbms/orcl/orcl/trace/orcl_ckpt_19481.trc: ORA-63999: data file suffered media failure ORA-01122: database file 4 failed verification check ORA-01110: data file 4: '/opt/oracle/oradata/ORCL/undotbs01.dbf'ORA-01210: data file header is media corrupt 2022-08-23T14:20:35.649943+08:00 Errors in file /opt/oracle/diag/rdbms/orcl/orcl/trace/orcl_ckpt_19481.trc: ORA-63999: data file suffered media failure ORA-01122: database file 4 failed verification check ORA-01110: data file 4: '/opt/oracle/oradata/ORCL/undotbs01.dbf'ORA-01210: data file header is media corrupt Errors in file /opt/oracle/diag/rdbms/orcl/orcl/trace/orcl_ckpt_19481.trc (incident=9785) (PDBNAME=CDB$ROOT): ORA-63999 [] [] [] [] [] [] [] [] [] [] [] []
在Oracle的后台进程中,smon进程管理undo段,在手动发起checkpoint后,smon进程在释放undo段的扩展时首先报错,MZ00(MMON_SLAVE)进程在执行DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK的过程中报错,checkpoint进程读undo文件错误,数据库关闭。
4 故障诊断
启动数据库
SQL> startup; ORACLE instance started. Total System Global Area 763362712 bytes Fixed Size 9690520 bytes Variable Size 662700032 bytes Database Buffers 88080384 bytes Redo Buffers 2891776 bytes Database mounted. ORA-01157: cannot identify/lock data file 4- see DBWR trace file ORA-01110: data file 4:'/opt/oracle/oradata/ORCL/undotbs01.dbf'
数据库因为找不到数据文件(这里是undo表空间数据文件)而不能打开。将undo管理改为手动模式后,重启实例至mount模式,将找不到的undo文件离线后打开数据库。
SQL>alter system set undo_management=manual scope=spfile; System altered. SQL> shutdown immediate; ORA-01109: database not open Database dismounted. ORACLE instance shut down. SQL> statup mount; SP2-0734:unknown command beginning "statup mou..."- rest of line ignored. SQL> startup mount; ORACLE instance started. Total System Global Area 763362712 bytes Fixed Size 9690520 bytes Variable Size 662700032 bytes Database Buffers 88080384 bytes Redo Buffers 2891776 bytes Database mounted. SQL>alter database datafile 4 offline; Database altered. SQL>alter database open; Database altered.
查看数据库个undo段的状态
SQL> l 1*select SEGMENT_ID, SEGMENT_NAME,OWNER,TABLESPACE_NAME,STATUS from dba_rollback_segs SQL>/ SEGMENT_ID SEGMENT_NAME OWNER TABLESPACE_NAME STATUS ---------- ------------------------------ ------ ------------------------------ ----------------0 SYSTEM SYS SYSTEM ONLINE 1 _SYSSMU1_1608542815$ PUBLIC UNDOTBS1 NEEDS RECOVERY 2 _SYSSMU2_811063828$ PUBLIC UNDOTBS1 NEEDS RECOVERY 3 _SYSSMU3_2526703091$ PUBLIC UNDOTBS1 NEEDS RECOVERY 4 _SYSSMU4_2622241892$ PUBLIC UNDOTBS1 NEEDS RECOVERY 5 _SYSSMU5_1456795995$ PUBLIC UNDOTBS1 NEEDS RECOVERY 6 _SYSSMU6_2371613632$ PUBLIC UNDOTBS1 NEEDS RECOVERY 7 _SYSSMU7_1726144780$ PUBLIC UNDOTBS1 NEEDS RECOVERY 8 _SYSSMU8_2971734591$ PUBLIC UNDOTBS1 NEEDS RECOVERY 9 _SYSSMU9_882645763$ PUBLIC UNDOTBS1 NEEDS RECOVERY 10 _SYSSMU10_4198647960$ PUBLIC UNDOTBS1 NEEDS RECOVERY 11 rows selected.
被删除的undo数据文件上的回滚段都处于需要恢复状态。这种状态下,不能再数据库中执行DML语句。
5 故障处理
这种情况下,如果由数据库的全备及全备以后的所有归档日志,对丢失的undo文件执行完全恢复即可,不会丢失数据。这里介绍一下在没有可用备份的情况下的处理方式。
创建一个undo表空间
SQL>create undo tablespace undotbs02 datafile '/opt/oracle/oradata/ORCL/undotbs02.dbf' size 10M autoextend on; Tablespace created.
将此表表空间设置为数据库表空间,使配置生效需要重启数据库
SQL>alter system set undo_tablespace=undotbs02 scope=spfile; System altered.
将undo管理方式改为自动
SQL>alter system set undo_management=auto scope=spfile; System altered.
重启数据库后删除原来的表空间undo1
SQL>drop tablespace undotbs1 including contents and datafiles;drop tablespace undotbs1 including contents and datafiles * ERROR at line 1: ORA-01548: active rollback segment '_SYSSMU1_1608542815$' found, terminate dropping tablespace
因为有活跃的回滚段表空间删除被终结。
这种情况下可以设置隐含参数
_corrupted_rollback_segments和_offline_rollback_segments,重启数据库就可以drop掉这个undo表空间了。
SQL>alter system set"_offline_rollback_segments"=true scope=spfile; System altered. SQL>alter system set"_corrupted_rollback_segments"='_SYSSMU1_1608542815$','_SYSSMU2_811063828$','_SYSSMU3_2526703091$','_SYSSMU4_2622241892$','_SYSSMU5_1456795995$','_SYSSMU6_2371613632$','_SYSSMU7_1726144780$','_SYSSMU8_2971734591$','_SYSSMU9_882645763$','_SYSSMU10_4198647960$' scope=spfile 2;
_corrupted_rollback_segments的参数是所有故障的活跃回滚段名称,以逗号分隔。重启数据后,就可以drop掉故障的undo表空间了。
SQL>drop tablespace undotbs1 including contents and datafiles; Tablespace dropped.
打开一个用户会话后,查看数据
--用户会话SQL>select*from test; ID NAME SALARY ---------- -------------------- ----------1 zhangsan 1200
用户提交的数据还在,数据没有丢失。
6 结论
活跃回滚段故障并不会丢失数据,在有数据库备份时恢复起来十分简单,在无数据库备份可用时恢复起来稍微复杂一点。