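In a Data Guard environment, the primary generally ships data changes to the standby through archived redo logs. If archive transport between primary and standby is interrupted and the primary's archived logs are then deleted or corrupted, the standby cannot resume receiving archives or applying new changes.
I came across an article by paulyibin that describes SCN-based recovery. The idea struck me as very interesting, so once I understood the approach I tested it locally, and it worked out nicely.
Normally, when archived logs are lost on the primary, the conventional answer is to rebuild the standby. But we can look at the problem from another angle: archived logs record data changes as a continuous stream, whereas in the datafiles those changes appear as a state. A physical incremental backup gives us the net set of changed blocks, which can be applied directly on the standby. That incremental set contains the changes recorded in the archived logs, just in a different form.
With that in mind, let's verify in practice that when archives are missing on the primary, we can still synchronize the incremental changes without rebuilding the standby.
The primary's state is as follows: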
SQL> select open_mode from v$database;
OPEN_MODE
--------------------
READ WRITE
First, record a baseline SCN on the primary as a marker.
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
3213758
Then check the SCN on the standby:
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
3213745
Now simulate a power failure on the standby:
SQL> shutdown abort
ORACLE instance shut down.
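At this point archive transport between primary and standby stops. Now we make some changes on the primary and switch logs so that the two sides diverge by a fair amount.
Switch logs on the primary: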
SQL> alter system switch logfile;
System altered.
SQL> alter system switch logfile;
System altered.
Then create a new table to serve as a marker.
SQL> create table test as select * from all_objects;
Table created.
Then switch logs once more:
SQL> alter system switch logfile;
System altered.
The changes on the primary are done. Here are the primary's archived logs:
-rw-r----- 1 oracle oinstall 22546944 Jun 3 22:05 o1_mf_1_40_co33ob7r_.arc
-rw-r----- 1 oracle oinstall 41472 Jun 3 22:05 o1_mf_1_41_co33of81_.arc
-rw-r----- 1 oracle oinstall 3074560 Jun 3 22:11 o1_mf_1_42_co3402tz_.arc
-rw-r----- 1 oracle oinstall 2099200 Jun 3 22:15 o1_mf_1_43_co348mjq_.arc
-rw-r----- 1 oracle oinstall 3072 Jun 3 22:15 o1_mf_1_44_co348qxg_.arc
-rw-r----- 1 oracle oinstall 8830464 Jun 3 22:16 o1_mf_1_45_co3495t0_.arc
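Judging from the standby's log, the archive with sequence 44 has certainly not been applied on the standby, so we rename it by hand to make it unavailable to the standby.
After the rename, sequence 44 stands out in the listing; with that name it cannot be shipped to or applied on the standby.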
[oracle@BX_133_45 2016_06_03]$ ll
total 35748
-rw-r----- 1 oracle oinstall 22546944 Jun 3 22:05 o1_mf_1_40_co33ob7r_.arc
-rw-r----- 1 oracle oinstall 41472 Jun 3 22:05 o1_mf_1_41_co33of81_.arc
-rw-r----- 1 oracle oinstall 3074560 Jun 3 22:11 o1_mf_1_42_co3402tz_.arc
-rw-r----- 1 oracle oinstall 2099200 Jun 3 22:15 o1_mf_1_43_co348mjq_.arc
-rw-r----- 1 oracle oinstall 3072 Jun 3 22:15 o1_mf_1_44_co348qxg_.arc.bak
-rw-r----- 1 oracle oinstall 8830464 Jun 3 22:16 o1_mf_1_45_co3495t0_.arc
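As a rough illustration of spotting such a hole without reading the listing by eye, here is a minimal Python sketch. It assumes the OMF naming pattern visible above (o1_mf_<thread>_<sequence>_<id>_.arc); the function name missing_sequences is made up for this example and is not part of any Oracle tooling.

```python
import re

# Assumed OMF archived-log pattern, taken from the listing above:
# o1_mf_<thread>_<sequence>_<id>_.arc
ARC_NAME = re.compile(r"^o1_mf_(\d+)_(\d+)_\w+_\.arc$")


def missing_sequences(filenames, thread=1):
    """Return sequence numbers absent between the lowest and highest
    archived-log sequence found for the given thread."""
    seqs = sorted(
        int(m.group(2))
        for m in map(ARC_NAME.match, filenames)
        if m and int(m.group(1)) == thread
    )
    if not seqs:
        return []
    present = set(seqs)
    return [s for s in range(seqs[0], seqs[-1] + 1) if s not in present]


listing = [
    "o1_mf_1_40_co33ob7r_.arc",
    "o1_mf_1_41_co33of81_.arc",
    "o1_mf_1_42_co3402tz_.arc",
    "o1_mf_1_43_co348mjq_.arc",
    "o1_mf_1_44_co348qxg_.arc.bak",  # renamed, so it no longer matches
    "o1_mf_1_45_co3495t0_.arc",
]
print(missing_sequences(listing))
```

The renamed file fails the pattern, so sequence 44 is reported as missing, which mirrors what the standby will run into.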
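Next, on the standby:
Start the standby with STARTUP.
The standby's alert log is shown below. When it reaches archive 44 it detects a gap but is unable to resolve it.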
All non-current ORLs have been archived.
Media Recovery Log /U01/app/oracle/fast_recovery_area/DGTEST2/archivelog/2016_06_03/o1_mf_1_43_co34kd5o_.arc
Media Recovery Waiting for thread 1 sequence 44
Fetching gap sequence in thread 1, gap sequence 44-44
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE
Fri Jun 03 22:22:47 2016
FAL[client]: Failed to request gap sequence
GAP - thread 1 sequence 44-44
DBID 3866107499 branch 909583854
FAL[client]: All defined FAL servers have been attempted.
------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that's sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
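If you want to pick these gap reports out of a long alert log, a small Python sketch like the following may help. It only assumes the "GAP - thread N sequence A-B" line format shown in the excerpt above; find_gaps is a hypothetical helper name.

```python
import re

# Matches the unresolved-gap line as it appears in the alert log excerpt above.
GAP_LINE = re.compile(r"GAP - thread (\d+) sequence (\d+)-(\d+)")


def find_gaps(alert_text):
    """Return a list of (thread, first_sequence, last_sequence) tuples."""
    return [
        tuple(int(g) for g in m.groups())
        for m in GAP_LINE.finditer(alert_text)
    ]


sample = """\
Fetching gap sequence in thread 1, gap sequence 44-44
FAL[client]: Failed to request gap sequence
 GAP - thread 1 sequence 44-44
 DBID 3866107499 branch 909583854
FAL[client]: All defined FAL servers have been attempted.
"""
print(find_gaps(sample))
```

Only the "GAP - thread" line is matched; the earlier "Fetching gap sequence" line is deliberately ignored, since it is logged even when the gap is later resolved.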
Now let's start fixing the problem. First check the SCN on the standby.
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
3214028
Using this standby SCN as the starting point, take an incremental backup on the primary.
Primary:
[oracle@BX_133_45 2016_06_03]$ rman target /
connected to target database: DGTEST (DBID=3866107499)
RMAN> BACKUP INCREMENTAL FROM SCN 3214028 DATABASE FORMAT '/home/oracle/ForStandby_%U' tag 'FORSTANDBY';
Starting backup at 03-JUN-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=225 device type=DISK
backup will be obsolete on date 10-JUN-16
archived logs will not be kept or backed up
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/U01/app/oracle/oradata/dgtest/system01.dbf
input datafile file number=00002 name=/U01/app/oracle/oradata/dgtest/sysaux01.dbf
input datafile file number=00003 name=/U01/app/oracle/oradata/dgtest/undotbs01.dbf
input datafile file number=00004 name=/U01/app/oracle/oradata/dgtest/users01.dbf
channel ORA_DISK_1: starting piece 1 at 03-JUN-16
channel ORA_DISK_1: finished piece 1 at 03-JUN-16
piece handle=/home/oracle/ForStandby_07r78fk0_1_1 tag=FORSTANDBY comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
...
piece handle=/home/oracle/ForStandby_08r78fk2_1_1 tag=FORSTANDBY comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 03-JUN-16
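The walkthrough above starts the backup from the standby's CURRENT_SCN. A commonly cited precaution, which the test here did not need, is to also check the lowest datafile header checkpoint on the standby (for example via min(checkpoint_change#) from v$datafile_header) and start from whichever value is lower. The Python sketch below only formats the RMAN command text under that assumption; the function name and the sample value 3214000 are hypothetical.

```python
def incremental_backup_command(current_scn, min_datafile_scn, dest_dir,
                               tag="FORSTANDBY"):
    """Build the RMAN command text, starting from the lower of the two SCNs.

    current_scn      -- standby CURRENT_SCN from V$DATABASE
    min_datafile_scn -- lowest datafile header checkpoint on the standby
                        (assumed to be queried separately)
    """
    start_scn = min(current_scn, min_datafile_scn)
    return (
        f"BACKUP INCREMENTAL FROM SCN {start_scn} DATABASE "
        f"FORMAT '{dest_dir}/ForStandby_%U' TAG '{tag}';"
    )


# 3214028 is the standby SCN from this walkthrough; 3214000 is a made-up
# datafile header checkpoint used purely for illustration.
print(incremental_backup_command(3214028, 3214000, "/home/oracle"))
```

Starting from the lower SCN can only make the backup slightly larger, while starting too high risks leaving changed blocks out.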
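Transfer the backup to the standby and recover.
A brief detour first.
This is an 11g standby, so after redo apply is cancelled the standby is left in READ ONLY state.
Standby: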
SQL> recover managed standby database cancel;
Media recovery complete.
SQL> select open_mode from v$database;
OPEN_MODE
----------------------------------------
READ ONLY
Starting RMAN for the recovery in this state fails with the following odd error.
[oracle@WEB_YQ_64.48 ~]$rman target /
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04005: error from target database:
ORA-06553: PLS-801: internal error [56327]
RMAN-04015: error setting target database character set to ZHS16GBK
The fix is simply to restart the standby into MOUNT state, after which the recovery works without trouble.
[oracle@WEB_YQ_64.48 ~]$rman target /
connected to target database: DGTEST (DBID=3866107499, not open)
RMAN> CATALOG START WITH '/home/oracle/tmp';
using target database control file instead of recovery catalog
searching for all files that match the pattern /home/oracle/tmp
List of Files Unknown to the Database
=====================================
File Name: /home/oracle/tmp/ForStandby_07r78fk0_1_1
File Name: /home/oracle/tmp/ForStandby_08r78fk2_1_1
Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done
List of Cataloged Files
=======================
File Name: /home/oracle/tmp/ForStandby_07r78fk0_1_1
File Name: /home/oracle/tmp/ForStandby_08r78fk2_1_1
RMAN> recover database noredo ;
Starting recover at 2016-06-03 22:30:33
allocated channel: ORA_DISK_1
...
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
Finished recover at 2016-06-03 22:30:38
During the recovery, the alert.log shows the following output:
Fri Jun 03 22:30:35 2016
Incremental restore complete of datafile 4 /U01/app/oracle/oradata/dgtest/users01.dbf
checkpoint is 3215512
last deallocation scn is 3
Incremental restore complete of datafile 3 /U01/app/oracle/oradata/dgtest/undotbs01.dbf
checkpoint is 3215512
last deallocation scn is 3
Incremental restore complete of datafile 2 /U01/app/oracle/oradata/dgtest/sysaux01.dbf
checkpoint is 3215512
last deallocation scn is 995211
Incremental restore complete of datafile 1 /U01/app/oracle/oradata/dgtest/system01.dbf
checkpoint is 3215512
last deallocation scn is 993074
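To confirm that every datafile was rolled forward to the same checkpoint, the alert log lines above can be summarized with a short Python sketch. It assumes the two-line format shown in the excerpt ("Incremental restore complete of datafile N <path>" followed by "checkpoint is <SCN>"); datafile_checkpoints is a hypothetical helper name.

```python
import re

# Pairs each "Incremental restore complete" line with the checkpoint line
# that follows it, as in the alert log excerpt above.
RESTORE = re.compile(
    r"Incremental restore complete of datafile (\d+) (\S+)\s+checkpoint is (\d+)"
)


def datafile_checkpoints(alert_text):
    """Map datafile number -> checkpoint SCN reported after the restore."""
    return {int(n): int(scn) for n, _path, scn in RESTORE.findall(alert_text)}


sample = """\
Incremental restore complete of datafile 4 /U01/app/oracle/oradata/dgtest/users01.dbf
  checkpoint is 3215512
  last deallocation scn is 3
Incremental restore complete of datafile 1 /U01/app/oracle/oradata/dgtest/system01.dbf
  checkpoint is 3215512
  last deallocation scn is 993074
"""
checkpoints = datafile_checkpoints(sample)
print(checkpoints)
print(len(set(checkpoints.values())) == 1)  # True when all files agree
```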
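The datafile checkpoint SCN has advanced (3215512 above), but after the recovery the control file is still the old one, so querying the SCN still returns the pre-recovery value.
SQL> SELECT CURRENT_SCN FROM V$DATABASE;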
CURRENT_SCN
-----------
3214028
The fix is simple: create a standby control file on the primary and copy it to the standby.
Primary:
SQL> alter database create standby controlfile as '/home/oracle/std_con01.ctl';
Database altered.
Then restore it on the standby, which of course must be in NOMOUNT state:
SQL> startup nomount
RMAN> RESTORE STANDBY CONTROLFILE FROM '/home/oracle/tmp/std_con01.ctl';
Starting restore at 2016-06-03 22:39:20
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=189 device type=DISK
channel ORA_DISK_1: copied control file copy
output file name=/U01/app/oracle/oradata/dgtest/control01.ctl
output file name=/U01/app/oracle/fast_recovery_area/dgtest/control02.ctl
Finished restore at 2016-06-03 22:39:21
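Once the standby is mounted and managed recovery is restarted, checking the SCN again shows a considerably higher value.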
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
3223400
Looking at the alert.log now, redo apply resumes from sequence 47, skipping archives 44, 45, and 46 entirely.
Media Recovery Waiting for thread 1 sequence 47
Fetching gap sequence in thread 1, gap sequence 47-48
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE
Fri Jun 03 22:40:12 2016
RFS[3]: Assigned to RFS process 29087
RFS[3]: Opened log for thread 1 sequence 47 dbid -428859797 branch 909583854
Fri Jun 03 22:40:12 2016
RFS[4]: Assigned to RFS process 29089
RFS[4]: Opened log for thread 1 sequence 48 dbid -428859797 branch 909583854
Archived Log entry 2 added for thread 1 sequence 47 rlc 909583854 ID 0xe6703c6b dest 2:
Media Recovery Log /U01/app/oracle/fast_recovery_area/DGTEST2/archivelog/2016_06_03/o1_mf_1_47_co35pdcj_.arc
Media Recovery Waiting for thread 1 sequence 48 (in transit)
Archived Log entry 3 added for thread 1 sequence 48 rlc 909583854 ID 0xe6703c6b dest 2:
Media Recovery Log /U01/app/oracle/fast_recovery_area/DGTEST2/archivelog/2016_06_03/o1_mf_1_48_co35pdgw_.arc
Fri Jun 03 22:40:20 2016
Media Recovery Log /U01/app/oracle/fast_recovery_area/DGTEST2/archivelog/2016_06_03/o1_mf_1_49_co35p231_.arc
Media Recovery Waiting for thread 1 sequence 50 (in transit)
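That completes the recovery. If a TB-scale database runs into this kind of problem, this technique lets us avoid rebuilding the standby and spares us a lot of tedious work.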