在《GoldenGate实现Live Standby主备库切换(1)》中我们介绍了如何针对GoldenGate Live standby环境执行计划内的Switchover切换。除去计划内的主备切换,实际生产中更多的故障切换发生在主机故障或主库不可用的情况下,这种情况下一般我们已经无法在Primary上停止应用及extract了;当我们在这样的情况下failover到Standby上后如同在DataGuard环境下一样即便Primary上的数据库恢复了我们也无法直接进行回切了,需要做的是重新配置Primary上的OGG并将Standby上的数据以initial load的形式还原回去,在数据重新同步后才能再切换到Primary上。下面我们就来介绍如何在计划外的情况下从主库failover到备库,并尝试回切:
1.
使用lag replicat命令了解standby上的replicat的延迟情况,若返回"At EOF (end of file)"则说明replicat已应用所有trail中的数据到备库上。
GGSCI (rh3.oracle.com) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     STOPPED     EXTSTD2     00:00:00      23:42:47
EXTRACT     STOPPED     PUMPSTD2    00:00:00      23:41:29
REPLICAT    RUNNING     REPSTD1     00:00:00      00:00:00

GGSCI (rh3.oracle.com) 5> lag replicat repstd1

Sending GETLAG request to REPLICAT REPSTD1 ...
Last record lag: 5 seconds.
At EOF, no more records to process.

2.
停止standby上的replicat
GGSCI (rh3.oracle.com) 6> stop replicat repstd1

Sending STOP request to REPLICAT REPSTD1 ...
Request processed.

3.
在standby上执行必要的赋予DML权限,启动triggers触发器和cascade delete约束的脚本

4.
启动standby上的extract,
在此之前先确认Standby上的data pump group不被启动,以保证trail文件堆积在standby上
GGSCI (rh3.oracle.com) 15> info all
Program     Status      Group       Lag           Time Since Chkpt
MANAGER     RUNNING
EXTRACT     STOPPED     EXTSTD2     00:00:00      24:04:16
EXTRACT     STOPPED     PUMPSTD2    00:00:00      24:02:57
REPLICAT    STOPPED     REPSTD1     00:00:00      00:00:06

GGSCI (rh3.oracle.com) 16> start extstd2
Sending START request to MANAGER ...
EXTRACT EXTSTD2 starting

5.
此时可以将应用切换到standby上了

==============================================================================
以上步骤完成了故障切换到Standby的过程,接下来我们尝试将应用还原到primary上
1.如果原primary主机已损毁则需要重装Oracle软件,并重建Primary系统上的Goldengate软件目录
2.从primary端启动GGSCI命令
3.删除primary上相关的extract及EXTTRAIL,并重建

GGSCI (rh2.oracle.com) 6> delete extract extstd1
Deleted EXTRACT EXTSTD1.

GGSCI (rh2.oracle.com) 7> delete exttrail /d01/ext/cl

GGSCI (rh2.oracle.com) 14> add extract extstd1,tranlog,begin now
EXTRACT added.

GGSCI (rh2.oracle.com) 15> add exttrail /d01/ext/cl,megabytes 100,extract extstd1
EXTTRAIL added.

4.
在primary上启动Manager
GGSCI (rh2.oracle.com) 18> start Manager
Manager started.

5.
接着在primary上执行disable trigger触发器和cascade delete约束的脚本

6.
在standby上对执行热备份(逻辑,物理的均可);并记录该热备的结束时间

7.
使用standby上的热备份来完成primary上的initial load后,再以HANDLECOLLISIONS选项启动Standby上的replicat
GGSCI (rh2.oracle.com) 22> view params repstd2

-- Identify the Replicat group:
REPLICAT repstd2
-- State that source and target definitions are identical:
ASSUMETARGETDEFS
-- Specify database login information as needed for the database:
userid maclean, password maclean
HANDLECOLLISIONS
-- Specify tables for delivery:
MAP clinic.*, TARGET clinic.*;
-- Exclude specific tables from delivery if needed:
-- MAPEXCLUDE 


GGSCI (rh2.oracle.com) 23> start replicat repstd2

Sending START request to MANAGER ...
REPLICAT REPSTD2 starting

8.并启动standby上的data pump group,将堆积的trail文件传输到Primary上
GGSCI (rh3.oracle.com) 19> start pumpstd2
Sending START request to MANAGER ...
EXTRACT PUMPSTD2 starting

9.使用info replicat观察primary上的replicat,观察其进度是否已晚于完成初始化导出的时间


10.
禁用primary上目前使用的HANDLECOLLISIONS选项
GGSCI (rh2.oracle.com) 26> send replicat repstd2,NOHANDLECOLLISIONS

11.
关闭之前切换到Standby上的一切应用

12.
12.若需要进行数据验证则关闭Standby上的extract、pump及Primary上的replicat:
GGSCI (rh2.oracle.com) 31> lag replicat repstd2
Sending GETLAG request to REPLICAT REPSTD2 ...
Last record lag: 3 seconds.
At EOF, no more records to process.

GGSCI (rh3.oracle.com) 28> stop extstd2
Sending STOP request to EXTRACT EXTSTD2 ...
Request processed.


GGSCI (rh3.oracle.com) 26> stop pumpstd2
Sending STOP request to EXTRACT PUMPSTD2 ...
Request processed.

GGSCI (rh2.oracle.com) 34> stop replicat repstd2
Sending STOP request to REPLICAT REPSTD2 ...
Request processed.

/* 使用Oracle GoldenGate Veridata等工具验证数据一致性,
   若不一致则修复
*/

standby库上:
SQL> select sum(t2) from tv;

   SUM(T2)
----------
5355944997

primary库上:
SQL> select sum(t2) from tv;

   SUM(T2)
----------
5355944997

13.在primary系统上赋予应用相关DML权限,启用触发器及删除约束

14.
修改primary系统上的extract group的begin time为当前,启动Primary到Standby的extract、pump及replicat
GGSCI (rh2.oracle.com) 36> alter extstd1 ,begin now
EXTRACT altered.


GGSCI (rh2.oracle.com) 52> start extract extstd1

Sending START request to MANAGER ...
EXTRACT EXTSTD1 starting


GGSCI (rh2.oracle.com) 53> start extract pumpstd1

Sending START request to MANAGER ...
EXTRACT PUMPSTD1 starting


GGSCI (rh3.oracle.com) 3> start repstd1

Sending START request to MANAGER ...
REPLICAT REPSTD1 starting

此时系统切换回原始的primary->standby状态.
That's great!
.table>