0、 背景
Oracle 从11.2.0.2开始引入了一个新特性叫做Redundant Interconnect,简称HAIP。HAIP的目的用来代替操作系统级别的网卡绑定以实现Active-Active的模式进行数据传输。一来可以实现传统操作系统网卡绑定带来的故障转移的功能,另一方面则可以更加充分利用其负载均衡的特性最大程度的减少因为gc等待带来的性能问题。
HAIP的历史可以追溯到Oracle 10g时代,那个时候CRS中就已经包含了HAIP的雏形. 在安装10g安装CRS的时候,在选择私有网络的时候可以选择多个私有网卡, 虽然官方文档中没有提及,但是很多Oracle的销售甚至工程师都宣称其可以提供高可用性,但实际测试中却往往不尽如人意。
从11.2.0.2这一功能开始得到完善,最终形成了HAIP。 我们可以看到在GI升级到11.2.0.2以后,会自动生成一个叫做ora.cluster_interconnect.haip的资源(这也是haip名字的来历),管理这个资源的是ohasd.bin进程。其对应的log位于$GRID_HOME/log//ohasd/ohasd.log 以及GRID_HOME/log//agent/ohasd/orarootagent_root/orarootagent_root.log这两个位置。在HAIP资源online以后,通过操作系统命令ifconfig -a就能查看到多了类似与eth0:1的虚拟网卡,HAIP地址为169.254.X.X, 当然也可以在数据库级别查看V_$CLUSTER_INTERCONNECTS视图HAIP的地址。HAIP对应的地址由系统自动分配,无法由用户手工进行指定。
由于HAIP使用的是169.254.X.X的地址段,所以在GI准备升级到11.2.0.2+以前都需要检查此地址段是否已经被占用,否则可能会遇到一些意想不到的情况,最终导致GI无法启动而整个升级则不得不失败告终。一个比较典型的情况是在IBM的IMM web interface就是使用这个地址段的。那么Oracle为啥偏偏要选择这个地址段呢?简单的回答就是因为169.254.X.X地址段是预留的ip地址段。 此地址段用于ip地址的autoconfiguration以及link localaddress, 并且早已经被RFC标准化——RFC3927。对这个感兴趣的读者可访问Zero configuration networking, Link local address和RFC3927来获取更多相关信息。
HAIP对于Oracle Clusterware以及RDBMS的要求很严格。这两者的版本都需要在11.2.0.2以上。简单的举个例子,有这么一种组合: GI的版本在11.2.0.3, RDBMS的版本在10.2.0.5。这种模式下虽然心跳网卡上会生成169.254.X.X地址段的虚地址,但实际上低于11.2.0.2的RDBMS是无法感知到这个地址的存在的,所以也无法使用HAIP提供的高可用特性。另外请注意在早期的HAIP的文档中并没有强制要求使用HAIP的私有网卡地址必须在同一个子网,结果导致了很多问题——尤其是在AIX平台,现在新版的文档/MOS已经对此明确要求。
值得一提的是早期的HAIP存在不少bug。在MOS note 11gR2 Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip [ID 1210883.1] 上提供了大多数已知的HAIP,请读者自行阅读,我这里简单的Bug号方便由于某些原因暂时无法登录MOS的读者:(注意其中两个note虽然并非HAIP本身的bug,但是同样与HAIP相关)
bug 12674817
Bug 10332426
Bug 10363902
Bug 10357258
Bug 10397652
Bug 10253028
Bug 9795321
Bug 11077756
Bug 12546712
Note 1366211.1
bug 10114953
Note 1447517.1
HAIP无法被禁用,当然某些不支持HAIP的平台例如Microsoft Windows除外。如果用户使用的是操作系统级别的绑定或者没有使用私网的绑定,则可以通过在rdbms和asm的init.ora/spfile中设置CLUSTER_INTECONNECTS指定私网地址将HAIP覆盖(如果有多个私网地址,请用英文:分隔),虽然说HAIP本身依然存在,但是ASM实例和RDBMS实例以后就不会使用HAIP。
就鄙人看来,HAIP到目前为止还太新,仅仅是一个“看上去很美”的特性,Oracle总是幻想实现一切不依赖于操作系统的内建高可用特性。但是和Oracle大多数看上去很诱人新特性一样,想法很好, 实现很糟糕。当然在11.2.0.4出来以后HAIP Bug基本会修复完,到时候再去使用HAIP也不迟,目前推荐使用操作系统级别的绑定理由只有一个——因为它更稳定可靠。下一篇将主要介绍OS级别的故障转移。
1. HAIP简介
Oracle从11.2.0.2开始引入了一个新特性网络冗余技术HAIP。HAIP的目的用来代替操作系统级别的网卡绑定以实现Active-Active的模式进行数据传输。一来可以实现传统操作系统网卡绑定带来的故障转移的功能,另一方面则可以更加充分利用其负载均衡的特性最大程度的减少因为gc等待带来的性能问题。
如果更多的网络适配器被指定,clusterware可以一次激活最多4个专用网络适配器。ora.cluster_interconnect.haip 将为Oracle RAC、Oracle ASM、Oracle ACFS等启用一至四个连接本地HAIP的互联通信网络适配器,注意,如果存在sun cluster,HAIT特性将在11.2.0.2中禁用。
Grid将自动选择连接本地保留地址169.254.*.*作为HAIP子网,并且不会尝试适用任何169.254.*.*地址,如果它已经被在用于其它目的使用。由于HAIP,在默认情况下,网络流量将被所有活动的网络接口负载均衡。并且如果其中一个失败或者变成不可连接状态,相应的HAIP地址将透明的转移到相对的其它网络适配器。
当Grid中启动集群中的第一个节点,HAIP地址数量是由有多少个私有网络适配器是活动状态所决定的。如果只有一个活跃的私有网络,那么Grid将创建一个,如果有两个,Grid将创建两个,如果大于两个,Grid将创建4个HAIPs.即使更多的私有网络适配器随后被激活,HAIPs的数量是不会改变的,要使得新的网络适配器变成活动状态,则要重启集群所有的节点。
[参考 http://blog.csdn.net/ora_unix/article/details/9420393 ]
[参考 MetaLink 1210883.1]
2. HAIP服务
由于HAIP是默认开启的,使用169.254.*.*私有地址,可以通过oifcfg和ifconfig来查看:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
|
[grid@racdb01 ~]$ oifcfg iflist
eth0 172.19.17.0
eth1 192.168.1.0
eth1 169.254.0.0
[grid@racdb01 ~]$ oifcfg getif
eth1 192.168.1.0 global cluster_interconnect
eth0 172.19.17.0 global public
[oracle@racdb03 ~]$ ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:B2:19:64
inet addr:172.19.17.205 Bcast:172.19.17.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:feb2:1964/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1994 errors:0 dropped:0 overruns:0 frame:0
TX packets:477 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:183143 (178.8 KiB) TX bytes:72031 (70.3 KiB)
eth1 Link encap:Ethernet HWaddr 00:50:56:B2:2F:EE
inet addr:192.168.1.205 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:feb2:2fee/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:55266 errors:0 dropped:0 overruns:0 frame:0
TX packets:55911 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:25402212 (24.2 MiB) TX bytes:28256359 (26.9 MiB)
eth1:1 Link encap:Ethernet HWaddr 00:50:56:B2:2F:EE
inet addr:169.254.151.97 Bcast:169.254.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth2 Link encap:Ethernet HWaddr 00:50:56:B2:1F:AE
inet addr:192.168.1.215 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:feb2:1fae/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:327 errors:0 dropped:0 overruns:0 frame:0
TX packets:33 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:70358 (68.7 KiB) TX bytes:4841 (4.7 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:6763 errors:0 dropped:0 overruns:0 frame:0
TX packets:6763 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:4258261 (4.0 MiB) TX bytes:4258261 (4.0 MiB)
|
在数据库和ASM实例的启动阶段,可以从alert_.log中找到:
Cluster communication is configured to use the following interface(s) for this instance
169.254.151.97
在数据库中通过gv$cluster_interconnects视图可以显示haip的信息
1
2
3
4
5
6
7
|
SQL> select * from gv$cluster_interconnects;
INST_ID NAME IP_ADDRESS IS_ SOURCE
---------- --------------- ---------------- --- -------------------------------
1 eth1:1 169.254.134.108 NO
3 eth1:1 169.254.151.97 NO
2 eth1:1 169.254.31.191 NO
|
而集群架构中的相关结构:
1
2
3
|
[grid@racdb01 ~]$ crsctl stat res -t -init | grep -1 ha
1 ONLINE ONLINE racdb01 Started
ora.cluster_interconnect.haip
|
3. 添加新的专用网络适配器
下面我们对racdb01, racdb02, racdb03三个节点继续添加一块网卡eth02(地址分配192.168.1.211,192.168.1.213,192.168.1.215),然后通过oifcfg将这块网卡添加到ocr中。
1
2
3
4
5
|
[root@racdb03 ~]# oifcfg setif -global eth2/192.168.1.0:cluster_interconnect
[root@racdb03 ~]# oifcfg getif
eth2 192.168.1.0 global cluster_interconnect
eth1 192.168.1.0 global cluster_interconnect
eth0 172.19.17.0 global public
|
然后分别重启三个节点的集群服务,HAIP会自动生效:
在三个节点分别做完配置,重启集群,HAIP自动生效。
1
2
3
4
5
6
7
8
9
10
|
SQL> select * from gv$cluster_interconnects;
INST_ID NAME IP_ADDRESS IS_ SOURCE
---------- --------------- ---------------- --- -------------------------------
3 eth1:1 169.254.79.164 NO
3 eth2:1 169.254.135.126 NO
2 eth1:1 169.254.42.101 NO
2 eth2:1 169.254.165.198 NO
1 eth1:1 169.254.30.165 NO
1 eth2:1 169.254.208.93 NO
|
4. 模拟网络故障
断开racdb03的eth1:
# ifconfig eth1 down
数据库和ASM的alert中并未出现任何报错,说明IP地址已经进行了透明的偏移。
[参考 http://www.luocs.com/archives/281.html]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
[root@racdb03 log]# oifcfg iflist
eth0 172.19.17.0
eth1 192.168.1.0
eth2 192.168.1.0
eth2 169.254.128.0
eth2 169.254.0.0
#ip a
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:b2:1f:ae brd ff:ff:ff:ff:ff:ff
inet 192.168.1.215/24 brd 192.168.1.255 scope global eth2
inet 169.254.135.126/17 brd 169.254.255.255 scope global eth2:1
inet 169.254.79.164/17 brd 169.254.127.255 scope global eth2:2
inet6 fe80::250:56ff:feb2:1fae/64 scope link
valid_lft forever preferred_lft forever
|
查看ohasd的日志$GRID_HOME/log/ohasd/ohasd.log
1
2
3
|
2013-08-12 13:45:34.441: [GIPCHGEN][2995967744]gipchaInterfaceFail: marking interface failing 0x7f56701b7b50 {host '', haName 'CLSFRAME_racdb-cluster', local (nil), ip '192.168.1.205:46662', subnet '192.168.1.0', mask'255.255.255.0', mac '00-50-56-b2-2f-ee', ifname 'eth1', numRef 0, numFail 0, idxBoot 0, flags 0x184d }
2013-08-12 13:45:34.714: [GIPCHGEN][3627513600]gipchaInterfaceDisable: disabling interface 0x7f56701b7b50 {host '', haName 'CLSFRAME_racdb-cluster', local (nil), ip '192.168.1.205:46662', subnet '192.168.1.0', mask'255.255.255.0', mac '00-50-56-b2-2f-ee', ifname 'eth1', numRef 0, numFail 0, idxBoot 0, flags 0x19cd }
2013-08-12 13:45:34.714: [GIPCHDEM][3627513600]gipchaWorkerCleanInterface: performing cleanup of disabledinterface 0x7f56701b7b50 { host '', haName 'CLSFRAME_racdb-cluster', local (nil), ip '192.168.1.205:46662', subnet'192.168.1.0', mask '255.255.255.0', mac '00-50-56-b2-2f-ee', ifname 'eth1', numRef 0, numFail 0, idxBoot 0, flags0x19ed }
|
查看agent日志$GRID_HOME/log/racdb03/agent/ohasd/orarootagent_root,可以看到moving ip ‘169.254.79.164’ from inf ‘eth1′ to inf ‘eth2’:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
|
2013-08-12 13:45:29.815: [ USRTHRD][3418347264]{0:0:2} failed to receive ARP request
2013-08-12 13:45:31.577: [ USRTHRD][3420448512]{0:0:2} HAIP: Updating member info HAIP1;192.168.1.0#0
2013-08-12 13:45:31.579: [ USRTHRD][3420448512]{0:0:2} HAIP: Moving ip '169.254.79.164' from inf 'eth1' toinf 'eth2'
2013-08-12 13:45:31.579: [ USRTHRD][3420448512]{0:0:2} pausing thread
2013-08-12 13:45:31.579: [ USRTHRD][3420448512]{0:0:2} posting thread
2013-08-12 13:45:31.579: [ USRTHRD][3420448512]{0:0:2} Thread:[NetHAWork]start {
2013-08-12 13:45:31.579: [ USRTHRD][3420448512]{0:0:2} Thread:[NetHAWork]start }
2013-08-12 13:45:31.579: [ USRTHRD][3395221248]{0:0:2} [NetHAWork] thread started
2013-08-12 13:45:31.579: [ USRTHRD][3395221248]{0:0:2} Arp::sCreateSocket {
2013-08-12 13:45:31.597: [ USRTHRD][3395221248]{0:0:2} Arp::sCreateSocket }
2013-08-12 13:45:32.097: [ USRTHRD][3395221248]{0:0:2} Starting Probe for ip 169.254.79.164
2013-08-12 13:45:32.097: [ USRTHRD][3395221248]{0:0:2} Transitioning to Probe State
2013-08-12 13:45:32.097: [ USRTHRD][3395221248]{0:0:2} Arp::sProbe {
2013-08-12 13:45:32.097: [ USRTHRD][3395221248]{0:0:2} Arp::sSend: sending type 1
2013-08-12 13:45:32.097: [ USRTHRD][3395221248]{0:0:2} Arp::sProbe }
2013-08-12 13:45:33.135: [ USRTHRD][3395221248]{0:0:2} Arp::sProbe {
2013-08-12 13:45:33.135: [ USRTHRD][3395221248]{0:0:2} Arp::sSend: sending type 1
2013-08-12 13:45:33.135: [ USRTHRD][3395221248]{0:0:2} Arp::sProbe }
2013-08-12 13:45:34.428: [ USRTHRD][3395221248]{0:0:2} Arp::sProbe {
2013-08-12 13:45:34.428: [ USRTHRD][3395221248]{0:0:2} Arp::sSend: sending type 1
2013-08-12 13:45:34.429: [ USRTHRD][3395221248]{0:0:2} Arp::sProbe }
2013-08-12 13:45:34.429: [ USRTHRD][3395221248]{0:0:2} Transitioning to Announce State
2013-08-12 13:45:36.425: [ USRTHRD][3395221248]{0:0:2} Arp::sAnnounce {
2013-08-12 13:45:36.425: [ USRTHRD][3395221248]{0:0:2} Arp::sSend: sending type 1
2013-08-12 13:45:36.425: [ USRTHRD][3395221248]{0:0:2} Arp::sAnnounce } :
|
这时打开eth1网卡来模拟地址恢复
ifconfig eth1 up
查看日志$GRID_HOME/log/racdb03/agent/ohasd/orarootagent_root,可以看到IP地址飘回:Moving ip ‘169.254.79.164’ from inf ‘eth2′ to inf ‘eth1′
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
|
2013-08-12 14:19:49.675: [ USRTHRD][3420448512]{0:0:2} HAIP: Updating member info HAIP1;192.168.1.0#0;192.168.1.0#1
2013-08-12 14:19:49.676: [ USRTHRD][3420448512]{0:0:2} HAIP: Moving ip '169.254.79.164' from inf 'eth2' toinf 'eth1'
2013-08-12 14:19:49.676: [ USRTHRD][3420448512]{0:0:2} pausing thread
2013-08-12 14:19:49.676: [ USRTHRD][3420448512]{0:0:2} posting thread
2013-08-12 14:19:49.676: [ USRTHRD][3420448512]{0:0:2} Thread:[NetHAWork]start {
2013-08-12 14:19:49.677: [ USRTHRD][3420448512]{0:0:2} Thread:[NetHAWork]start }
2013-08-12 14:19:49.677: [ USRTHRD][3418347264]{0:0:2} [NetHAWork] thread started
2013-08-12 14:19:49.677: [ USRTHRD][3418347264]{0:0:2} Arp::sCreateSocket {
2013-08-12 14:19:49.692: [ USRTHRD][3418347264]{0:0:2} Arp::sCreateSocket }
2013-08-12 14:19:50.192: [ USRTHRD][3418347264]{0:0:2} Starting Probe for ip 169.254.79.164
2013-08-12 14:19:50.192: [ USRTHRD][3418347264]{0:0:2} Transitioning to Probe State
2013-08-12 14:19:50.192: [ USRTHRD][3418347264]{0:0:2} Arp::sProbe {
2013-08-12 14:19:50.192: [ USRTHRD][3418347264]{0:0:2} Arp::sSend: sending type 1
2013-08-12 14:19:50.192: [ USRTHRD][3418347264]{0:0:2} Arp::sProbe }
2013-08-12 14:19:51.688: [ USRTHRD][3418347264]{0:0:2} Arp::sProbe {
2013-08-12 14:19:51.688: [ USRTHRD][3418347264]{0:0:2} Arp::sSend: sending type 1
2013-08-12 14:19:51.688: [ USRTHRD][3418347264]{0:0:2} Arp::sProbe }
2013-08-12 14:19:52.339: [ora.crf][3899438848]{0:0:2} [check] clsdmc_respget return: status=0, ecode=0
2013-08-12 14:19:52.339: [ora.crf][3899438848]{0:0:2} [check] Check return = 0, state detail = NULL
2013-08-12 14:19:52.713: [ USRTHRD][3418347264]{0:0:2} Arp::sProbe {
2013-08-12 14:19:52.713: [ USRTHRD][3418347264]{0:0:2} Arp::sSend: sending type 1
2013-08-12 14:19:52.713: [ USRTHRD][3418347264]{0:0:2} Arp::sProbe }
2013-08-12 14:19:52.713: [ USRTHRD][3418347264]{0:0:2} Transitioning to Announce State
2013-08-12 14:19:54.718: [ USRTHRD][3418347264]{0:0:2} Arp::sAnnounce {
2013-08-12 14:19:54.718: [ USRTHRD][3418347264]{0:0:2} Arp::sSend: sending type 1
2013-08-12 14:19:54.718: [ USRTHRD][3418347264]{0:0:2} Arp::sAnnounce }
[ clsdmc][3406784256]CLSDMC.C returnbuflen=8, extraDataBuf=E6, returnbuf=9801CEA0
2013-08-12 14:19:56.632: [ora.ctssd][3406784256]{0:0:2} [check] clsdmc_respget return: status=0, ecode=0,returnbuf=[0x7f8a9801cea0], buflen=8
2013-08-12 14:19:56.632: [ora.ctssd][3406784256]{0:0:2} [check] translateReturnCodes, return = 0, state detail= OBSERVERCheckcb data [0x7f8a9
801cea0]: mode[0xe6] offset[2 ms].
2013-08-12 14:19:56.720: [ USRTHRD][3418347264]{0:0:2} Arp::sAnnounce {
2013-08-12 14:19:56.720: [ USRTHRD][3418347264]{0:0:2} Arp::sSend: sending type 1
2013-08-12 14:19:56.720: [ USRTHRD][3418347264]{0:0:2} Arp::sAnnounce }
2013-08-12 14:19:56.720: [ USRTHRD][3418347264]{0:0:2} Transitioning to Defend State
2013-08-12 14:19:57.220: [ USRTHRD][3418347264]{0:0:2} VipActions::startIp {
2013-08-12 14:19:57.220: [ USRTHRD][3418347264]{0:0:2} Adding 169.254.79.164 on eth1:1
2013-08-12 14:19:57.220: [ USRTHRD][3418347264]{0:0:2} VipActions::startIp }
2013-08-12 14:19:57.221: [ USRTHRD][3418347264]{0:0:2} Assigned IP: 169.254.79.164 on interface eth1
2013-08-12 14:19:57.677: [ USRTHRD][3420448512]{0:0:2} Thread:[NetHAWork]stop {
2013-08-12 14:19:57.695: [ USRTHRD][3395221248]{0:0:2} [NetHAWork] thread stopping
2013-08-12 14:19:57.695: [ USRTHRD][3395221248]{0:0:2} Thread:[NetHAWork]isRunning is reset to false here
2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} Thread:[NetHAWork]stop }
2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} VipActions::stopIp {
2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} NetInterface::sStopIp {
2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} Stopping ip '169.254.79.164', inf 'eth2', mask'192.168.1.0'
2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} Stopping ip 169.254.79.164 on inf eth2:2
2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} NetInterface::sStopIp }
2013-08-12 14:19:57.695: [ USRTHRD][3420448512]{0:0:2} VipActions::stopIp }
2013-08-12 14:19:57.710: [ USRTHRD][3420448512]{0:0:2} IptoClean '169.254.0.1', ip '169.254.0.0', mask'255.255.128.0'
2013-08-12 14:19:57.710: [ USRTHRD][3420448512]{0:0:2} USING HAIP[ 0 ]: eth1 - 169.254.79.164
2013-08-12 14:19:57.710: [ USRTHRD][3420448512]{0:0:2} USING HAIP[ 1 ]: eth2 - 169.254.135.126
|
而cssd.log日志中有相应的adding interface information等内容。
说明: 整理于网络
本文转自 张冲andy 博客园博客,原文链接: http://www.cnblogs.com/andy6/p/7519197.html,如需转载请自行联系原作者