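System environment:
Operating system: RedHat EL5
Cluster: Oracle GI (Grid Infrastructure)
Oracle: Oracle 11.2.0.1.0
Figure: RAC system architecture
With Oracle 11g, building RAC first requires setting up the GI (Grid Infrastructure) layer.
Case analysis:
While adding a new node to an Oracle 11gR2 RAC cluster, the grid software was added and root.sh was then run on node3, which produced the following error: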
[root@lxh3 ~]# /u01/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2014-07-10 15:00:36: Parsing the host name
2014-07-10 15:00:36: Checking for super user privileges
2014-07-10 15:00:36: User has super user privileges
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node lxh1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'lxh3'
CRS-2676: Start of 'ora.mdnsd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'lxh3'
CRS-2676: Start of 'ora.gipcd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'lxh3'
CRS-2676: Start of 'ora.gpnpd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'lxh3'
CRS-2676: Start of 'ora.cssdmonitor' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'lxh3'
CRS-2672: Attempting to start 'ora.diskmon' on 'lxh3'
CRS-2676: Start of 'ora.diskmon' on 'lxh3' succeeded
CRS-2676: Start of 'ora.cssd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'lxh3'
CRS-2676: Start of 'ora.ctssd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'lxh3'
CRS-2676: Start of 'ora.drivers.acfs' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'lxh3'
CRS-2676: Start of 'ora.asm' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'lxh3'
CRS-2676: Start of 'ora.crsd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'lxh3'
CRS-2676: Start of 'ora.evmd' on 'lxh3' succeeded
Timed out waiting for the CRS stack to start.
The CRSD service failed to start!
Check the logs:
[root@lxh3 crsd]# more crsdOUT.log
2014-07-10 15:03:56
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:03:56
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 442014-07-10 15:03:58
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:03:58
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 442014-07-10 15:04:00
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:00
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 442014-07-10 15:04:02
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:02
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 442014-07-10 15:04:04
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:04
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 442014-07-10 15:04:06
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:06
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 442014-07-10 15:04:08
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:08
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 442014-07-10 15:04:10
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:10
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 442014-07-10 15:04:12
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:12
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 442014-07-10 15:04:14
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:14
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 442014-07-10 15:04:16
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:16
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 44
[root@lxh3 crsd]# tail crsd.log
2014-07-10 15:04:17.954: [ GPnP][3046512336]clsgpnp_Init: [at clsgpnp0.c:837] GPnP client pid=12069, tl=3, f=0
2014-07-10 15:04:17.966: [ OCRAPI][3046512336]clsu_get_private_ip_addresses: no ip addresses found.
2014-07-10 15:04:17.966: [GIPCXCPT][3046512336] gipcShutdownF: skipping shutdown, count 2, from [ clsinet.c : 1732], ret gipcretSuccess (0)
2014-07-10 15:04:17.968: [GIPCXCPT][3046512336] gipcShutdownF: skipping shutdown, count 1, from [ clsgpnp0.c : 1021], ret gipcretSuccess (0)
[ OCRAPI][3046512336]a_init_clsss: failed to call clsu_get_private_ip_addr (7)
2014-07-10 15:04:17.970: [ OCRAPI][3046512336]a_init:13!: Clusterware init unsuccessful : [44]
2014-07-10 15:04:17.970: [ CRSOCR][3046512336] OCR context init failure. Error: PROC-44: Error in network address and interface operations Network address and interface operations error [7]
2014-07-10 15:04:17.970: [ CRSD][3046512336][PANIC] CRSD exiting: Could not init OCR, code: 44
2014-07-10 15:04:17.970: [ CRSD][3046512336] Done.
One key error message stands out: "2014-07-10 15:04:17.966: [ OCRAPI][3046512336]clsu_get_private_ip_addresses: no ip addresses found." It suggests that the crsd startup failure is related to the private network interface.
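When crsd fails like this, the relevant lines can be pulled out of crsd.log with a simple filter. The following is a minimal sketch: the function name `find_interconnect_errors` is made up for illustration, and the two patterns are just the strings that appear in the log above, not an exhaustive list of interconnect-related errors.

```shell
# Hypothetical helper: print (with line numbers) the crsd.log lines that
# mention the private-IP lookup or the PROC-44 error seen above.
# $1 = path to a crsd.log file
find_interconnect_errors() {
    grep -n -E 'clsu_get_private_ip_addr|PROC-44' "$1"
}
```

For example, `find_interconnect_errors crsd.log` could be run from the /u01/11.2.0/grid/log/lxh3/crsd directory shown above.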
Check the network configuration:
[root@lxh3 crsd]# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 08:00:27:D5:85:37
inet addr:10.10.10.13 Bcast:10.255.255.255 Mask:255.0.0.0
inet6 addr: fe80::a00:27ff:fed5:8537/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2213 errors:0 dropped:0 overruns:0 frame:0
TX packets:11568 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:281234 (274.6 KiB) TX bytes:4006863 (3.8 MiB)
[grid@lxh1 ~]$ /sbin/ifconfig eth1
eth1 Link encap:Ethernet HWaddr 08:00:27:AE:93:9A
inet addr:10.10.10.11 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:feae:939a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:157629 errors:0 dropped:0 overruns:0 frame:0
TX packets:140367 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:68428231 (65.2 MiB) TX bytes:48684593 (46.4 MiB)
The netmask of node3's private network interface is 255.0.0.0, which does not match the netmask used on the other nodes (255.255.255.0)!
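The mismatch is easier to see by computing the network address each node derives from its IP address and netmask. The following is a minimal sketch in plain shell; the helper names `mask_from_ifconfig` and `network_of` are hypothetical, and the parsing assumes the old net-tools `ifconfig` output format shown above (with a `Mask:` field).

```shell
# Hypothetical helper: read net-tools style ifconfig output on stdin and
# print the value of its "Mask:" field.
mask_from_ifconfig() {
    sed -n 's/.*Mask:\([0-9.]*\).*/\1/p'
}

# Hypothetical helper: print the IPv4 network address for an address and a
# dotted-decimal netmask by AND-ing them octet by octet.
# $1 = IPv4 address, $2 = netmask
network_of() {
    _addr=$1
    _mask=$2
    _old_ifs=$IFS
    IFS=.
    # Split both values on dots into eight positional parameters.
    set -- $_addr $_mask
    IFS=$_old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

network_of 10.10.10.13 255.0.0.0       # node3 (lxh3): prints 10.0.0.0
network_of 10.10.10.11 255.255.255.0   # node1 (lxh1): prints 10.10.10.0
```

With node3's mask, eth1 appears to sit on network 10.0.0.0 rather than 10.10.10.0, which is presumably why Clusterware could not find a private address on the interconnect subnet used by the existing nodes. On a running node, `oifcfg getif` from the Grid home's bin directory should list the subnet registered as cluster_interconnect for comparison.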
Solution:
1. Change the netmask of node3's private network interface to 255.255.255.0.
2. Clear the CRS configuration and rerun root.sh.
[root@lxh3 install]# perl rootcrs.pl -deconfig -force
2014-07-10 15:12:13: Parsing the host name
2014-07-10 15:12:13: Checking for super user privileges
2014-07-10 15:12:13: User has super user privileges
Using configuration parameter file: ./crsconfig_params
PRCR-1035: Failed to look up CRS resource ora.cluster_vip.type for 1
PRCR-1068: Failed to query resources
Cannot communicate with crsd
PRCR-1070: Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070: Failed to check if resource ora.ons is registered
Cannot communicate with crsd
PRCR-1070: Failed to check if resource ora.eons is registered
Cannot communicate with crsd
ACFS-9200: Supported
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'lxh3'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'lxh3'
CRS-2673: Attempting to stop 'ora.ctssd' on 'lxh3'
CRS-2673: Attempting to stop 'ora.evmd' on 'lxh3'
CRS-2673: Attempting to stop 'ora.asm' on 'lxh3'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'lxh3'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'lxh3'
CRS-2677: Stop of 'ora.cssdmonitor' on 'lxh3' succeeded
CRS-2677: Stop of 'ora.evmd' on 'lxh3' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'lxh3' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'lxh3' succeeded
CRS-2677: Stop of 'ora.asm' on 'lxh3' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'lxh3'
CRS-2677: Stop of 'ora.cssd' on 'lxh3' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'lxh3'
CRS-2673: Attempting to stop 'ora.diskmon' on 'lxh3'
CRS-2677: Stop of 'ora.diskmon' on 'lxh3' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'lxh3' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'lxh3'
CRS-2677: Stop of 'ora.drivers.acfs' on 'lxh3' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'lxh3' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'lxh3' has completed
CRS-4133: Oracle High Availability Services has been stopped.
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node
[root@lxh3 crsd]# /u01/11.2.0/grid/root.sh
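Step 1 of the solution (changing the netmask on node3) is not shown in the transcript above. On RedHat EL5 the persistent setting normally lives in /etc/sysconfig/network-scripts/ifcfg-eth1, so one way to make the change is sketched below. The function name `set_netmask` is hypothetical, and the sketch assumes GNU sed and a `NETMASK=` style entry; back up the file before editing it.

```shell
# Hypothetical helper: set NETMASK in an ifcfg-style file.
# $1 = path to the ifcfg file, $2 = new dotted-decimal netmask
set_netmask() {
    if grep -q '^NETMASK=' "$1"; then
        # Replace the existing NETMASK line in place (GNU sed -i).
        sed -i "s/^NETMASK=.*/NETMASK=$2/" "$1"
    else
        # No NETMASK line yet: append one.
        echo "NETMASK=$2" >> "$1"
    fi
}

# Example (as root on node3), followed by restarting the interface:
#   set_netmask /etc/sysconfig/network-scripts/ifcfg-eth1 255.255.255.0
#   ifdown eth1 && ifup eth1
#   ifconfig eth1    # confirm Mask:255.255.255.0 before the deconfig and root.sh rerun
```

The netmask change belongs before the `rootcrs.pl -deconfig -force` and root.sh rerun shown above, so that the rerun sees the corrected interface.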