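System environment:
Operating system: RedHat EL5
Cluster: Oracle GI (Grid Infrastructure)
Oracle: Oracle 11.2.0.1.0
Figure: RAC system architecture
To build RAC on Oracle 11g, the GI (Grid Infrastructure) stack must be set up first.
Case analysis:
While adding a new node to an Oracle 11gR2 RAC cluster, after the grid software had been added, running root.sh on node3 produced the following error: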
[root@lxh3 ~]# /u01/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2014-07-10 15:00:36: Parsing the host name
2014-07-10 15:00:36: Checking for super user privileges
2014-07-10 15:00:36: User has super user privileges
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node lxh1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'lxh3'
CRS-2676: Start of 'ora.mdnsd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'lxh3'
CRS-2676: Start of 'ora.gipcd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'lxh3'
CRS-2676: Start of 'ora.gpnpd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'lxh3'
CRS-2676: Start of 'ora.cssdmonitor' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'lxh3'
CRS-2672: Attempting to start 'ora.diskmon' on 'lxh3'
CRS-2676: Start of 'ora.diskmon' on 'lxh3' succeeded
CRS-2676: Start of 'ora.cssd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'lxh3'
CRS-2676: Start of 'ora.ctssd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'lxh3'
CRS-2676: Start of 'ora.drivers.acfs' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'lxh3'
CRS-2676: Start of 'ora.asm' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'lxh3'
CRS-2676: Start of 'ora.crsd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'lxh3'
CRS-2676: Start of 'ora.evmd' on 'lxh3' succeeded
Timed out waiting for the CRS stack to start.
The CRSD service failed to start!
Check the logs:
[root@lxh3 crsd]# more crsdOUT.log
2014-07-10 15:03:56
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:03:56
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 44
2014-07-10 15:03:58
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:03:58
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 44
2014-07-10 15:04:00
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:00
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 44
2014-07-10 15:04:02
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:02
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 44
2014-07-10 15:04:04
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:04
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 44
2014-07-10 15:04:06
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:06
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 44
2014-07-10 15:04:08
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:08
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 44
2014-07-10 15:04:10
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:10
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 44
2014-07-10 15:04:12
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:12
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 44
2014-07-10 15:04:14
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:14
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 44
2014-07-10 15:04:16
Changing directory to /u01/11.2.0/grid/log/lxh3/crsd
2014-07-10 15:04:16
CRSD REBOOT
CRSD exiting: Could not init OCR, code: 44
[root@lxh3 crsd]# tail crsd.log
2014-07-10 15:04:17.954: [ GPnP][3046512336]clsgpnp_Init: [at clsgpnp0.c:837] GPnP client pid=12069, tl=3, f=0
2014-07-10 15:04:17.966: [ OCRAPI][3046512336]clsu_get_private_ip_addresses: no ip addresses found.
2014-07-10 15:04:17.966: [GIPCXCPT][3046512336] gipcShutdownF: skipping shutdown, count 2, from [ clsinet.c : 1732], ret gipcretSuccess (0)
2014-07-10 15:04:17.968: [GIPCXCPT][3046512336] gipcShutdownF: skipping shutdown, count 1, from [ clsgpnp0.c : 1021], ret gipcretSuccess (0)
[ OCRAPI][3046512336]a_init_clsss: failed to call clsu_get_private_ip_addr (7)
2014-07-10 15:04:17.970: [ OCRAPI][3046512336]a_init:13!: Clusterware init unsuccessful : [44]
2014-07-10 15:04:17.970: [ CRSOCR][3046512336] OCR context init failure. Error: PROC-44: Error in network address and interface operations Network address and interface operations error [7]
2014-07-10 15:04:17.970: [ CRSD][3046512336][PANIC] CRSD exiting: Could not init OCR, code: 44
2014-07-10 15:04:17.970: [ CRSD][3046512336] Done.
One important error message is "2014-07-10 15:04:17.966: [ OCRAPI][3046512336]clsu_get_private_ip_addresses: no ip addresses found." It suggests that the crsd startup failure is related to the private network interface.
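When the log is long, the relevant lines can be pulled out with a simple filter. The following is a minimal sketch, not part of the original procedure: `find_private_net_errors` is a hypothetical helper name, and the two patterns are simply the strings that appear in the crsd.log excerpt above.

```shell
# Hypothetical helper: print the lines of a CRSD log that point at
# private-interconnect problems. The patterns are copied from the
# crsd.log excerpt in this article, not from any Oracle reference list.
find_private_net_errors() {
    grep -E 'clsu_get_private_ip_addr|PROC-44' "$1"
}
```

It would be called with the log path, for example `find_private_net_errors /u01/11.2.0/grid/log/lxh3/crsd/crsd.log`.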
Check the network configuration:
[root@lxh3 crsd]# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 08:00:27:D5:85:37
inet addr:10.10.10.13 Bcast:10.255.255.255 Mask:255.0.0.0
inet6 addr: fe80::a00:27ff:fed5:8537/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2213 errors:0 dropped:0 overruns:0 frame:0
TX packets:11568 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:281234 (274.6 KiB) TX bytes:4006863 (3.8 MiB)
[grid@lxh1 ~]$ /sbin/ifconfig eth1
eth1 Link encap:Ethernet HWaddr 08:00:27:AE:93:9A
inet addr:10.10.10.11 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:feae:939a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:157629 errors:0 dropped:0 overruns:0 frame:0
TX packets:140367 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:68428231 (65.2 MiB) TX bytes:48684593 (46.4 MiB)
The netmask of node3's private network interface is 255.0.0.0, which does not match the netmask on the other nodes (255.255.255.0)!
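To see why the two netmasks put the nodes on different networks, the network address can be computed by AND-ing each octet of the address with the matching octet of the netmask. The sketch below is only illustrative; `network_of` is a hypothetical helper name, and the addresses are the ones from the ifconfig output above.

```shell
# Hypothetical helper: print the network address for an IPv4 address and
# netmask by AND-ing them octet by octet. The body runs in a subshell
# (parentheses) so the IFS change does not leak into the caller.
network_of() (
    IFS=.
    # Split "a.b.c.d" and "m.n.o.p" into eight positional parameters.
    set -- $1 $2
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

network_of 10.10.10.13 255.0.0.0       # node3 as found: prints 10.0.0.0
network_of 10.10.10.11 255.255.255.0   # node1: prints 10.10.10.0
```

With the mismatched netmask, node3 derives the network 10.0.0.0 while node1 derives 10.10.10.0, which is consistent with Clusterware on node3 finding no address on the registered private network.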
Solution:
1. Change the netmask of node3's private network interface to 255.255.255.0.
2. Clear the CRS configuration, then run root.sh again.
[root@lxh3 install]# perl rootcrs.pl -deconfig -force
2014-07-10 15:12:13: Parsing the host name
2014-07-10 15:12:13: Checking for super user privileges
2014-07-10 15:12:13: User has super user privileges
Using configuration parameter file: ./crsconfig_params
PRCR-1035: Failed to look up CRS resource ora.cluster_vip.type for 1
PRCR-1068: Failed to query resources
Cannot communicate with crsd
PRCR-1070: Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070: Failed to check if resource ora.ons is registered
Cannot communicate with crsd
PRCR-1070: Failed to check if resource ora.eons is registered
Cannot communicate with crsd
ACFS-9200: Supported
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'lxh3'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'lxh3'
CRS-2673: Attempting to stop 'ora.ctssd' on 'lxh3'
CRS-2673: Attempting to stop 'ora.evmd' on 'lxh3'
CRS-2673: Attempting to stop 'ora.asm' on 'lxh3'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'lxh3'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'lxh3'
CRS-2677: Stop of 'ora.cssdmonitor' on 'lxh3' succeeded
CRS-2677: Stop of 'ora.evmd' on 'lxh3' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'lxh3' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'lxh3' succeeded
CRS-2677: Stop of 'ora.asm' on 'lxh3' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'lxh3'
CRS-2677: Stop of 'ora.cssd' on 'lxh3' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'lxh3'
CRS-2673: Attempting to stop 'ora.diskmon' on 'lxh3'
CRS-2677: Stop of 'ora.diskmon' on 'lxh3' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'lxh3' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'lxh3'
CRS-2677: Stop of 'ora.drivers.acfs' on 'lxh3' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'lxh3' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'lxh3' has completed
CRS-4133: Oracle High Availability Services has been stopped.
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node
[root@lxh3 crsd]# /u01/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying dbhome to /usr/local/bin ...
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2014-07-10 15:17:53: Parsing the host name
2014-07-10 15:17:53: Checking for super user privileges
2014-07-10 15:17:53: User has super user privileges
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node lxh1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'lxh3'
CRS-2676: Start of 'ora.mdnsd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'lxh3'
CRS-2676: Start of 'ora.gipcd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'lxh3'
CRS-2676: Start of 'ora.gpnpd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'lxh3'
CRS-2676: Start of 'ora.cssdmonitor' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'lxh3'
CRS-2672: Attempting to start 'ora.diskmon' on 'lxh3'
CRS-2676: Start of 'ora.diskmon' on 'lxh3' succeeded
CRS-2676: Start of 'ora.cssd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'lxh3'
CRS-2676: Start of 'ora.ctssd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'lxh3'
CRS-2676: Start of 'ora.drivers.acfs' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'lxh3'
CRS-2676: Start of 'ora.asm' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'lxh3'
CRS-2676: Start of 'ora.crsd' on 'lxh3' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'lxh3'
CRS-2676: Start of 'ora.evmd' on 'lxh3' succeeded
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
lxh3 2014/07/10 15:20:10 /u01/11.2.0/grid/cdata/lxh3/backup_20140710_152010.olr
Preparing packages for installation...
cvuqdisk-1.0.7-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
Updating inventory properties for clusterware
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 2047 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.
The script ran successfully!
Verification:
[root@lxh3 crsd]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
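If this check needs to be repeated, for example after adding further nodes, it can be scripted. The sketch below is an assumption-laden convenience, not an Oracle tool: `all_crs_online` is a hypothetical helper that reads `crsctl check crs` output on standard input and relies on each status line starting with `CRS-` and ending with `is online`, as in the output above.

```shell
# Hypothetical helper: succeed only if at least one CRS-* status line is
# present on stdin and every such line ends with "is online".
all_crs_online() {
    awk '/^CRS-/ { seen = 1; if ($0 !~ /is online$/) bad = 1 }
         END { if (seen && !bad) exit 0; exit 1 }'
}
```

A typical call would be `crsctl check crs | all_crs_online && echo "stack is up"`. Lines such as `CRS-4535: Cannot communicate with Cluster Ready Services` make the helper return a non-zero status.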
[root@lxh3 crsd]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.DG1.dg     ora....up.type ONLINE    ONLINE    lxh1
ora.DG2.dg     ora....up.type ONLINE    ONLINE    lxh1
ora....ER.lsnr ora....er.type ONLINE    ONLINE    lxh2
ora....N1.lsnr ora....er.type ONLINE    ONLINE    lxh1
ora....VOTE.dg ora....up.type ONLINE    ONLINE    lxh1
ora.RCY.dg     ora....up.type ONLINE    ONLINE    lxh1
ora.asm        ora.asm.type   ONLINE    ONLINE    lxh1
ora.eons       ora.eons.type  ONLINE    ONLINE    lxh1
ora.gsd        ora.gsd.type   ONLINE    ONLINE    lxh1
ora.lxh.db     ora....se.type OFFLINE   OFFLINE
ora....taf.svc ora....ce.type OFFLINE   OFFLINE
ora....SM1.asm application    ONLINE    ONLINE    lxh1
ora....H1.lsnr application    ONLINE    OFFLINE
ora.lxh1.gsd   application    ONLINE    ONLINE    lxh1
ora.lxh1.ons   application    ONLINE    OFFLINE
ora.lxh1.vip   ora....t1.type ONLINE    ONLINE    lxh2
ora....SM2.asm application    ONLINE    ONLINE    lxh2
ora....H2.lsnr application    ONLINE    ONLINE    lxh2
ora.lxh2.gsd   application    ONLINE    ONLINE    lxh2
ora.lxh2.ons   application    ONLINE    OFFLINE
ora.lxh2.vip   ora....t1.type ONLINE    ONLINE    lxh2
ora....SM3.asm application    ONLINE    ONLINE    lxh3
ora....H3.lsnr application    ONLINE    ONLINE    lxh3
ora.lxh3.gsd   application    ONLINE    ONLINE    lxh3
ora.lxh3.ons   application    ONLINE    ONLINE    lxh3
ora.lxh3.vip   ora....t1.type ONLINE    ONLINE    lxh3
ora....network ora....rk.type ONLINE    ONLINE    lxh1
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    lxh1
ora.ons        ora.ons.type   ONLINE    ONLINE    lxh3
ora....ry.acfs ora....fs.type ONLINE    ONLINE    lxh1
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    lxh1
At this point, the problem is resolved!