Handling a CRS Failure in an Oracle 11.2.0.4 RAC Cluster on Oracle Linux 6.7

Introduction:

Within the last month, the CRS stack of the Oracle RAC cluster at my organization failed and the cluster management commands became unusable. Running crsctl stat res -t on either node returned the following error:

[grid@db1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

[grid@db2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
However, the databases were still accessible and the business applications were running normally:

[root@db1 ~]# ps -ef | grep pmon
root 8024 204594 0 21:11 pts/0 00:00:00 grep pmon
grid 77120 1 0 Dec21 ? 00:04:21 asm_pmon_+ASM1
oracle 77790 1 0 Dec21 ? 00:05:18 ora_pmon_CAIWU1
oracle 77794 1 0 Dec21 ? 00:05:08 ora_pmon_dadb1
oracle 77848 1 0 Dec21 ? 00:05:39 ora_pmon_chdyl1
oracle 77910 1 0 Dec21 ? 00:07:47 ora_pmon_RLZY1

[root@db2 ~]# ps -ef | grep pmon
grid 27745 1 0 Dec21 ? 00:04:21 asm_pmon_+ASM2
oracle 28393 1 0 Dec21 ? 00:05:21 ora_pmon_dadb2
oracle 28569 1 0 Dec21 ? 00:04:58 ora_pmon_CAIWU2
oracle 28573 1 0 Dec21 ? 00:05:36 ora_pmon_chdyl2
oracle 28583 1 0 Dec21 ? 00:07:49 ora_pmon_RLZY2
Checking the ASM disk group state confirmed that the OCR disk group was indeed offline: it is missing from the asmcmd lsdg output on both nodes.

[grid@db1 ~]$ asmcmd lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 3072000 2679522 0 2679522 0 N ARCH/
MOUNTED EXTERN N 512 4096 1048576 204800 151138 0 151138 0 N CWDATA/
MOUNTED EXTERN N 512 4096 1048576 512000 472546 0 472546 0 N DADATA/
MOUNTED EXTERN N 512 4096 1048576 3072000 595334 0 595334 0 N DATA/
MOUNTED EXTERN N 512 4096 1048576 1843200 609953 0 609953 0 N SBDATA/

[grid@db2 ~]$ asmcmd lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 3072000 2679522 0 2679522 0 N ARCH/
MOUNTED EXTERN N 512 4096 1048576 204800 151138 0 151138 0 N CWDATA/
MOUNTED EXTERN N 512 4096 1048576 512000 472546 0 472546 0 N DADATA/
MOUNTED EXTERN N 512 4096 1048576 3072000 595334 0 595334 0 N DATA/
MOUNTED EXTERN N 512 4096 1048576 1843200 609953 0 609953 0 N SBDATA/
Mounting the OCR disk group manually succeeded, but crsctl stat res -t still returned the same error afterwards.

[grid@db1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Mon Dec 30 21:15:33 2019

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup ocr mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

[grid@db1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

[grid@db2 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Mon Dec 30 21:15:05 2019

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup ocr mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

[grid@db2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
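This is expected: mounting the disk group does not bring crsd back by itself once ohasd has given up restarting it, so the CRSD resource has to be restarted by hand through the lower (ohasd) stack. A minimal sketch of that check-and-restart sequence, run as root, using the Grid home path that appears in the logs; whether it succeeds depends on whether the underlying storage problem has actually cleared:

```shell
# Sketch only: restart the CRSD resource via the local ohasd stack.
# The Grid home path is taken from this cluster's logs.
GRID_HOME=/u01/app/11.2.0/grid
$GRID_HOME/bin/crsctl check crs                 # is the HA stack itself up?
$GRID_HOME/bin/crsctl stat res -init -t         # state of ora.crsd, ora.asm, ...
$GRID_HOME/bin/crsctl start res ora.crsd -init  # restart CRSD on this node
$GRID_HOME/bin/crsctl stat res -t               # should respond once crsd is up
```

If crsd keeps aborting, the cause of the OCR access failure has to be found first, which is what the log review below does.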
Checking alert_+ASM1.log on node db1 showed the following errors: ASM could not access the disks of the OCR disk group and therefore forcibly dismounted it, even though reading those same disks with dd succeeded.

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 102400 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 102400 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:48 2019
SQL> alter diskgroup OCR check /* proxy */
ORA-15032: not all alterations performed
ORA-15001: diskgroup "OCR" does not exist or is not mounted
ERROR: alter diskgroup OCR check /* proxy */
NOTE: client exited [77184]
Sat Dec 28 05:30:49 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35285] opening OCR file
Sat Dec 28 05:30:51 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35305] opening OCR file
Sat Dec 28 05:30:53 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35322] opening OCR file
Sat Dec 28 05:30:55 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35346] opening OCR file
Sat Dec 28 05:30:57 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35363] opening OCR file
Sat Dec 28 05:31:00 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35459] opening OCR file
Sat Dec 28 05:31:02 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35481] opening OCR file
Sat Dec 28 05:31:04 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35520] opening OCR file
Sat Dec 28 05:31:06 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35539] opening OCR file
Sat Dec 28 05:31:08 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35557] opening OCR file
Sat Dec 28 21:00:10 2019
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
Checking the trace file referenced by the errors above:

[root@db1 ~]# more /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc
Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
ORACLE_HOME = /u01/app/11.2.0/grid
System name: Linux
Node name: db1
Release: 3.8.13-68.3.4.el6uek.x86_64
Version: #2 SMP Tue Jul 14 15:03:36 PDT 2015
Machine: x86_64
Instance name: +ASM1
Redo thread mounted by this instance: 0
Oracle process number: 24
Unix process pid: 77212, image: oracle@db1 (TNS V1-V3)

2019-12-28 05:30:44.894 SESSION ID:(2929.3) 2019-12-28 05:30:44.894
CLIENT ID:() 2019-12-28 05:30:44.894 SERVICE NAME:() 2019-12-28 05:30:44.894
MODULE NAME:(crsd.bin@db1 (TNS V1-V3)) 2019-12-28 05:30:44.894 ACTION NAME:() 2019-12-28 05:30:44.894

Received ORADEBUG command (#1) 'CLEANUP_KFK_FD' from process 'Unix process pid: 35253, image: '

* 2019-12-28 05:30:44.895
Finished processing ORADEBUG command (#1) 'CLEANUP_KFK_FD'

* 2019-12-28 05:30:48.235
WARNING:failed xlate 1
ORA-15078: ASM diskgroup was forcibly dismounted
ksfdrfms:Mirror Read file=+OCR.255.4294967295 fob=0x9b00e5d8 bufp=0x7f5dd012ba00 blkno=25 nbytes=4096
WARNING:failed xlate 1
WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 102400 is not allocated; I/O request failed
ksfdrfms:Read failed from mirror side=1 logical extent number=0 dskno=65535
WARNING:failed xlate 1
WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 102400 is not allocated; I/O request failed
ksfdrfms:Read failed from mirror side=2 logical extent number=1 dskno=65535
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
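The dd read test mentioned above can be sketched as follows. The DISK path is an assumption (point it at the actual OCR disk device on the cluster, e.g. its multipath or ASMLib path); so that the sketch is self-contained and cannot touch a real device by accident, it falls back to a scratch file when DISK is unset:

```shell
# Read-probe an OCR disk the way the dd test above does. DISK is an
# assumption: export DISK=/dev/<your-ocr-disk> on the cluster. With DISK
# unset, a scratch file stands in so the sketch runs anywhere (the write
# below happens only to that scratch file, never to a real device).
if [ -z "${DISK:-}" ]; then
  DISK=$(mktemp)
  dd if=/dev/zero of="$DISK" bs=1M count=16 2>/dev/null
fi
# The probe itself: sequential reads from the start of the disk.
# On a real device, add iflag=direct to bypass the page cache.
if dd if="$DISK" of=/dev/null bs=1M count=10 2>/dev/null; then
  RESULT="read OK"
else
  RESULT="read FAILED"
fi
echo "probe of $DISK: $RESULT"
```

A successful read here, as in this incident, only proves the path is readable at that moment; it does not rule out the intermittent I/O stalls that the ASM log records.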
Checking alertdb1.log (the clusterware alert log) on node db1 showed matching errors: crsd repeatedly aborted because the OCR location in the ASM disk group was inaccessible.

2019-12-28 05:30:48.468:
[/u01/app/11.2.0/grid/bin/oraagent.bin(77466)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:4} in /u01/app/11.2.0/grid/log/db1/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-12-28 05:30:48.468:
[/u01/app/11.2.0/grid/bin/oraagent.bin(77684)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:332} in /u01/app/11.2.0/grid/log/db1/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2019-12-28 05:30:48.471:
[/u01/app/11.2.0/grid/bin/orarootagent.bin(77482)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:11497} in /u01/app/11.2.0/grid/log/db1/agent/crsd/orarootagent_root/orarootagent_root.log.
2019-12-28 05:30:48.480:
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:50.003:
[crsd(35285)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.021:
[crsd(35285)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.520:
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:51.918:
[crsd(35305)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:51.929:
[crsd(35305)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:52.557:
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:53.945:
[crsd(35322)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:53.956:
[crsd(35322)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:54.595:
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:55.976:
[crsd(35346)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:55.988:
[crsd(35346)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:56.633:
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:58.010:
[crsd(35363)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:58.020:
[crsd(35363)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:58.669:
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:00.043:
[crsd(35459)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:00.054:
[crsd(35459)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:00.706:
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:02.093:
[crsd(35481)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:02.103:
[crsd(35481)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:02.742:
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:04.109:
[crsd(35520)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:04.119:
[crsd(35520)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:04.777:
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:06.141:
[crsd(35539)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:06.151:
[crsd(35539)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:06.810:
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.181:
[crsd(35557)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.191:
[crsd(35557)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.846:
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.847:
[ohasd(33022)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2019-12-28 05:31:08.848:
[ohasd(33022)]CRS-2769:Unable to failover resource 'ora.crsd'.
Checking oraagent_grid.log on node db1 showed the following: the agent stopped the ora.OCR.dg resource, and its state went from ONLINE through STOPPING to OFFLINE.

2019-12-28 05:30:16.531: [ AGFW][511039232]{1:30746:2} Agent received the message: AGENT_HB[Engine] ID 12293:113720
2019-12-28 05:30:37.808: [ AGFW][511039232]{1:30746:9373} Agent received the message: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:37.808: [ AGFW][511039232]{1:30746:9373} Preparing STOP command for: ora.OCR.dg db1 1
2019-12-28 05:30:37.808: [ AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: ONLINE to: STOPPING
2019-12-28 05:30:37.809: [ora.OCR.dg][513140480]{1:30746:9373} [stop] (:CLSN00108:) clsn_agent::stop {
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stop: enter {
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] getResAttrib: attrib name USR_ORA_OPI value true len 4
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] Agent::flagUsrOraOpiIsSet(true) reason not dependency
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stop: tha exit }
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stopSingle status:2 }
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] (:CLSN00108:) clsn_agent::stop }
2019-12-28 05:30:37.810: [ AGFW][513140480]{1:30746:9373} Command: stop for resource: ora.OCR.dg db1 1 completed with status: SUCCESS
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:37.811: [ AGFW][511039232]{1:30746:9373} Agent sending reply for: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:37.838: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::runCheck: asm stat asmRet 0
2019-12-28 05:30:37.839: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::getConnxn connected
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus excp no data found
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus no data found in v$asm_diskgroup_stat
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus dgName OCR ret 1
2019-12-28 05:30:37.845: [ AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: STOPPING to: OFFLINE
2019-12-28 05:30:37.845: [ AGFW][511039232]{1:30746:9373} Agent sending last reply for: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:43.889: [ora.asm][503641856]{1:30746:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:43.920: [ora.asm][503641856]{1:30746:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2019-12-28 05:30:48.465: [ CRSCOMM][521545472] IpcC: IPC client connection 6c to member 0 has been removed
2019-12-28 05:30:48.465: [CLSFRAME][521545472] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.465: [CLSFRAME][521545472] Disconnected from CRSD:db1 process: {Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.474: [ AGENT][511039232]{0:1:4} {0:1:4} Created alert : (:CRSAGF00117:) : Disconnected from server, Agent is shutting down.
2019-12-28 05:30:48.474: [ AGFW][511039232]{0:1:4} Agent is exiting with exit code: 1
Checking alert_+ASM2.log on node db2 showed similar errors, including repeated messages such as "Waited 15 secs for write IO to PST disk 0 in group 1": write I/O to the PST (Partner Status Table) disks of several disk groups, including the OCR group (group 5), had stalled for 15 seconds.

Sat Dec 28 03:02:51 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 03:02:51 2019
NOTE: process b000+asm2 (45488) initiating offline of disk 0.3916160907 (OCR1) with mask 0x7e in group 5
NOTE: process b000+asm2 (45488) initiating offline of disk 1.3916160906 (OCR2) with mask 0x7e in group 5
NOTE: process b000+asm2 (45488) initiating offline of disk 2.3916160905 (OCR3) with mask 0x7e in group 5
NOTE: checking PST: grp = 5
GMON checking disk modes for group 5 at 19 for pid 27, osid 45488
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: checking PST for grp 5 done.
NOTE: initiating PST update: grp = 5, dsk = 0/0xe96bdf8b, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 5, dsk = 1/0xe96bdf8a, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 5, dsk = 2/0xe96bdf89, mask = 0x6a, op = clear
GMON updating disk modes for group 5 at 20 for pid 27, osid 45488
ERROR: no read quorum in group: required 2, found 0 disks
Sat Dec 28 03:02:51 2019
NOTE: cache dismounting (not clean) group 5/0x8F5B2F9F (OCR)
NOTE: messaging CKPT to quiesce pins Unix process pid: 45490, image: oracle@db2 (B001)
Sat Dec 28 03:02:51 2019
NOTE: halting all I/Os to diskgroup 5 (OCR)
Sat Dec 28 03:02:52 2019
NOTE: LGWR doing non-clean dismount of group 5 (OCR)
NOTE: LGWR sync ABA=23.100 last written ABA 23.100
WARNING: Offline for disk OCR1 in mode 0x7f failed.
WARNING: Offline for disk OCR2 in mode 0x7f failed.
WARNING: Offline for disk OCR3 in mode 0x7f failed.
Sat Dec 28 03:02:52 2019
kjbdomdet send to inst 1
detach from dom 5, sending detach message to inst 1
Sat Dec 28 03:02:52 2019
List of instances:
1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 36)
Global Resource Directory partially frozen for dirty detach

* dirty detach - domain 5 invalid = TRUE
0 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Sat Dec 28 03:02:52 2019
WARNING: dirty detached from domain 5
NOTE: cache dismounted group 5/0x8F5B2F9F (OCR)
SQL> alter diskgroup OCR dismount force /* ASM SERVER:2405117855 */
Sat Dec 28 03:02:52 2019
NOTE: cache deleting context for group OCR 5/0x8f5b2f9f
GMON dismounting group 5 at 21 for pid 28, osid 45490
NOTE: Disk OCR1 in mode 0x7f marked for de-assignment
NOTE: Disk OCR2 in mode 0x7f marked for de-assignment
NOTE: Disk OCR3 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 5
Sat Dec 28 03:03:03 2019
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 21 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 21 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 03:03:03 2019
ASM Health Checker found 1 new failures
Sat Dec 28 03:03:22 2019
SUCCESS: diskgroup OCR was dismounted
SUCCESS: alter diskgroup OCR dismount force /* ASM SERVER:2405117855 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group OCR
Sat Dec 28 03:03:22 2019
NOTE: diskgroup resource ora.OCR.dg is offline
Sat Dec 28 03:03:22 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:34 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 05:30:37 2019
Received dirty detach msg from inst 1 for dom 5
Sat Dec 28 05:30:37 2019
List of instances:
1 2
Dirty detach reconfiguration started (new ddet inc 2, cluster inc 36)
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 5 invalid = TRUE
0 GCS resources traversed, 0 cancelled
freeing rdom 5
Dirty Detach Reconfiguration complete
Sat Dec 28 05:30:37 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 5 logical extent 0 offset 704512 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 5 logical extent 1 offset 704512 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:37 2019
SQL> alter diskgroup OCR check /* proxy */
ORA-15032: not all alterations performed
ORA-15001: diskgroup "OCR" does not exist or is not mounted
ERROR: alter diskgroup OCR check /* proxy */
Sat Dec 28 05:30:44 2019
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
Sat Dec 28 05:30:48 2019
NOTE: client exited [27819]
Sat Dec 28 05:30:49 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142641] opening OCR file
Sat Dec 28 05:30:51 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142660] opening OCR file
Sat Dec 28 05:30:53 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142678] opening OCR file
Sat Dec 28 05:30:55 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142696] opening OCR file
Sat Dec 28 05:30:57 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142723] opening OCR file
Sat Dec 28 05:30:59 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142744] opening OCR file
Sat Dec 28 05:31:01 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142773] opening OCR file
Sat Dec 28 05:31:03 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142792] opening OCR file
Sat Dec 28 05:31:05 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142806] opening OCR file
Sat Dec 28 05:31:07 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142821] opening OCR file
Sat Dec 28 06:18:42 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
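The "Waited N secs" warnings are scattered across many disk groups and timestamps; folding them with awk gives a quick per-group picture of how bad the stalls were. A sketch, using a few sample lines copied from the log above (on the server you would feed the real alert_+ASM2.log instead of the here-document):

```shell
# Summarize the longest "write IO to PST" wait per ASM disk group.
# Sample lines copied from the alert log above stand in for the real file.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
EOF
SUMMARY=$(awk '/Waited .* secs for write IO to PST disk/ {
    # Pull the wait time (after "Waited") and group number (after "group").
    for (i = 1; i <= NF; i++) {
      if ($i == "Waited") w = $(i + 1)
      if ($i == "group")  { g = $(i + 1); sub(/\.$/, "", g) }
    }
    if (w + 0 > max[g] + 0) max[g] = w
  }
  END { for (g in max) printf "group %s: max wait %s secs\n", g, max[g] }' "$LOG")
echo "$SUMMARY"
rm -f "$LOG"
```

On the data from this incident, such a summary shows the stalls were not limited to the OCR group (group 5) but hit nearly every disk group, which points at the shared storage path rather than at one disk.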
Checking alertdb2.log (the clusterware alert log) on node db2 showed the same errors: the OCR location in the ASM disk group was inaccessible, even though the disks could still be read with dd.

2019-12-28 05:30:48.019:
[/u01/app/11.2.0/grid/bin/oraagent.bin(28268)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:73} in /u01/app/11.2.0/grid/log/db2/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2019-12-28 05:30:48.019:
[/u01/app/11.2.0/grid/bin/scriptagent.bin(37953)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:9:8} in /u01/app/11.2.0/grid/log/db2/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2019-12-28 05:30:48.020:
[/u01/app/11.2.0/grid/bin/oraagent.bin(28009)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:10} in /u01/app/11.2.0/grid/log/db2/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-12-28 05:30:48.021:
[/u01/app/11.2.0/grid/bin/orarootagent.bin(28025)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:373} in /u01/app/11.2.0/grid/log/db2/agent/crsd/orarootagent_root/orarootagent_root.log.
2019-12-28 05:30:48.024:
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:49.410:
[crsd(142641)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:49.420:
[crsd(142641)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:50.063:
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:51.442:
[crsd(142660)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:51.451:
[crsd(142660)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:52.100:
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:53.471:
[crsd(142678)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:53.480:
[crsd(142678)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:54.138:
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:55.507:
[crsd(142696)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:55.517:
[crsd(142696)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:56.176:
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:57.551:
[crsd(142723)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:57.560:
[crsd(142723)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:58.216:
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:59.592:
[crsd(142744)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:59.602:
[crsd(142744)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:00.253:
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:01.627:
[crsd(142773)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:01.636:
[crsd(142773)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:02.290:
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:03.658:
[crsd(142792)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:03.668:
[crsd(142792)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:04.327:
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:05.701:
[crsd(142806)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:05.711:
[crsd(142806)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:06.365:
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:07.726:
[crsd(142821)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:07.735:
[crsd(142821)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:08.402:
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:08.402:
[ohasd(13034)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2019-12-28 05:31:08.403:
[ohasd(13034)]CRS-2769:Unable to failover resource 'ora.crsd'.
Checking oraagent_grid.log on node db2 showed the following:

2019-12-28 05:29:59.329: [ AGFW][3601811200]{2:6928:2} Agent received the message: AGENT_HB[Engine] ID 12293:273304
2019-12-28 05:30:17.162: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl.
2019-12-28 05:30:17.267: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] execCmd ret = 0
2019-12-28 05:30:17.267: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2019-12-28 05:30:17.298: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] ScanLsnrAgent::checkDependentVipResource: scanVipResource = ora.scan2.vip, statRet = 0
2019-12-28 05:30:17.881: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl.
2019-12-28 05:30:17.986: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] execCmd ret = 0
2019-12-28 05:30:17.987: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2019-12-28 05:30:18.019: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] ScanLsnrAgent::checkDependentVipResource: scanVipResource = ora.scan3.vip, statRet = 0
2019-12-28 05:30:27.292: [ora.asm][2950686464]{2:6928:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:27.319: [ora.asm][2950686464]{2:6928:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2019-12-28 05:30:34.522: [ora.ons][2950686464]{2:6928:2} [check] getOracleHomeAttrib: oracle_home = /u01/app/11.2.0/grid
2019-12-28 05:30:34.522: [ora.ons][2950686464]{2:6928:2} [check] Utils:execCmd action = 3 flags = 6 ohome = /u01/app/11.2.0/grid/opmn/ cmdname = onsctli.
2019-12-28 05:30:34.627: [ora.ons][2950686464]{2:6928:2} [check] (:CLSN00010:)ons is running ...
2019-12-28 05:30:34.627: [ora.ons][2950686464]{2:6928:2} [check] (:CLSN00010:)
2019-12-28 05:30:34.628: [ora.ons][2950686464]{2:6928:2} [check] execCmd ret = 0
2019-12-28 05:30:37.858: [ USRTHRD][3575973632]{1:30748:9373} Processing the event CRS_RESOURCE_STATE_CHANGE
2019-12-28 05:30:38.652: [ora.LISTENER.lsnr][3594413824]{2:6928:2} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl.
2019-12-28 05:30:38.757: [ora.LISTENER.lsnr][3594413824]{2:6928:2} [check] execCmd ret = 0
2019-12-28 05:30:48.017: [ CRSCOMM][3612317440] IpcC: IPC client connection 6c to member 0 has been removed
2019-12-28 05:30:48.017: [CLSFRAME][3612317440] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.017: [CLSFRAME][3612317440] Disconnected from CRSD:db2 process: {Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.020: [ AGENT][3601811200]{0:1:10} {0:1:10} Created alert : (:CRSAGF00117:) : Disconnected from server, Agent is shutting down.
2019-12-28 05:30:48.020: [ AGFW][3601811200]{0:1:10} Agent is exiting with exit code: 1
Checking /var/log/messages on both nodes shows multipath-related errors on each; however, the affected disks are used for backup, not by the production databases:

Dec 30 05:25:31 db1 multipathd: backup2: sdcr - emc_clariion_checker: query command indicates error
Dec 30 05:25:31 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 05:25:31 db1 kernel: device-mapper: multipath: Failing path 69:240.
Dec 30 05:25:31 db1 multipathd: backup2: remaining active paths: 3
Dec 30 05:25:37 db1 multipathd: backup2: sdcr - emc_clariion_checker: Active path is healthy.
Dec 30 05:25:37 db1 multipathd: 69:240: reinstated
Dec 30 05:25:37 db1 multipathd: backup2: remaining active paths: 4
Dec 30 05:25:37 db1 kernel: sd 5:0:3:2: emc: ALUA failover mode detected
Dec 30 05:25:37 db1 kernel: sd 5:0:3:2: emc: at SP A Port 5 (owned, default SP A)
Dec 30 05:26:03 db1 kernel: qla2xxx [0000:05:00.1]-801c:5: Abort command issued nexus=5:3:4 -- 1 2002.
Dec 30 06:03:35 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 06:15:23 db1 multipathd: backup3: sdcq - emc_clariion_checker: Read error for WWN 600601608b203300d563752524c1e611. Sense data are 0x0/0x0/0x0.
Dec 30 06:15:23 db1 kernel: qla2xxx [0000:05:00.1]-801c:5: Abort command issued nexus=5:3:1 -- 1 2002.
Dec 30 06:15:23 db1 kernel: device-mapper: multipath: Failing path 69:224.
Dec 30 06:15:23 db1 multipathd: checker failed path 69:224 in map backup3
Dec 30 06:15:23 db1 multipathd: backup3: remaining active paths: 3
Dec 30 06:15:28 db1 multipathd: backup3: sdcq - emc_clariion_checker: Active path is healthy.
Dec 30 06:15:28 db1 multipathd: 69:224: reinstated
Dec 30 06:15:28 db1 multipathd: backup3: remaining active paths: 4
Dec 30 06:15:28 db1 kernel: sd 5:0:3:1: emc: ALUA failover mode detected
Dec 30 06:15:28 db1 kernel: sd 5:0:3:1: emc: at SP A Port 5 (owned, default SP A)
Dec 30 06:59:29 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 07:53:22 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 07:55:11 db1 multipathd: sdct: couldn't get asymmetric access state
Dec 30 07:55:11 db1 multipathd: backup4: load table [0 2147483648 multipath 2 queue_if_no_path retain_attached_hw_handler 1 emc 2 1 round-robin 0 2 1 70:16 1 66:240 1 round-robin 0 2 1 8:64 1 67:80 1]
Dec 30 07:55:11 db1 kernel: sd 5:0:3:4: emc: ALUA failover mode detected
Dec 30 07:55:11 db1 kernel: sd 5:0:3:4: emc: at SP A Port 5 (owned, default SP A)
Dec 30 07:55:11 db1 kernel: sd 4:0:3:4: emc: ALUA failover mode detected
Dec 30 07:55:11 db1 kernel: sd 4:0:3:4: emc: at SP A Port 4 (owned, default SP A)
Dec 30 07:55:35 db1 multipathd: backup2: sdcr - emc_clariion_checker: Read error for WWN 600601608b203300d663752524c1e611. Sense data are 0x0/0x0/0x0.
Dec 30 07:55:35 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 07:55:35 db1 multipathd: backup2: remaining active paths: 3
Dec 30 07:55:35 db1 kernel: device-mapper: multipath: Failing path 69:240.
Dec 30 07:55:40 db1 multipathd: backup2: sdcr - emc_clariion_checker: Active path is healthy.
Dec 30 07:55:40 db1 multipathd: 69:240: reinstated
Dec 30 07:55:40 db1 multipathd: backup2: remaining active paths: 4
Dec 30 07:55:40 db1 kernel: sd 5:0:3:2: emc: ALUA failover mode detected
Dec 30 07:55:40 db1 kernel: sd 5:0:3:2: emc: at SP A Port 5 (owned, default SP A)
Dec 30 08:39:47 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 08:43:36 db1 multipathd: mpathb: load table [0 20971520 multipath 2 queue_if_no_path retain_attached_hw_handler 1 emc 2 1 round-robin 0 2 1 69:208 1 66:176 1 round-robin 0 2 1 8:0 1 67:16 1]
Dec 30 08:43:36 db1 kernel: sd 5:0:3:0: emc: ALUA failover mode detected
Dec 30 08:43:36 db1 kernel: sd 5:0:3:0: emc: at SP A Port 5 (owned, default SP A)
Dec 30 08:43:36 db1 kernel: sd 4:0:3:0: emc: ALUA failover mode detected
Dec 30 08:43:36 db1 kernel: sd 4:0:3:0: emc: at SP A Port 4 (owned, default SP A)
Dec 30 09:24:04 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 10:13:09 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 11:06:07 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 12:07:36 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 13:08:58 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 14:00:19 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 14:52:20 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 15:40:45 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 16:34:38 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 17:09:56 db1 auditd[15975]: Audit daemon rotating log files
Dec 30 17:38:16 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 18:59:38 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 19:54:43 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
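The flapping paths can be summarized per multipath map with a one-liner. This is a sketch: the sample lines are copied from the log above, and on a live system you would read /var/log/messages instead of the hard-coded string.

```shell
# Count "checker failed path" events per multipath map (the map name is the last field).
log='Dec 30 05:25:31 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 06:15:23 db1 multipathd: checker failed path 69:224 in map backup3
Dec 30 07:55:35 db1 multipathd: checker failed path 69:240 in map backup2'
summary=$(printf '%s\n' "$log" | grep 'checker failed path' | awk '{print $NF}' | sort | uniq -c)
echo "$summary"
```

Against the real file the pipeline is simply `grep 'checker failed path' /var/log/messages | awk '{print $NF}' | sort | uniq -c`.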
A review of the cluster logs confirms that storage I/O problems (fiber-link flaps or I/O latency) caused CRS to go down abnormally. Oddly, even though CRS was offline, the ASM and database instances stayed up and remained fully usable. A search of Oracle Support turned up note 1581864.1, which links ASM voting-disk access timeouts to the hidden parameter _asm_hbeatiowait, and that parameter in turn interacts with the polling_interval setting of the operating system's multipath configuration. The root cause is that the OS takes far longer to declare a disk access failed than ASM allows for its voting-disk heartbeat I/O, so Oracle RAC judges the voting disk in ASM inaccessible and forces it offline. The fix is therefore: determine the OS polling_interval value and the ASM hidden parameter _asm_hbeatiowait, then raise _asm_hbeatiowait above the OS value.
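The core of the fix boils down to one comparison. In this sketch both values are hard-coded illustrations; read the real ones from /sys/block/&lt;disk&gt;/device/timeout and from x$ksppcv as shown in the steps that follow.

```shell
# Sanity check: ASM's heartbeat I/O timeout must exceed the OS disk timeout.
os_timeout=30        # Oracle Linux 6.7 default SCSI device timeout (seconds)
asm_hbeatiowait=15   # 11.2 default for _asm_hbeatiowait (seconds)
if [ "$asm_hbeatiowait" -le "$os_timeout" ]; then
  echo "RISK: _asm_hbeatiowait (${asm_hbeatiowait}s) <= OS disk timeout (${os_timeout}s); raise it above ${os_timeout}s"
else
  echo "OK: _asm_hbeatiowait exceeds the OS disk timeout"
fi
```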
The concrete remediation steps follow:
1. Check the _asm_hbeatiowait value on the RAC ASM instances (the default is 15 seconds):

SQL> col ksppinm for a20
SQL> col ksppstvl for a40
SQL> col ksppdesc for a80
SQL> SELECT ksppinm, ksppstvl, ksppdesc
  2    FROM x$ksppi x, x$ksppcv y
  3   WHERE x.indx = y.indx AND ksppinm = '_asm_hbeatiowait';

KSPPINM              KSPPSTVL                                 KSPPDESC
-------------------- ---------------------------------------- ----------------------------------------------------
_asm_hbeatiowait     15                                       number of secs to wait for PST Async Hbeat IO return
2. Check the OS storage-disk access timeout (the default on Oracle Linux 6.7 is 30 seconds):

[root@db1 ~]# cat /sys/block/sdb/device/timeout
30

[root@db2 ~]# cat /sys/block/sdb/device/timeout
30
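The single-disk check above can be extended to every SCSI device on a node. A small sketch, assuming the standard sysfs sd* layout; devices without a timeout attribute (or an unmatched glob) are silently skipped.

```shell
# List the kernel I/O timeout of each SCSI block device via sysfs.
count=0
for dev in /sys/block/sd*; do
  [ -r "$dev/device/timeout" ] || continue   # skip if the glob did not match
  count=$((count + 1))
  printf '%s: %s seconds\n' "${dev#/sys/block/}" "$(cat "$dev/device/timeout")"
done
echo "checked $count SCSI device(s)"
```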
3. Raise _asm_hbeatiowait to 45 seconds (this is a static parameter, so the cluster must be restarted for it to take effect):

SQL> alter system set "_asm_hbeatiowait"=45 scope=spfile sid='*';

System altered.

4. Restart the cluster.
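Because _asm_hbeatiowait is static, the whole clusterware stack has to be bounced. A dry-run sketch of the sequence; GRID_HOME is an assumption (the standard /u01/app/11.2.0/grid path from this install), and on a real system you would run the printed commands as root, stopping CRS on every node before starting it again.

```shell
# Print the clusterware restart sequence; crsctl is only executed if it exists here.
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0/grid}   # assumption: adjust to your installation
CRSCTL="$GRID_HOME/bin/crsctl"
for cmd in "stop crs" "start crs" "check crs"; do
  echo "# $CRSCTL $cmd"
  if [ -x "$CRSCTL" ]; then
    "$CRSCTL" $cmd
  fi
done
```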

5. Check the cluster status:

[grid@db1 ~]$ crsctl stat res -t

NAME TARGET STATE SERVER STATE_DETAILS

Local Resources

ora.ARCH.dg
ONLINE ONLINE db1
ONLINE ONLINE db2
ora.CWDATA.dg
ONLINE ONLINE db1
ONLINE ONLINE db2
ora.DADATA.dg
ONLINE ONLINE db1
ONLINE ONLINE db2
ora.DATA.dg
ONLINE ONLINE db1
ONLINE ONLINE db2
ora.LISTENER.lsnr
ONLINE ONLINE db1
ONLINE ONLINE db2
ora.OCR.dg
ONLINE ONLINE db1
ONLINE ONLINE db2
ora.SBKDATA.dg
ONLINE ONLINE db1
ONLINE ONLINE db2
ora.asm
ONLINE ONLINE db1 Started
ONLINE ONLINE db2 Started
ora.gsd
OFFLINE OFFLINE db1
OFFLINE OFFLINE db2
ora.net1.network
ONLINE ONLINE db1
ONLINE ONLINE db2
ora.ons
ONLINE ONLINE db1
ONLINE ONLINE db2

Cluster Resources

ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE db2
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE db1
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE db1
ora.caiwu.db
1 ONLINE ONLINE db1 Open
2 ONLINE ONLINE db2 Open
ora.chdyl.db
1 ONLINE ONLINE db1 Open
2 ONLINE ONLINE db2 Open
ora.cvu
1 ONLINE ONLINE db1
ora.dadb.db
1 ONLINE ONLINE db1 Open
2 ONLINE ONLINE db2 Open
ora.db1.vip
1 ONLINE ONLINE db1
ora.db2.vip
1 ONLINE ONLINE db2
ora.oc4j
1 ONLINE ONLINE db1
ora.rlzy.db
1 ONLINE ONLINE db1 Open
2 ONLINE ONLINE db2 Open
ora.scan1.vip
1 ONLINE ONLINE db2
ora.scan2.vip
1 ONLINE ONLINE db1
ora.scan3.vip
1 ONLINE ONLINE db1

This completes the recovery.
