Oracle Cluster (RAC) Time Synchronization (NTP and CTSS)





http://blog.itpub.net/26736162/viewspace-2157130/

 

crsctl stat res -t -init

ps -ef|grep ctss

crsctl check ctss

cluvfy comp clocksync -n all -verbose

 

 crsctl start res ora.ctssd -init

 crsctl stop res ora.ctssd -init

 

 

Network Time Protocol Setting

You have two options for time synchronization: an operating system configured network time protocol (NTP), or Oracle Cluster Time Synchronization Service.

Oracle Cluster Time Synchronization Service is designed for organizations whose cluster servers are unable to access NTP services.

If you use NTP, then the Oracle Cluster Time Synchronization daemon (ctssd) starts up in observer mode. If you do not have NTP daemons, then ctssd starts up in active mode and synchronizes time among cluster members without contacting an external time server.

 

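You can use the operating system's NTP service or Oracle's own CTSS service. If NTP is not enabled, Oracle automatically has its own ctssd process take over time synchronization.

Starting with Oracle 11gR2, RAC uses the Cluster Time Synchronization Service (CTSS) to synchronize time across the nodes. When the installer finds that NTP is inactive, CTSS is installed in active mode and synchronizes the time of all nodes. If NTP is configured, CTSS starts in observer mode, and Oracle Clusterware performs no active time synchronization in the cluster.

In RAC, the clocks of all cluster nodes must stay synchronized; otherwise many problems can follow. Time-dependent applications may produce incorrect data, and log entries appear out of order, which makes problems harder to diagnose. In severe cases the cluster can go down, or a node cannot rejoin the cluster after a restart.

Before Oracle 11gR2, cluster time was synchronized by NTP. 11gR2 introduced the CTSS component: if the system has no NTP configured, CTSS synchronizes the cluster time.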

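NTP and CTSS can coexist, and NTP takes precedence over CTSS. If both are present, NTP synchronizes the cluster time and CTSS stays in Observer mode; CTSS switches to Active mode only when NTP has been shut down on all nodes of the cluster. In other words, if NTP is active on even one node, CTSS runs in Observer mode on every node of the cluster.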
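The precedence rule can be written as a tiny decision function. A minimal sketch, assuming each node's NTP state is passed in as yes (NTP configured/running) or no; `cluster_ctss_mode` is a hypothetical name that only models the rule that NTP on any node puts CTSS into Observer mode on every node, and it does not query Clusterware.

```shell
# Hypothetical helper: models the NTP/CTSS precedence rule; it does not call Clusterware.
# Arguments: one "yes" or "no" per node, meaning "NTP is configured on this node".
cluster_ctss_mode() {
    local node_ntp
    for node_ntp in "$@"; do
        # NTP on any single node puts CTSS on every node into Observer mode.
        if [ "$node_ntp" = "yes" ]; then
            echo "Observer"
            return 0
        fi
    done
    # No node runs NTP, so CTSS actively synchronizes the cluster.
    echo "Active"
}

cluster_ctss_mode no no    # prints Active
cluster_ctss_mode yes no   # prints Observer
```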

Note that to put CTSS into Active mode, it is not enough to stop the ntpd service (/sbin/service ntpd stop); /etc/ntp.conf must also be removed (mv /etc/ntp.conf /etc/ntp.conf.bak). Otherwise CTSS will not become active.
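Later in this article, octssd.log shows CTSS checking for NTP files ("NTP default pid file not found", "NTP default config file found"). The sketch below is a rough pre-check built on that observation, assuming the usual Linux defaults /etc/ntp.conf and /var/run/ntpd.pid; the real octssd logic is internal to Oracle and may differ by version and platform. `expected_ctss_mode` is a hypothetical helper, and treating the pid file alone as "NTP present" is an assumption of this sketch.

```shell
# Hypothetical pre-check: guesses which mode CTSS will choose on this node from
# the NTP files it is seen checking in octssd.log. The paths default to the usual
# Linux locations and can be overridden (useful for testing).
expected_ctss_mode() {
    local conf="${1:-/etc/ntp.conf}"
    local pid="${2:-/var/run/ntpd.pid}"
    # Assumption: either file is treated as "NTP present". Stopping ntpd while
    # leaving /etc/ntp.conf in place keeps CTSS in Observer mode.
    if [ -e "$conf" ] || [ -e "$pid" ]; then
        echo "Observer"
    else
        echo "Active"
    fi
}

# Example, after "service ntpd stop" and moving ntp.conf away:
# expected_ctss_mode
```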

 

1.1.1      CTSS synchronization mode

Disable NTP:

/sbin/service ntpd stop

mv /etc/ntp.conf /etc/ntp.conf.bak

service ntpd status

chkconfig ntpd off

 

[root@raclhr-11gR2-N2 ~]# ps -ef|grep ctss

root     19678     1  0 19:22 ?        00:00:02 /u01/app/11.2.0/grid/bin/octssd.bin reboot

root     20970 20623  0 19:35 pts/4    00:00:00 grep ctss

[root@raclhr-11gR2-N2 ~]#

[root@raclhr-11gR2-N2 ~]# crsctl stat res -t -init

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS      

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.asm

      1        ONLINE  ONLINE       raclhr-11gr2-n2          Started            

ora.cluster_interconnect.haip

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.crf

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.crsd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.cssd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.cssdmonitor

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.ctssd

      1        ONLINE  ONLINE       raclhr-11gr2-n2          ACTIVE:0           

ora.diskmon

      1        OFFLINE OFFLINE                                                  

ora.evmd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.gipcd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.gpnpd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.mdnsd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

[root@raclhr-11gR2-N2 ~]#

 

CTSS status on node 1:

[root@raclhr-11gR2-N1 ~]# crsctl check ctss

CRS-4701: The Cluster Time Synchronization Service is in Active mode.

CRS-4702: Offset (in msec): 0

[root@raclhr-11gR2-N1 ~]#

octssd log on node 1:

/u01/app/11.2.0/grid/log/raclhr-11gr2-n1/ctssd/octssd.log

2018-06-30 19:25:56.369: [    CTSS][899475200]sclsctss_gvss2: NTP default pid file not found

2018-06-30 19:25:56.369: [    CTSS][899475200]sclsctss_gvss8: Return [0] and NTP status [1].

2018-06-30 19:25:56.369: [    CTSS][899475200]ctss_check_vendor_sw: Vendor time sync software is not detected. status [1].

2018-06-30 19:25:57.002: [    CTSS][916338432]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xcc], offset[0 ms]}, length=[8].

2018-06-30 19:26:01.263: [    CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

2018-06-30 19:26:01.264: [    CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg

2018-06-30 19:26:01.264: [    CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc4] nodenum [2] hostname [raclhr-11gr2-n2] )

2018-06-30 19:26:09.267: [    CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

The octssd.log on node 1 shows that no NTP service was found, so CTSS runs in Active mode.
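Instead of reading the whole log, the most recent detection result can be pulled out of octssd.log by searching for the `ctss_check_vendor_sw` lines shown above. A minimal sketch, assuming only the two message texts quoted in this article ("is detected" and "is not detected"); `last_ntp_detection` is a hypothetical name.

```shell
# Hypothetical helper: prints the most recent NTP ("vendor time sync software")
# detection result that ctss_check_vendor_sw wrote to an octssd.log file.
# "not detected" goes with Active mode, "detected" with the switch to Observer.
last_ntp_detection() {
    local log="$1"
    grep 'ctss_check_vendor_sw: Vendor time sync software is' "$log" |
        tail -n 1 |
        awk '/is not detected/ { print "not detected"; next }
             /is detected/     { print "detected" }'
}

# Example (path from node 1 in this article):
# last_ntp_detection /u01/app/11.2.0/grid/log/raclhr-11gr2-n1/ctssd/octssd.log
```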

 

CTSS status on node 2:

[root@raclhr-11gR2-N2 ~]# crsctl check ctss

CRS-4701: The Cluster Time Synchronization Service is in Active mode.

CRS-4702: Offset (in msec): 0

[root@raclhr-11gR2-N2 ~]#

octssd log on node 2:

/u01/app/11.2.0/grid/log/raclhr-11gr2-n2/ctssd/octssd.log

2018-06-30 19:28:49.539: [    CTSS][839321344]sclsctss_gvss2: NTP default pid file not found

2018-06-30 19:28:49.539: [    CTSS][839321344]sclsctss_gvss8: Return [0] and NTP status [1].

2018-06-30 19:28:49.539: [    CTSS][839321344]ctss_check_vendor_sw: Vendor time sync software is not detected. status [1].

2018-06-30 19:29:05.544: [    CTSS][839321344]ctsselect_msm: CTSS mode is [0xc4]

2018-06-30 19:29:05.544: [    CTSS][839321344]ctssslave_swm1_2: Ready to initiate new time sync process.

2018-06-30 19:29:05.545: [    CTSS][839321344]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].

2018-06-30 19:29:05.546: [    CTSS][845625088]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].

2018-06-30 19:29:05.546: [    CTSS][845625088]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].

2018-06-30 19:29:05.547: [    CTSS][839321344]ctssslave_swm2_3: Received time sync message from master.

2018-06-30 19:29:05.547: [    CTSS][839321344]ctssslave_swm: The system time difference is too small [243] usec. Not adjusting time.

2018-06-30 19:29:05.547: [    CTSS][839321344]ctssslave_swm17: LT [1530358145sec 546888usec], MT [1530358145sec 140655884523349usec], Delta [2314usec]

2018-06-30 19:29:05.547: [    CTSS][839321344]ctssslave_swm19: The offset is [243 usec] and sync interval set to [1]

2018-06-30 19:29:05.547: [    CTSS][839321344]ctssslave_swm: Received from master (mode [0xcc] nodenum [1] hostname [raclhr-11gr2-n1] )

2018-06-30 19:29:05.547: [    CTSS][839321344]ctsselect_msm: Sync interval returned in [1]

2018-06-30 19:29:05.547: [    CTSS][845625088]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler

2018-06-30 19:29:07.910: [    CTSS][860387072]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xc4], offset[0 ms]}, length=[8].

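The octssd.log on node 2 also shows that no NTP service was found and that CTSS is in Active mode. Node 1 is the master for time synchronization. The log reports a time difference between the nodes, but at 243 usec it is too small to need adjusting.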

 

Verify the cluster time:

 cluvfy comp clocksync -n all -verbose

 

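Although the clocks are not exactly identical, the check passes in this situation, and differences this small are brought back in sync by the cluster automatically.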

[grid@raclhr-11gR2-N1 ~]$  cluvfy comp clocksync -n all -verbose

 

Verifying Clock Synchronization across the cluster nodes

 

Checking if Clusterware is installed on all nodes...

Check of Clusterware install passed

 

Checking if CTSS Resource is running on all nodes...

Check: CTSS Resource running on all nodes

  Node Name                             Status                 

  ------------------------------------  ------------------------

  raclhr-11gr2-n2                       passed                 

  raclhr-11gr2-n1                       passed                  

Result: CTSS resource check passed

 

 

Querying CTSS for time offset on all nodes...

Result: Query of CTSS for time offset passed

 

Check CTSS state started...

Check: CTSS state

  Node Name                             State                  

  ------------------------------------  ------------------------

  raclhr-11gr2-n2                       Active                 

  raclhr-11gr2-n1                       Active                 

CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...

Reference Time Offset Limit: 1000.0 msecs

Check: Reference Time Offset

  Node Name     Time Offset               Status                 

  ------------  ------------------------  ------------------------

  raclhr-11gr2-n2  0.0                       passed                 

  raclhr-11gr2-n1  0.0                       passed                 

 

Time offset is within the specified limits on the following set of nodes:

"[raclhr-11gr2-n2, raclhr-11gr2-n1]"

Result: Check of clock time offsets passed

 

 

Oracle Cluster Time Synchronization Services check passed

 

Verification of Clock Synchronization across the cluster nodes was successful.
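When this check is scripted, only the final verdict line needs to be examined. A minimal sketch, assuming the English message text shown in this run (other releases or locales may word it differently); `clocksync_ok` is a hypothetical helper that reads the cluvfy output on standard input.

```shell
# Hypothetical helper: reads cluvfy clocksync output on stdin and succeeds only
# when the final verdict line reports success. "was unsuccessful." does not
# match, because the pattern requires "was successful." verbatim.
clocksync_ok() {
    grep -q 'Verification of Clock Synchronization across the cluster nodes was successful\.'
}

# Typical use (as the grid owner):
# if cluvfy comp clocksync -n all -verbose | clocksync_ok; then echo "clocks in sync"; fi
```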

 

1.1.2      NTP synchronization mode

Enable NTP:

mv /etc/ntp.conf.bak /etc/ntp.conf

service ntpd status

/sbin/service ntpd start

# chkconfig ntpd on      (optional: also start ntpd at boot)

ps -ef|grep ntp

 

Node 1:

[root@raclhr-11gR2-N1 ~]# crsctl check ctss

CRS-4700: The Cluster Time Synchronization Service is in Observer mode.

 

[root@raclhr-11gR2-N1 ~]#  crsctl stat res -t -init

ora.ctssd

      1        ONLINE  ONLINE       raclhr-11gr2-n1          OBSERVER           
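The two `crsctl check ctss` outputs in this article differ in their message codes: CRS-4701 for Active mode (followed by a CRS-4702 offset line) and CRS-4700 for Observer mode. A minimal parser, assuming exactly those codes; `ctss_check_summary` is a hypothetical name, and printing "-" when no CRS-4702 line is present is a choice of this sketch.

```shell
# Hypothetical helper: turns "crsctl check ctss" output (read on stdin) into
# "<mode> <offset_ms>". CRS-4700 = Observer, CRS-4701 = Active. In the outputs
# shown here only Active mode prints a CRS-4702 offset, so Observer gives "-".
ctss_check_summary() {
    awk '
        /CRS-4700/ { mode = "Observer" }
        /CRS-4701/ { mode = "Active" }
        /CRS-4702/ { offset = $NF }
        END { print mode, (offset == "" ? "-" : offset) }
    '
}

# Typical use: crsctl check ctss | ctss_check_summary
```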

 

CTSS log on node 1:

/u01/app/11.2.0/grid/log/raclhr-11gr2-n1/ctssd/octssd.log

2018-06-30 20:51:29.388: [    CTSS][899475200]sclsctss_gvss1: NTP default config file found

2018-06-30 20:51:29.389: [    CTSS][899475200]sclsctss_gvss8: Return [0] and NTP status [2].

2018-06-30 20:51:29.389: [    CTSS][899475200]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].

2018-06-30 20:51:29.389: [    CTSS][899475200]ctss_check_vendor_sw: Ctssd is switching to observer role

2018-06-30 20:51:29.389: [    CTSS][899475200]clsctsselect_update_mbrdata: Updating pridata: { version[1] node[1] swversion[186647296] mode[0xee] }.

2018-06-30 20:51:29.639: [  CRSCCL][671086336]clsCclGetPriMemberData: Detected pridata change for node[1]. Retrieving it to the cache.

2018-06-30 20:51:31.434: [    CTSS][916338432]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xee], offset[0 ms]}, length=[8].

2018-06-30 20:51:35.258: [    CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

2018-06-30 20:51:35.258: [    CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg

2018-06-30 20:51:35.259: [    CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc4] nodenum [2] hostname [raclhr-11gr2-n2] )

2018-06-30 20:51:35.656: [  CRSCCL][671086336]clsCclGetPriMemberData: Detected pridata change for node[2]. Retrieving it to the cache.

2018-06-30 20:51:43.240: [    CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

2018-06-30 20:51:43.240: [    CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg

2018-06-30 20:51:43.240: [    CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc6] nodenum [2] hostname [raclhr-11gr2-n2] )

2018-06-30 20:51:51.217: [    CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

2018-06-30 20:51:51.217: [    CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg

2018-06-30 20:51:51.218: [    CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc6] nodenum [2] hostname [raclhr-11gr2-n2] )

2018-06-30 20:51:59.194: [    CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].

2018-06-30 20:51:59.194: [    CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg

2018-06-30 20:51:59.195: [    CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc6] nodenum [2] hostname [raclhr-11gr2-n2] )

The octssd.log on node 1 shows that the NTP service was detected, and CTSS automatically switches to Observer mode.

octssd.log on node 2:

2018-06-30 20:57:27.608: [    CTSS][839321344]ctsselect_msm: CTSS mode is [0xc6]

2018-06-30 20:57:27.608: [    CTSS][839321344]ctssslave_swm1_2: Ready to initiate new time sync process.

2018-06-30 20:57:27.609: [    CTSS][839321344]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].

2018-06-30 20:57:27.612: [    CTSS][845625088]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].

2018-06-30 20:57:27.613: [    CTSS][845625088]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].

2018-06-30 20:57:27.613: [    CTSS][839321344]ctssslave_swm2_3: Received time sync message from master.

2018-06-30 20:57:27.613: [    CTSS][839321344]ctssslave_swm17: LT [1530363447sec 613028usec], MT [1530363447sec 140655884569984usec], Delta [4410usec]

2018-06-30 20:57:27.613: [    CTSS][839321344]ctssslave_swm19: The offset is [19748 usec] and sync interval set to [1]

2018-06-30 20:57:27.613: [    CTSS][839321344]ctssslave_swm: Received from master (mode [0xee] nodenum [1] hostname [raclhr-11gr2-n1] )

2018-06-30 20:57:27.613: [    CTSS][839321344]ctsselect_msm: Sync interval returned in [1]

2018-06-30 20:57:27.613: [    CTSS][845625088]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler

Node 2's octssd.log likewise records that NTP was detected and that CTSS is in Observer mode, with node 1 as the master for time synchronization.

 

 

 

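1.1.3      Simulating inconsistent cluster time

What happens when the cluster clocks drift apart on a production system, and how should the problem be investigated? The following scenario simulates inconsistent cluster time.

Move the clock on node 2 forward by 2 days:

The command to set the system date to July 2, 2018:

#date -s 07/02/2018

The command to set the system time to 23:23:06 (11:23:06 PM):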

#date -s 23:23:06
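Before changing the clock on a cluster node, it can help to confirm how `date` will interpret the string. A minimal sketch, assuming GNU coreutils `date` (as on the Linux hosts used here); `preview_clock_change` is a hypothetical helper that only parses the string and never changes the clock. GNU `date -s` accepts the same combined date-and-time string, so the two commands above could also be issued as one.

```shell
# Hypothetical helper: shows how GNU date parses a date/time string, without
# changing the system clock.
preview_clock_change() {
    date -d "$1" '+%Y-%m-%d %H:%M:%S'
}

preview_clock_change "07/02/2018 23:23:06"   # prints 2018-07-02 23:23:06

# After checking the preview, one command sets both date and time (as root):
# date -s "07/02/2018 23:23:06"
```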

 

[root@raclhr-11gR2-N2 ctssd]# crsctl stat res -t -init

ora.ctssd

      1        ONLINE  ONLINE       raclhr-11gr2-n2          ACTIVE:172768000   

[root@raclhr-11gR2-N2 ctssd]# crsctl check ctss

CRS-4701: The Cluster Time Synchronization Service is in Active mode.

CRS-4702: Offset (in msec): 172768000

An offset of 172768000 ms (CRS-4702 reports it in milliseconds) is about 2 days:

SYS@lhrrac11> select 172768000/1000/24/60/60 from dual;

 

172768000/1000/24/60/60

-----------------------

             1.99962963

 

After the clock on node 2 was changed, the following warning appeared in the ASM and database alert logs:

Time drift detected. Please check VKTM trace file for more details.

 

"Drift" here means that the system clock has shifted away from the expected time.

 

[grid@raclhr-11gR2-N2 trace]$ pwd

/u01/app/grid/diag/asm/+asm/+ASM2/trace

[grid@raclhr-11gR2-N2 trace]$ ll -lrt *vktm*

-rw-r----- 1 grid oinstall  136 May 17 14:09 +ASM2_vktm_29999.trm

-rw-r----- 1 grid oinstall 1847 May 17 14:09 +ASM2_vktm_29999.trc

-rw-r----- 1 grid oinstall  529 Jun  4 14:52 +ASM2_vktm_32504.trm

-rw-r----- 1 grid oinstall 7238 Jun  4 14:52 +ASM2_vktm_32504.trc

-rw-r----- 1 grid oinstall   78 Jun  4 14:59 +ASM2_vktm_14800.trm

-rw-r----- 1 grid oinstall 1079 Jun  4 14:59 +ASM2_vktm_14800.trc

-rw-r----- 1 grid oinstall   90 Jun  4 17:26 +ASM2_vktm_14991.trm

-rw-r----- 1 grid oinstall 1200 Jun  4 17:26 +ASM2_vktm_14991.trc

-rw-r----- 1 grid oinstall   89 Jun 29 10:05 +ASM2_vktm_17961.trm

-rw-r----- 1 grid oinstall 1200 Jun 29 10:05 +ASM2_vktm_17961.trc

-rw-r----- 1 grid oinstall  191 Jul  2 21:35 +ASM2_vktm_19774.trm

-rw-r----- 1 grid oinstall 3171 Jul  2 21:35 +ASM2_vktm_19774.trc

[grid@raclhr-11gR2-N2 trace]$ cat +ASM2_vktm_19774.trc

*** 2018-06-30 19:22:12.650

VKTM running at (1)millisec precision with DBRM quantum (100)ms

[Start] HighResTick = 1530357732650537

kstmrmtickcnt = 0 : ksudbrmseccnt[0] = 1530357732

kstmchkdrift (kstmhighrestimecntkeeper:highres): Time stalled at 1530363888044519

 

*** 2018-06-10 20:04:00.000

kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by

(172844812599)usec at (1528632240000738) whereas (1000000) is allowed

 

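usec means microseconds and ms means milliseconds: 1 s = 1000 ms = 1000000 usec.

When VKTM notices that the system time has changed, a corresponding warning is written to the alert log. The trace file shows that the clock jumped forward by 172844812599 microseconds, about 2 days, which matches the change we made, whereas the allowed jump is only 1 second (1000000 usec).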

SYS@lhrrac11> select 172844812599/1000/1000/24/60/60 from dual;

 

172844812599/1000/1000/24/60/60

-------------------------------

                     2.00051866
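The VKTM message boils down to comparing the observed jump with the allowed value, both in microseconds. A minimal sketch of that comparison; `vktm_jump_exceeds` is a hypothetical name that models only the numbers quoted in the trace file, not VKTM's internal logic.

```shell
# Hypothetical helper: succeeds when a clock jump (usec) is larger than the
# allowed value (usec, default 1000000 as quoted in the trace file).
vktm_jump_exceeds() {
    local jump_usec="$1" allowed_usec="${2:-1000000}"
    [ "$jump_usec" -gt "$allowed_usec" ]
}

if vktm_jump_exceeds 172844812599; then
    echo "time drift: jump is larger than the allowed 1 s"
fi
```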

 

Node 2's octssd.log and the CTSS status both record the offset:

2018-07-02 21:54:39.330: [    CTSS][1400497920]ctsselect_msm: CTSS mode is [0x84]

2018-07-02 21:54:39.330: [    CTSS][1400497920]ctssslave_swm1_2: Ready to initiate new time sync process.

2018-07-02 21:54:39.330: [    CTSS][1400497920]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].

2018-07-02 21:54:39.331: [    CTSS][1404700416]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].

2018-07-02 21:54:39.331: [    CTSS][1404700416]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].

2018-07-02 21:54:39.331: [    CTSS][1400497920]ctssslave_swm2_3: Received time sync message from master.

2018-07-02 21:54:39.331: [    CTSS][1400497920]ctssslave_swm: The magnitude [172757997797] of the offset [172757997797 usec] is larger than [86400000000 usec] sec which is the CTSS limit.

2018-07-02 21:54:39.331: [    CTSS][1400497920]ctssslave_swm: The magnitude of the systime diff is larger than max adjtime limit. Offset [172757997797] usec will be changed to max adjtime limit [+/- 131071].

2018-07-02 21:54:39.331: [    CTSS][1400497920]ctssslave_swm15: The CTSS master is behind this node. The local time offset [-131071 usec] is being adjusted. Sync method [2]

2018-07-02 21:54:39.331: [    CTSS][1400497920]ctssslave_swm17: LT [1530539679sec 331583usec], MT [1530366921sec 139882790197210usec], Delta [1267usec]

2018-07-02 21:54:39.331: [    CTSS][1400497920]ctssslave_swm19: The offset is [131071 usec] and sync interval set to [4]

2018-07-02 21:54:39.331: [    CTSS][1400497920]ctssslave_swm: Received from master (mode [0x8c] nodenum [1] hostname [raclhr-11gr2-n1] )

2018-07-02 21:54:39.331: [    CTSS][1400497920]ctsselect_msm: Sync interval returned in [4]

2018-07-02 21:54:39.331: [    CTSS][1404700416]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler

 

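The cluster time synchronization check also fails, reporting that node 2's time needs to be synchronized. With an offset this large, the synchronization service generally cannot correct it; only a manual correction fixes the problem.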
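The two limits quoted in the octssd.log excerpt above can be put into numbers: an offset beyond 86400000000 usec (one day) exceeds the CTSS limit, and a single adjustment is capped at +/- 131071 usec. A rough model under those assumptions, not Oracle's implementation; `ctss_offset_class` and `ctss_adjust_rounds` are hypothetical helpers.

```shell
# Limits quoted in octssd.log (microseconds).
CTSS_LIMIT_USEC=86400000000   # "larger than [86400000000 usec] ... the CTSS limit"
MAX_ADJ_USEC=131071           # "max adjtime limit [+/- 131071]"

# Hypothetical helper: classifies an offset against the one-day CTSS limit.
ctss_offset_class() {
    local off="${1#-}"        # magnitude: strip a leading minus sign
    if [ "$off" -gt "$CTSS_LIMIT_USEC" ]; then
        echo "beyond-ctss-limit"
    else
        echo "within-ctss-limit"
    fi
}

# Hypothetical helper: number of capped adjustments needed (rounded up).
ctss_adjust_rounds() {
    local off="${1#-}"
    echo $(( (off + MAX_ADJ_USEC - 1) / MAX_ADJ_USEC ))
}

ctss_offset_class 172757997797    # prints beyond-ctss-limit
ctss_adjust_rounds 172757997797   # prints 1318049
```

At the capped rate, removing the 2-day offset would take over 1.3 million adjustments, which fits the advice to correct an offset this large by hand.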

Verify the cluster time synchronization:

[grid@raclhr-11gR2-N2 ~]$ cluvfy comp clocksync -n all -verbose

 

Verifying Clock Synchronization across the cluster nodes

 

Checking if Clusterware is installed on all nodes...

Check of Clusterware install passed

 

Checking if CTSS Resource is running on all nodes...

Check: CTSS Resource running on all nodes

  Node Name                             Status                 

  ------------------------------------  ------------------------

  raclhr-11gr2-n2                       passed                 

  raclhr-11gr2-n1                       passed                 

Result: CTSS resource check passed

 

 

Querying CTSS for time offset on all nodes...

Result: Query of CTSS for time offset passed

 

Check CTSS state started...

Check: CTSS state

  Node Name                             State                  

  ------------------------------------  ------------------------

  raclhr-11gr2-n2                       Active                 

  raclhr-11gr2-n1                       Active                 

CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...

Reference Time Offset Limit: 1000.0 msecs

Check: Reference Time Offset

  Node Name     Time Offset               Status                 

  ------------  ------------------------  ------------------------

  raclhr-11gr2-n2  1.727568E8                failed                 

  raclhr-11gr2-n1  0.0                       passed                 

Result: PRVF-9661 :  Time offset is greater than acceptable limit on node "raclhr-11gr2-n2" [actual = "1.727568E8", acceptable = "1000.0" ]

 

PRVF-9652 : Cluster Time Synchronization Services check failed

 

Verification of Clock Synchronization across the cluster nodes was unsuccessful.

Checks did not pass for the following node(s):

        raclhr-11gr2-n2

1.727568E8 is scientific notation for 1.727568 × 10^8, i.e. 172756800 ms, about 2 days.
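The per-node verdict in the Reference Time Offset table is a comparison of each offset with the 1000.0 ms limit. A minimal sketch that repeats it, assuming offsets are given in milliseconds as cluvfy prints them; awk is used because it converts scientific notation such as 1.727568E8 to a number. `offset_status` is a hypothetical name, and treating an offset equal to the limit as passed follows the wording "greater than acceptable limit" in PRVF-9661.

```shell
# Hypothetical helper: repeats cluvfy's per-node offset check.
# $1 = offset in ms (may be in E notation), $2 = limit in ms (default 1000.0).
offset_status() {
    awk -v off="$1" -v limit="${2:-1000.0}" 'BEGIN {
        o = off + 0                  # numeric conversion handles 1.727568E8
        if (o < 0) o = -o            # compare the magnitude of the offset
        if (o > limit + 0) print "failed"; else print "passed"
    }'
}

offset_status 0.0          # prints passed
offset_status 1.727568E8   # prints failed
```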

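Until the time is corrected, node 2 cannot start normally after a restart. The output of the following command shows that startup fails at the ctssd step; only after the correct time was restored could the cluster start normally.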

[root@raclhr-11gR2-N2 ~]# crsctl stat res -t -init

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS      

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.asm

      1        ONLINE  OFFLINE                               Instance Shutdown  

ora.cluster_interconnect.haip

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.crf

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.crsd

      1        ONLINE  OFFLINE                                                  

ora.cssd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.cssdmonitor

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.ctssd

      1        ONLINE  OFFLINE                                                  

ora.diskmon

      1        OFFLINE OFFLINE                                                  

ora.evmd

      1        ONLINE  OFFLINE                                                  

ora.gipcd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             

ora.gpnpd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                              

ora.mdnsd

      1        ONLINE  ONLINE       raclhr-11gr2-n2                             
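In output like the above, the resources that failed to start are those whose TARGET is ONLINE but whose STATE is OFFLINE. A minimal filter, assuming the column layout shown here (a resource name line followed by an instance line "1 TARGET STATE ..."); `offline_init_resources` is a hypothetical helper. For the output above it would list ora.asm, ora.crsd, ora.ctssd, and ora.evmd.

```shell
# Hypothetical helper: reads "crsctl stat res -t -init" output on stdin and
# lists resources whose TARGET is ONLINE but whose STATE is OFFLINE.
offline_init_resources() {
    awk '
        /^ora\./ { name = $1; next }          # resource header line
        name != "" && $1 ~ /^[0-9]+$/ {       # instance line: 1 TARGET STATE ...
            if ($2 == "ONLINE" && $3 == "OFFLINE") print name
            name = ""
        }
    '
}

# Typical use (as root or the grid owner):
# crsctl stat res -t -init | offline_init_resources
```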

Check the Clusterware alert log:

/u01/app/11.2.0/grid/log/raclhr-11gr2-n2/alertraclhr-11gr2-n2.log

 

2018-07-02 22:05:36.344

[ctssd(30350)]CRS-2405:The Cluster Time Synchronization Service on host raclhr-11gr2-n2 is shutdown by user

2018-07-02 22:05:40.689

[ctssd(30358)]CRS-2407:The new Cluster Time Synchronization Service reference node is host raclhr-11gr2-n1.

2018-07-02 22:05:40.689

[ctssd(30358)]CRS-2401:The Cluster Time Synchronization Service started on host raclhr-11gr2-n2.

2018-07-02 22:05:42.704

[ctssd(30358)]CRS-2404:The Cluster Time Synchronization Service detects that the local time is significantly different from the mean cluster time. Details in /u01/app/11.2.0/grid/log/raclhr-11gr2-n2/ctssd/octssd.log.

2018-07-02 22:05:43.395

[ctssd(30358)]CRS-2402:The Cluster Time Synchronization Service aborted on host raclhr-11gr2-n2. Details at  in /u01/app/11.2.0/grid/log/raclhr-11gr2-n2/ctssd/octssd.log.

2018-07-02 22:05:44.404

[ohasd(29989)]CRS-2807:Resource 'ora.asm' failed to start automatically.

2018-07-02 22:05:44.405

[ohasd(29989)]CRS-2807:Resource 'ora.crsd' failed to start automatically.

2018-07-02 22:05:44.405

[ohasd(29989)]CRS-2807:Resource 'ora.ctssd' failed to start automatically.

2018-07-02 22:05:44.405

[ohasd(29989)]CRS-2807:Resource 'ora.evmd' failed to start automatically.
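The CTSS messages in this alert log all carry CRS-240x codes (CRS-2401, CRS-2402, CRS-2404, CRS-2405, CRS-2407). A minimal filter that prints them with their timestamps, assuming the alert-log layout shown above (a timestamp line followed by the message line); `ctss_alert_messages` is a hypothetical helper, and matching on CRS-240x is based only on the codes seen in this article.

```shell
# Hypothetical helper: prints the CTSS messages (CRS-240x, as seen in this
# article) from a Clusterware alert log, each prefixed with its timestamp.
ctss_alert_messages() {
    awk '
        # Timestamp lines look like "2018-07-02 22:05:36.344".
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { ts = $0; next }
        /CRS-240[0-9]/ { print ts " " $0 }
    ' "$1"
}

# Example (path from this article):
# ctss_alert_messages /u01/app/11.2.0/grid/log/raclhr-11gr2-n2/alertraclhr-11gr2-n2.log
```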

Check octssd.log:

2018-07-02 22:05:42.702: [    CTSS][1805252352]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [3].

2018-07-02 22:05:42.702: [    CTSS][1805252352]ctsscomm_recv_cb4_2: Receive active version change msg. Old active version [186647296] New active version [186647296].

2018-07-02 22:05:42.702: [    CTSS][1805252352]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].

2018-07-02 22:05:42.702: [    CTSS][1805252352]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].

2018-07-02 22:05:42.703: [    CTSS][1798948608]ctssslave_swm2_3: Received time sync message from master.

2018-07-02 22:05:42.703: [    CTSS][1798948608]ctssslave_swm: sendtime{sec[1530540340], usec[690191]}, receivetime{sec[1530540342], usec[702977]}.

2018-07-02 22:05:42.703: [    CTSS][1798948608]ctssslave_swm: The RTT of sync msg [2012786] is too large for time sync to be accurate. Recommends retry. Returns [17].

2018-07-02 22:05:42.703: [    CTSS][1798948608]ctssslave_swm: Received from master (mode [0x8c] nodenum [1] hostname [raclhr-11gr2-n1] )

2018-07-02 22:05:42.703: [    CTSS][1798948608]ctsselect_monitor_steysync_mode: Failed in clsctssslave_sync_with_master [17]. Retries [0/3].

2018-07-02 22:05:42.703: [    CTSS][1798948608]ctssslave_swm1_1: Waiting for last time sync process to finish. sync_state[6].

2018-07-02 22:05:42.703: [    CTSS][1805252352]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler

2018-07-02 22:05:42.703: [    CTSS][1798948608]ctssslave_swm1_2: Ready to initiate new time sync process.

2018-07-02 22:05:42.703: [    CTSS][1798948608]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].

2018-07-02 22:05:42.704: [    CTSS][1805252352]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].

2018-07-02 22:05:42.704: [    CTSS][1805252352]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].

2018-07-02 22:05:42.704: [    CTSS][1798948608]ctssslave_swm2_3: Received time sync message from master.

2018-07-02 22:05:42.704: [    CTSS][1798948608]ctssslave_swm: The magnitude [172752141259 usec] of the offset [172752141259 usec] is larger than [86400000000 usec] sec which is the CTSS limit.

2018-07-02 22:05:42.704: [    CTSS][1798948608]ctsselect_monitor_steysync_mode: Failed in clsctssslave_sync_with_master [12]: Time offset is too much to be corrected

2018-07-02 22:05:42.704: [    CTSS][1805252352]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler

2018-07-02 22:05:43.395: [    CTSS][2023593728]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xd0], offset[172752141 ms]}, length=[8].

2018-07-02 22:05:43.395: [    CTSS][1798948608]ctsselect_monitor_steysync_mode: CTSS daemon exiting [12].

2018-07-02 22:05:43.395: [    CTSS][1798948608]CTSS daemon aborting

2018-07-02 22:05:44.398: [    CTSS][2023593728]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xd0], offset[172752141 ms]}, length=[8].
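The RTT that octssd rejects in this log equals receivetime minus sendtime from the preceding line: (1530540342 - 1530540340) × 1000000 + (702977 - 690191) = 2012786 usec. A minimal sketch of that arithmetic; `sync_rtt_usec` is a hypothetical helper.

```shell
# Hypothetical helper: recomputes the RTT from the sendtime{sec, usec} and
# receivetime{sec, usec} values printed by octssd, in microseconds.
sync_rtt_usec() {
    awk -v ss="$1" -v su="$2" -v rs="$3" -v ru="$4" \
        'BEGIN { printf "%d\n", (rs - ss) * 1000000 + (ru - su) }'
}

sync_rtt_usec 1530540340 690191 1530540342 702977   # prints 2012786
```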

 

Now repair the system:

 

The command to set the system date back to June 30, 2018:

#date -s 06/30/2018

The command to set the system time to 22:14:06:

#date -s 22:14:06

Then restart the CRS stack:

crsctl stop crs -f

crsctl start crs

 

CTSS then synchronizes the time automatically:

[root@raclhr-11gR2-N2 ctssd]# crsctl stat res -t -init

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS      

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.ctssd

      1        ONLINE  ONLINE       raclhr-11gr2-n2          ACTIVE:100         

[root@raclhr-11gR2-N2 ctssd]# crsctl stat res -t -init

ora.ctssd

      1        ONLINE  ONLINE       raclhr-11gr2-n2          ACTIVE:0           
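The number after ACTIVE: is the same offset in milliseconds that CRS-4702 reports (compare ACTIVE:172768000 earlier). A minimal sketch that extracts it and polls until it reaches 0, assuming the column layout shown here; `ctssd_details` and `wait_for_ctss_zero` are hypothetical helpers, and the 10-second interval and 30 tries are arbitrary choices.

```shell
# Hypothetical helper: prints the STATE_DETAILS of ora.ctssd (for example
# ACTIVE:100, ACTIVE:0 or OBSERVER) from "crsctl stat res -t -init" on stdin.
ctssd_details() {
    awk '
        $1 == "ora.ctssd" { want = 1; next }
        want && $1 ~ /^[0-9]+$/ {
            # Instance line: 1 TARGET STATE SERVER STATE_DETAILS
            if ($3 == "ONLINE" && NF >= 5) print $5
            exit
        }
    '
}

# Hypothetical polling loop: waits until CTSS reports a zero offset (ACTIVE:0),
# checking every 10 seconds, at most 30 times. Run as root or the grid owner.
wait_for_ctss_zero() {
    local i details
    for i in $(seq 1 30); do
        details=$(crsctl stat res -t -init | ctssd_details)
        echo "try $i: ora.ctssd ${details:-<no details>}"
        [ "$details" = "ACTIVE:0" ] && return 0
        sleep 10
    done
    return 1
}
```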

 










About Me

.............................................................................................................................................

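● Author: 小麦苗. Parts of this article were compiled from the Internet; in case of infringement, please contact 小麦苗 to have it removed.

● This article is also published on itpub (http://blog.itpub.net/26736162/abstract/1/), cnblogs (http://www.cnblogs.com/lhrbest), and the WeChat official account (xiaomaimiaolhr).

● itpub address of this article: http://blog.itpub.net/26736162/abstract/1/

● cnblogs address of this article: http://www.cnblogs.com/lhrbest

● PDF version of this article, personal profile, and 小麦苗's cloud drive: http://blog.itpub.net/26736162/viewspace-1624453/

● Database written-test and interview question bank with answers: http://blog.itpub.net/26736162/viewspace-2134706/

● DBA宝典 Toutiao account: http://www.toutiao.com/c/user/6401772890/#mid=1564638659405826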

.............................................................................................................................................

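● QQ groups: 230161599 (full), 618766405

● WeChat group: add me on WeChat and I will invite you (serious inquiries only)

● To contact me, add QQ 646634621 as a friend and state the reason

● Completed in Shanghai between 2018-06-01 06:00 and 2018-06-31 24:00

● Last modified: 2018-06-01 06:00 ~ 2018-06-31 24:00

● This article is based on 小麦苗's study notes, partly compiled from the Internet; apologies for any infringement or inaccuracies

● All rights reserved. Sharing is welcome; please keep the source when reposting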

.............................................................................................................................................

小麦苗's Weidian shop: https://weidian.com/s/793741433?wfr=c&ifr=shopdetail

Database books published by 小麦苗: http://blog.itpub.net/26736162/viewspace-2142121/

小麦苗's OCP, OCM, and high-availability online classes: http://blog.itpub.net/26736162/viewspace-2148098/

.............................................................................................................................................


.............................................................................................................................................






