Problem background:
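After deploying the Ambari big-data platform on a test server, I found that the METRICS COLLECTOR service was failing and would not start. Some blog posts pointed to the ntpd service as the cause, so I checked the status of ntpd: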
[root@slave2 root]# systemctl status ntpd
● ntpd.service - Network Time Service
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2022-04-10 11:27:07 CST; 5h 39min ago
  Process: 756 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 781 (ntpd)
   CGroup: /system.slice/ntpd.service
           └─781 /usr/sbin/ntpd -u ntp:ntp -g

Apr 10 11:27:07 slave2 ntpd[781]: 0.0.0.0 c012 02 freq_set kernel 0.057 PPM
Apr 10 11:27:12 slave2 ntpd[781]: Listen normally on 4 ens33 192.168.0.18 UDP 123
Apr 10 11:27:12 slave2 ntpd[781]: Listen normally on 5 ens33 fe80::20c:29ff:fe23:5740 UDP 123
Apr 10 11:27:12 slave2 ntpd[781]: new interface(s) found: waking up resolver
Apr 10 11:35:55 slave2 ntpd[781]: 0.0.0.0 c615 05 clock_sync
Apr 10 12:27:07 slave2 ntpd[781]: frequency file /var/lib/ntp/drift.TEMP: Permission denied
Apr 10 13:27:07 slave2 ntpd[781]: frequency file /var/lib/ntp/drift.TEMP: Permission denied
Apr 10 14:27:07 slave2 ntpd[781]: frequency file /var/lib/ntp/drift.TEMP: Permission denied
Apr 10 15:27:07 slave2 ntpd[781]: frequency file /var/lib/ntp/drift.TEMP: Permission denied
Apr 10 16:27:07 slave2 ntpd[781]: frequency file /var/lib/ntp/drift.TEMP: Permission denied
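ntpd keeps logging the error "frequency file /var/lib/ntp/drift.TEMP: Permission denied". This can break time synchronization (for example, ntpq -p may not work properly), and once time synchronization breaks, a big-data platform shows all kinds of strange problems. The Ambari Metrics Collector, for instance, collects the metrics of each node's services by time, so clock differences between nodes can keep it from starting.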
Solution:
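Checking the drift file showed that its group had been changed to root. As a temporary fix I gave the file 777 permissions; since that is not very safe, I later tightened the permissions to 750. In short, what matters is that the file ends up owned by user and group ntp, as shown below: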
[root@slave2 ntp]# ll
total 4
-rw-r--r-- 1 ntp ntp 6 Mar 28 22:10 drift
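The check above can be scripted. The following is a minimal Python sketch, not part of the original fix. It assumes the drift file is /var/lib/ntp/drift (the path in the log), that ntpd runs as the user ntp (the -u ntp:ntp flag in the systemctl output), and it only considers the ntp user's primary group, ignoring supplementary groups, ACLs and SELinux. Because ntpd writes its frequency to a drift.TEMP file next to the drift file and then renames it, the sketch checks the directory as well as the file. The names can_write_dir, diagnose and main are hypothetical.

```python
import os
import pwd
import stat
import sys


def can_write_dir(dir_uid, dir_gid, dir_mode, user_uid, user_gid):
    """Return True if the user could create files in the directory.

    Simplified POSIX check: only the user's primary group is considered;
    ACLs, SELinux and supplementary groups are ignored.
    """
    if user_uid == 0:
        return True  # root is not restricted by the mode bits here
    if dir_uid == user_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif dir_gid == user_gid:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return (dir_mode & needed) == needed


def diagnose(dir_uid, dir_gid, dir_mode, file_uid, file_gid, file_mode,
             ntp_uid, ntp_gid):
    """List permission problems for the drift file and its directory."""
    problems = []
    # ntpd creates <driftfile>.TEMP in the same directory, then renames it.
    if not can_write_dir(dir_uid, dir_gid, dir_mode, ntp_uid, ntp_gid):
        problems.append("ntp cannot create drift.TEMP in the directory")
    if dir_mode & stat.S_IWOTH:
        problems.append("directory is world-writable")
    if (file_uid, file_gid) != (ntp_uid, ntp_gid):
        problems.append("drift file is not owned by ntp:ntp")
    if file_mode & stat.S_IWOTH:
        problems.append("drift file is world-writable")
    return problems


def main(path="/var/lib/ntp/drift"):
    ntp = pwd.getpwnam("ntp")  # raises KeyError if there is no ntp user
    d = os.stat(os.path.dirname(path))
    f = os.stat(path)
    problems = diagnose(d.st_uid, d.st_gid, stat.S_IMODE(d.st_mode),
                        f.st_uid, f.st_gid, stat.S_IMODE(f.st_mode),
                        ntp.pw_uid, ntp.pw_gid)
    for p in problems:
        print(f"{path}: {p}")
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```

Run it as root on the affected node, passing another drift path as the first argument if ntp.conf uses a different one. It prints nothing and exits 0 when no problems are found; with the drift file's group set to root, as on slave2, it reports "drift file is not owned by ntp:ntp".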
With the file fixed, restart the ntpd service once more and the time service works normally again:
[root@slave2 ntp]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 master          LOCAL(0)         6 u   11   64    7    0.391   -0.083   0.008
[root@slave2 ntp]# ntpstat
unsynchronised
   polling server every 8 s
[root@slave2 ntp]# ntpstat
unsynchronised
   polling server every 8 s
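To make the ntpq -p check repeatable, here is a minimal Python sketch that parses the peer table. It assumes the standard ntpq -p layout, where column 0 is a one-character tally code (a space for a peer that is not selected, * for the current system peer) and each peer fits on one line; long hostnames that ntpq wraps onto a second line are skipped. The reach column is an octal register of the last eight polls, so 7 (binary 111) means the last three polls got a reply. The names Peer, parse_ntpq, successful_polls and has_system_peer are hypothetical, and SAMPLE reproduces the output above.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    tally: str       # column-0 tally code, e.g. "*" for the system peer
    remote: str
    refid: str
    stratum: int
    reach: int       # reach register, converted from octal
    offset_ms: float


def parse_ntpq(output):
    """Parse `ntpq -p` output into Peer records (one line per peer)."""
    peers = []
    for line in output.splitlines():
        body = line.strip()
        if not body or body.startswith("remote") or body.startswith("="):
            continue  # skip blank lines, the header and the separator
        fields = line[1:].split()  # column 0 holds the tally code
        if len(fields) != 10:
            continue  # wrapped or unexpected line; not handled here
        remote, refid, st, _t, _when, _poll, reach, _delay, offset, _jit = fields
        peers.append(Peer(line[0], remote, refid, int(st),
                          int(reach, 8), float(offset)))
    return peers


def successful_polls(peer):
    """Number of replies among the last eight polls."""
    return bin(peer.reach).count("1")


def has_system_peer(peers):
    """True if ntpd has selected a system peer (tally code '*')."""
    return any(p.tally == "*" for p in peers)


SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 master          LOCAL(0)         6 u   11   64    7    0.391   -0.083   0.008
"""

if __name__ == "__main__":
    peers = parse_ntpq(SAMPLE)
    print(successful_polls(peers[0]), has_system_peer(peers))  # 3 False
```

On this output the master peer has answered three polls but is not yet marked *, which is consistent with ntpstat still reporting unsynchronised shortly after the restart.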
After restarting the Ambari Metrics Collector once more, the service came back up normally.
Summary:
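On this test server I had once reproduced the sshd error ssh_exchange_identification: read: Connectio..., and in the process changed the permissions of the /var/lib/ directory to 777. That change is what broke one node of the Ambari platform.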