Symptom:
The host rebooted unexpectedly at around 9:22 PM (local time) on July 26.
Analysis:
Product version information:
HPE ProLiant DL380 Gen10 | BIOS: U30 | Date (ISO-8601): 2019-11-13
VMware ESXi 6.7.0 build-16075168
ESXi 6.7 P02 ESXi670-202004002 04/28/2020 16075168
Time at which the host finished booting:
vmksummary.log
2020-07-26T13:28:09Z bootstop: Host has booted
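The log timestamps in this report are in UTC (trailing Z), while the symptom time is given in local time. The following minimal sketch converts a log timestamp to local time so the two can be compared. The UTC+8 offset and the names LOCAL_TZ and to_local are assumptions made for illustration; the report does not state the host's time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the site reporting the problem uses UTC+8.
# Change this offset if the host is located elsewhere.
LOCAL_TZ = timezone(timedelta(hours=8))


def to_local(ts: str) -> datetime:
    """Convert a UTC timestamp such as 2020-07-26T13:28:09Z to LOCAL_TZ."""
    utc = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return utc.astimezone(LOCAL_TZ)


# Boot-complete time taken from the vmksummary.log line above.
print(to_local("2020-07-26T13:28:09Z").isoformat())
# prints 2020-07-26T21:28:09+08:00
```

Under that assumption, the boot completed at 21:28:09 local time, a few minutes after the reported reboot time.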
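Checked syslog and vmkernel: the host restarted abruptly at 2020-07-26T13:23:57 UTC.
No core dump was generated before the restart, and the logs were still being written normally up to that point. At the ESXi level, no log entries were found that could explain the reboot.
The IPMI event log was also checked, and no abnormal events were recorded before the reboot.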
syslog.log
2020-07-26T13:20:01Z root: CalcFreeSpace sizeKB: 52224, freeMB: 541832
2020-07-26T13:23:57Z watchdog-vobd: [2097732] Begin '/usr/lib/vmware/vob/bin/vobd', min-uptime = 60, max-quick-failures = 5, max-total-failures = 1000000, bg_pid_file = '', reboot-flag = '0'
2020-07-26T13:23:57Z watchdog-vobd: Executing '/usr/lib/vmware/vob/bin/vobd'
2020-07-26T13:23:57Z jumpstart[2097715]: Launching Executor
2020-07-26T13:23:57Z jumpstart[2097715]: Setting up Executor - Reset Requested
2020-07-26T13:23:57Z jumpstart[2097743]: Executor Reset - polling for commands
2020-07-26T13:23:57Z jumpstart[2097715]: BmcInfoImpl: Retrieve Version information failed
2020-07-26T13:23:57Z jumpstart[2097715]: ignoring plugin 'tls-advanced-option' because version '6.7.0' has already been run.
2020-07-26T13:23:57Z jumpstart[2097715]: executing start plugin: check-required-memory
vmkernel.log
2020-07-26T13:20:09.517Z cpu9:13495753)MemSchedAdmit: 489: uw.13495753 (74029406) extraMin/extraFromParent: 5656/5656, ams (2355) childEmin/eMinLimit: 14606/20000
VMB: 66: Reserved 4 MPNs starting @ 0x4a0
VMB: 113: mbMagic: 1badb005, mbInfo 0x600000
VMB: 106: Changed PAT MSR from 0x7040600070406 to 0x7010600070106
EFI: 196: 64-bit EFI revision 2.5632
VMB_SERIAL: 264: Serial port set to default configuration.
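The reboot time above was read off the logs by hand: normal entries stop, and the next timestamped entries belong to the boot sequence. A minimal sketch of how that check could be automated is shown below. It looks for the largest gap between consecutive timestamped lines. The function names parse_ts and largest_gap are hypothetical, and the sample lines are copied from the syslog.log excerpt above. Treating the largest gap as the reboot window is a heuristic, not a VMware-defined method. Because the excerpt is partial, the gap it reports is only an upper bound on how long the log was silent.

```python
import re
from datetime import datetime, timezone

# Leading ISO-8601 UTC timestamp, with or without fractional seconds
# (syslog.log has none, vmkernel.log has milliseconds).
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z\s")


def parse_ts(line):
    """Return the line's timestamp (whole seconds, UTC), or None if it has none."""
    m = TS_RE.match(line)
    if m is None:
        return None  # e.g. early boot lines such as "VMB: 66: ..."
    ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    return ts.replace(tzinfo=timezone.utc)


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the largest gap, or None."""
    best = None
    prev_ts = prev_line = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines without a timestamp
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


sample = [
    "2020-07-26T13:20:01Z root: CalcFreeSpace sizeKB: 52224, freeMB: 541832",
    "2020-07-26T13:23:57Z watchdog-vobd: Executing '/usr/lib/vmware/vob/bin/vobd'",
    "2020-07-26T13:23:57Z jumpstart[2097715]: Launching Executor",
]

gap, before, after = largest_gap(sample)
print(f"largest gap: {gap:.0f} s")
print("last entry before gap:", before)
print("first entry after gap:", after)
```

On the sample lines, the largest gap is 236 seconds, between the 13:20:01Z entry and the first boot-time entry at 13:23:57Z.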
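Conclusion:
No anomaly was found at the ESXi level. The reboot may have been caused by a server hardware problem, so the server needs further inspection at the hardware level.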