Failure Case: ESXi 6.7 EP13 Purple Screen (PSOD) Analysis

Summary: analysis of a purple screen (PSOD) on an ESXi 6.7 EP13 host

Product version information:
Huawei RH2288H V3 | BIOS: 3.87 | Date (ISO-8601): 2018-02-02
VMware ESXi 6.5.0 build-5969303
Version: ESXi 6.5 U1 | Release name: ESXi 6.5 U1 | Release date: 2017-07-27 | Build number: 5969303 | Installer build number: N/A

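Below is the stack trace logged when the purple screen occurred. It shows that the PSOD was triggered by a LINT1/NMI (motherboard non-maskable interrupt), which most likely indicates a hardware problem.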
2020-07-22T19:47:32.067Z cpu0:66825)@BlueScreen: LINT1/NMI (motherboard nonmaskable interrupt), undiagnosed. This may be a hardware problem; please contact your hardware vendor.
2020-07-22T19:47:32.068Z cpu0:66825)Code start: 0x41802ca00000 VMK uptime: 127:07:45:14.433
2020-07-22T19:47:32.068Z cpu0:66825)0x4380c0002c60:[0x41802caed451]PanicvPanicInt@vmkernel#nover+0x545 stack: 0x41802caed451
2020-07-22T19:47:32.068Z cpu0:66825)0x4380c0002d00:[0x41802caed4dd]Panic_NoSave@vmkernel#nover+0x4d stack: 0x4380c0002d60
2020-07-22T19:47:32.068Z cpu0:66825)0x4380c0002d60:[0x41802caea7ae]NMICheckLint1@vmkernel#nover+0x19a stack: 0x0
2020-07-22T19:47:32.069Z cpu0:66825)0x4380c0002e20:[0x41802caea844]NMI_Interrupt@vmkernel#nover+0x94 stack: 0x0
2020-07-22T19:47:32.069Z cpu0:66825)0x4380c0002ea0:[0x41802cb2c531]IDTNMIWork@vmkernel#nover+0x99 stack: 0x0
2020-07-22T19:47:32.069Z cpu0:66825)0x4380c0002f20:[0x41802cb2d9c1]Int2_NMI@vmkernel#nover+0x19 stack: 0x418040000000
2020-07-22T19:47:32.069Z cpu0:66825)0x4380c0002f40:[0x41802cb3d044]gate_entry_@vmkernel#nover+0x0 stack: 0x0
2020-07-22T19:47:32.070Z cpu0:66825)0x43916849bcf0:[0x41802ca8b9c2]Power_ArchSetCState@vmkernel#nover+0x106 stack: 0x7fffffffffffffff
2020-07-22T19:47:32.070Z cpu0:66825)0x43916849bd20:[0x41802ccc49d3]CpuSchedIdleLoopInt@vmkernel#nover+0x39b stack: 0x1
2020-07-22T19:47:32.070Z cpu0:66825)0x43916849bd90:[0x41802ccc728a]CpuSchedDispatch@vmkernel#nover+0x114a stack: 0x410000000001
2020-07-22T19:47:32.071Z cpu0:66825)0x43916849bec0:[0x41802ccc8502]CpuSchedWait@vmkernel#nover+0x27a stack: 0x100000000000000
2020-07-22T19:47:32.071Z cpu0:66825)0x43916849bf40:[0x41802ccc85d5]CpuSched_NoEvqWait@vmkernel#nover+0x19 stack: 0x0
2020-07-22T19:47:32.071Z cpu0:66825)0x43916849bf50:[0x41802d5cc345]TcpipDispatch@(tcpip4)#+0x345 stack: 0x6
2020-07-22T19:47:32.071Z cpu0:66825)0x43916849bfe0:[0x41802ccc91b5]CpuSched_StartWorld@vmkernel#nover+0x99 stack: 0x0
2020-07-22T19:47:32.075Z cpu0:66825)base fs=0x0 gs=0x418040000000 Kgs=0x0
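
To make the trace above easier to read, here is a minimal Python sketch that extracts the symbol of each frame and splits the backtrace at `gate_entry_`, the point where the stack address changes. The regular expression and the helper names (`FRAME_RE`, `parse_frames`, `split_at_interrupt`) are illustrative assumptions based only on the line format shown in this log, not a VMware tool or API. The frames logged before the split are the NMI handling path; the frames after it are what cpu0 was running when the NMI arrived.

```python
import re

# One vmkernel backtrace frame looks like:
#   ...cpu0:66825)0x4380c0002d60:[0x41802caea7ae]NMICheckLint1@vmkernel#nover+0x19a stack: 0x0
# Assumption: this pattern is inferred from the log lines in this article only.
FRAME_RE = re.compile(
    r"\)(?P<sp>0x[0-9a-f]+):\[(?P<ip>0x[0-9a-f]+)\]"
    r"(?P<symbol>[^@\s]+)@(?P<module>[^#\s]+)#(?P<version>[^+\s]*)"
    r"\+(?P<offset>0x[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return one dict per backtrace frame, in the order the frames were logged."""
    return [m.groupdict() for m in FRAME_RE.finditer(log_text)]


def split_at_interrupt(frames, entry_symbol="gate_entry_"):
    """Split frames into (interrupt handler path, interrupted context)."""
    for i, frame in enumerate(frames):
        if frame["symbol"] == entry_symbol:
            return frames[: i + 1], frames[i + 1:]
    return frames, []  # no interrupt gate found: treat everything as one path


# A few lines copied from the trace above.
SAMPLE = """\
2020-07-22T19:47:32.068Z cpu0:66825)0x4380c0002d60:[0x41802caea7ae]NMICheckLint1@vmkernel#nover+0x19a stack: 0x0
2020-07-22T19:47:32.069Z cpu0:66825)0x4380c0002f40:[0x41802cb3d044]gate_entry_@vmkernel#nover+0x0 stack: 0x0
2020-07-22T19:47:32.070Z cpu0:66825)0x43916849bcf0:[0x41802ca8b9c2]Power_ArchSetCState@vmkernel#nover+0x106 stack: 0x7fffffffffffffff
2020-07-22T19:47:32.070Z cpu0:66825)0x43916849bd20:[0x41802ccc49d3]CpuSchedIdleLoopInt@vmkernel#nover+0x39b stack: 0x1
2020-07-22T19:47:32.071Z cpu0:66825)0x43916849bf50:[0x41802d5cc345]TcpipDispatch@(tcpip4)#+0x345 stack: 0x6
"""

handler, interrupted = split_at_interrupt(parse_frames(SAMPLE))
print("NMI handler path:   ", [f["symbol"] for f in handler])
print("Interrupted context:", [f["symbol"] for f in interrupted])
```

In this trace the interrupted context is the scheduler idle loop (`CpuSchedIdleLoopInt` calling `Power_ArchSetCState`), so cpu0 appears to have been idle when the NMI arrived. That is consistent with, though not proof of, the hardware explanation.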

The IPMI log contains the following event at nearly the same time:
162 2020-07-22T19:47:38 2 111 (Unknown) 2 (System Event) 83 Assert + Slot/Connector Fault Status
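
The correlation between the two logs can be checked with a short Python sketch that computes the gap between the PSOD timestamp and the IPMI event timestamp. It assumes the BMC clock is in UTC and in sync with the host, which the logs themselves do not confirm. The helper names and the 60-second window are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_vmkernel_ts(ts):
    """Parse a vmkernel timestamp such as '2020-07-22T19:47:32.067Z' (UTC)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def parse_sel_ts(ts, utc_offset_hours=0):
    """Parse an IPMI event timestamp such as '2020-07-22T19:47:38'.

    Assumption: the BMC clock runs at the given UTC offset (0 by default).
    """
    naive = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


def is_correlated(psod, sel_event, window=timedelta(seconds=60)):
    """Return True if the two timestamps are within the chosen window."""
    return abs(sel_event - psod) <= window


psod_time = parse_vmkernel_ts("2020-07-22T19:47:32.067Z")  # from the @BlueScreen line
sel_time = parse_sel_ts("2020-07-22T19:47:38")             # from IPMI event 162

delta = sel_time - psod_time
print(f"IPMI event logged {delta.total_seconds():.3f} s after the PSOD")
print("Within 60 s window:", is_correlated(psod_time, sel_time))
```

If the BMC clock is set to local time instead of UTC, pass the matching `utc_offset_hours` before comparing the two timestamps.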

Next step:
The server hardware vendor needs to investigate further.
