Memory controller error messages [memo]

Reference log error messages:

[root@hh-yun-compute-130125 ~]# cat /var/log/messages | grep -i error
Mar  1 04:58:05 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  1 04:58:06 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x16113a9000 => socket=1, Channel=2(mask=4), rank=0
Mar  1 10:27:08 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  1 10:27:09 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x15e1c49000 => socket=1, Channel=2(mask=4), rank=0
Mar  1 13:52:56 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  1 13:52:57 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x160e949000 => socket=1, Channel=2(mask=4), rank=0
Mar  2 04:16:56 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  2 04:16:56 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  2 04:16:57 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x1613a61000 => socket=1, Channel=2(mask=4), rank=0
Mar  2 04:16:57 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x1613a79000 => socket=1, Channel=2(mask=4), rank=0
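
To see whether the corrected errors cluster on a single DIMM, the EDAC lines can be tallied per label. The following is a minimal sketch, assuming the log line format shown above; `count_ce_by_label` is a hypothetical helper name, not part of any EDAC tool, and the log path is only a default.

```shell
#!/bin/sh
# count_ce_by_label: hypothetical helper (not part of EDAC or mcelog).
# Counts EDAC corrected-error (CE) lines per DIMM label in a syslog file.
# Assumes lines of the form: EDAC MCn: CE ... label "<name>": ...
count_ce_by_label() {
    log_file=${1:-/var/log/messages}
    awk '
        /EDAC MC[0-9]+: CE / && match($0, /label "[^"]*"/) {
            # Strip the leading 7 characters (label + space + quote)
            # and the trailing quote
            label = substr($0, RSTART + 7, RLENGTH - 8)
            count[label]++
        }
        END {
            for (l in count) print count[l], l
        }
    ' "$log_file" | sort -rn
}
```

Calling `count_ce_by_label /var/log/messages` prints one line per label in the form `<count> <label>`, with the most affected DIMM first.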


Reference information 2:

[root@hh-yun-compute-130125 ~]# cat /sys/devices/system/edac/mc/mc?/ce*count
0
0
8
0
[root@hh-yun-compute-130125 ~]# cat /sys/devices/system/edac/mc/mc1/ce_count
8
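
The bare numbers above do not say which file each value came from. The glob `ce*count` probably also matches `ce_noinfo_count`, which would explain four values for what looks like two controllers. A minimal sketch that prints the controller name next to each `ce_count` is given below; `list_ce_counts` is a hypothetical helper name, and the sysfs path is the one used in the transcript.

```shell
#!/bin/sh
# list_ce_counts: hypothetical helper that prints "<controller> <ce_count>"
# for every memory controller directory under the EDAC sysfs tree.
list_ce_counts() {
    edac_dir=${1:-/sys/devices/system/edac/mc}
    for f in "$edac_dir"/mc*/ce_count; do
        [ -r "$f" ] || continue              # glob matched nothing or unreadable
        mc=$(basename "$(dirname "$f")")     # e.g. mc0, mc1
        printf '%s %s\n' "$mc" "$(cat "$f")"
    done
}
```

Running `list_ce_counts` with no argument reads the live sysfs tree; passing a directory makes it easy to try against a copy.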


Module information

[root@hh-yun-compute-130125 ~]# modinfo sb_edac
filename:       /lib/modules/2.6.32-504.3.3.el6.x86_64/kernel/drivers/edac/sb_edac.ko
description:    MC Driver for Intel Sandy Bridge and Ivy Bridge memory controllers -  Ver: 1.1.0
author:         Red Hat Inc. (http://www.redhat.com)
author:         Mauro Carvalho Chehab <mchehab@redhat.com>
license:        GPL
srcversion:     01CFEEBE911D55B6FE660BE
alias:          pci:v00008086d00002FA0sv*sd*bc*sc*i*
alias:          pci:v00008086d00000EA8sv*sd*bc*sc*i*
alias:          pci:v00008086d00003CA8sv*sd*bc*sc*i*
depends:        edac_core
vermagic:       2.6.32-504.3.3.el6.x86_64 SMP mod_unload modversions
parm:           edac_op_state:EDAC Error Reporting state: 0=Poll,1=NMI (int)


[root@hh-yun-compute-130125 ~]# modinfo edac_core
filename:       /lib/modules/2.6.32-504.3.3.el6.x86_64/kernel/drivers/edac/edac_core.ko
description:    Core library routines for EDAC reporting
author:         Doug Thompson www.softwarebitmaker.com, et al
license:        GPL
srcversion:     C21E296292A2174839A086C
depends:
vermagic:       2.6.32-504.3.3.el6.x86_64 SMP mod_unload modversions
parm:           check_pci_errors:Check for PCI bus parity errors: 0=off 1=on (int)
parm:           edac_pci_panic_on_pe:Panic on PCI Bus Parity error: 0=off 1=on (int)
parm:           edac_mc_panic_on_ue:Panic on uncorrected error: 0=off 1=on (int)
parm:           edac_mc_log_ue:Log uncorrectable error to console: 0=off 1=on (int)
parm:           edac_mc_log_ce:Log correctable error to console: 0=off 1=on (int)
parm:           edac_mc_poll_msec:Polling period in milliseconds
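
`modinfo` lists the parameters a module accepts, not the values currently in effect. For a loaded module, parameters that the module exports to sysfs are normally readable under `/sys/module/<module>/parameters/`. The sketch below prints them as `name=value`; `show_module_params` is a hypothetical helper name, and which parameters appear depends on the kernel build.

```shell
#!/bin/sh
# show_module_params: hypothetical helper that prints "name=value" for each
# readable parameter file of a loaded kernel module.
# $1 = module name, $2 = sysfs module root (assumed default: /sys/module)
show_module_params() {
    module=$1
    sys_module_dir=${2:-/sys/module}
    for p in "$sys_module_dir/$module/parameters"/*; do
        [ -r "$p" ] || continue      # module not loaded or no exported parameters
        printf '%s=%s\n' "$(basename "$p")" "$(cat "$p")"
    done
}
```

For example, `show_module_params edac_core` would show whether `edac_mc_log_ce` and `edac_mc_panic_on_ue` are switched on, provided those parameters are exported on the running kernel.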

Official explanation (from the Linux kernel EDAC documentation):

Total Correctable Errors count attribute file:

	'ce_count'

	This attribute file displays the total count of correctable
	errors that have occurred on this csrow. This
	count is very important to examine. CEs provide early
	indications that a DIMM is beginning to fail. This count
	field should be monitored for non-zero values and report
	such information to the system administrator.
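
The monitoring advice above can be sketched as a small check that sums every `ce_count` and signals when the total exceeds a threshold. This is only an illustration: `check_ce_threshold`, its message text, and the default threshold of 0 are assumptions, not part of EDAC.

```shell
#!/bin/sh
# check_ce_threshold: hypothetical helper. Sums all ce_count files and
# returns 1 when the total is greater than the threshold, 0 otherwise.
# $1 = threshold (assumed default 0), $2 = EDAC sysfs directory
check_ce_threshold() {
    threshold=${1:-0}
    edac_dir=${2:-/sys/devices/system/edac/mc}
    total=0
    for f in "$edac_dir"/mc*/ce_count; do
        [ -r "$f" ] || continue
        total=$((total + $(cat "$f")))
    done
    if [ "$total" -gt "$threshold" ]; then
        echo "WARNING: $total correctable memory error(s) reported by EDAC"
        return 1
    fi
    echo "OK: $total correctable memory error(s) reported by EDAC"
    return 0
}
```

A cron job or monitoring agent could call `check_ce_threshold 0` and raise an alert on a non-zero exit status.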


Enabling mcelog

[root@hh-yun-compute-130125 ~]# service  mcelogd restart
Stopping mcelog                                     [  OK  ]
Starting mcelog daemon                              [  OK  ]
[root@hh-yun-compute-130125 ~]# mcelog
mcelog: Family 6 Model 3e CPU: only decoding architectural errors


Checking the log

[root@hh-yun-compute-130125 ~]# tail /var/log/mcelog
mcelog: failed to prefill DIMM database from DMI data
mcelog: mcelog server already running


Assessment

This is a harmless warning message. The DIMM database prefill relies on a specific non-standard format of the DIMMs in the DMI BIOS tables. If this format is not used by the BIOS, mcelog will only discover DIMMs as they get their first error (if the CPU reports DIMMs in machine check errors). Please understand for the most part, mcelog should be ignored.

We therefore decided, in the end, to ignore this message.


