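Monitoring Server Hardware with Zabbix IPMI

Introduction:

The company has several branch offices whose server rooms have no dedicated on-duty staff and are not built to a high standard. We still wanted to monitor the server room environment in real time, so we turned to IPMI. Because a Zabbix monitoring system was already deployed, this article uses Zabbix's built-in IPMI support to monitor server temperature, fan speed, and similar readings.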

1. Environment

Monitored server model: Dell PowerEdge R510
Planned IPMI address: 10.103.1.100

2. Zabbix monitoring platform

Zabbix version: 3.2.1, originally built without --with-openipmi
The Zabbix server can reach 10.103.1.100 over the network

3. Background reading

Wikipedia, IPMI: http://zh.wikipedia.org/wiki/IPMI
IBM developerWorks, managing servers over IPMI on Linux with ipmitool: http://www.ibm.com/developerworks/cn/linux/l-ipmi/
Dell, Managing Dell PowerEdge Servers Using IPMItool: http://www.dell.com/downloads/global/power/ps4q04-20040204-Murphy.pdf
Zabbix IPMI checks: https://www.zabbix.com/documentation/3.2/manual/config/items/itemtypes/ipmi
Terminal redirection with IPMITOOL (optional reading): http://docs.linuxtone.org/ebooks/Dell/ipmitool.pdf

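4. Configuring IPMI

4.1. Configuring the IPMI address

Following "Managing Dell PowerEdge Servers Using IPMItool" from the background reading, configure the IPMI address while the server boots and enable IPMI Over LAN.
Alternatively, enable IPMI through Dell's iDRAC; see the references at the end of the article.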

4.2. Reading sensor information

Log in to the Zabbix server and use ipmitool to read the Dell server's sensor information remotely:

# ipmitool -I lan -H 10.103.1.100 -U root -P calvin -L user sensor list
# ipmitool -I lan -H 10.103.1.100 -U root -P calvin -L user sensor get "FAN MOD 1B RPM"
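
To pull a single reading out of the full `sensor list` table, a small filter can help. The following is a minimal sketch, assuming the usual pipe-separated `ipmitool sensor list` layout in which the first column is the sensor name and the second is the reading; the function name `sensor_reading` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of ipmitool): print the reading of the
# sensor whose name matches exactly. Reads pipe-separated
# `ipmitool ... sensor list` style output from stdin.
sensor_reading() {
    local wanted="$1"
    awk -F'|' -v wanted="$wanted" '
        {
            name = $1; value = $2
            # Strip the padding that aligns the table columns.
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == wanted) { print value; exit }
        }'
}

# Example (requires network access to the BMC):
#   ipmitool -I lan -H 10.103.1.100 -U root -P calvin -L user sensor list \
#       | sensor_reading "FAN MOD 1B RPM"
```

Matching the name column exactly avoids picking up several rows when one sensor name is a prefix of another.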

4.3. Installing the IPMItool packages

# yum -y install OpenIPMI OpenIPMI-devel ipmitool freeipmi

4.4. Configuring Zabbix

Note: IPMI support requires the --with-openipmi option when building the Zabbix server/proxy.

Configure the Zabbix IPMI pollers on the server side, in zabbix_server.conf (or zabbix_proxy.conf):

# sed -i '/# StartIPMIPollers=0/aStartIPMIPollers=5' zabbix_server.conf
# service zabbix-server restart
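
The sed command above appends a new StartIPMIPollers line every time it is run. If the step may be repeated (for example from a provisioning script), something like the following sketch keeps the setting unique. The function name `set_ipmi_pollers` is hypothetical, and GNU sed's `-i` option is assumed, as in the command above.

```shell
#!/bin/bash
# Hypothetical helper: set StartIPMIPollers in a Zabbix config file
# without creating duplicate entries.
# Usage: set_ipmi_pollers <config-file> <count>
set_ipmi_pollers() {
    local conf="$1" count="$2"
    if grep -q '^StartIPMIPollers=' "$conf"; then
        # An active setting already exists: rewrite it in place.
        sed -i "s/^StartIPMIPollers=.*/StartIPMIPollers=${count}/" "$conf"
    else
        # No active setting yet: append one at the end of the file.
        printf 'StartIPMIPollers=%s\n' "$count" >> "$conf"
    fi
}

# Example:
#   set_ipmi_pollers zabbix_server.conf 5 && service zabbix-server restart
```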

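4.5. Importing the monitoring templates

IPMI templates for two Dell models are provided:
template-ipmi-dell-poweredge-r510
template-ipmi-dell-poweredge-2950
Add the monitored host, link it to the template, and on the host's IPMI tab set the following, then save:
Authentication algorithm: Default
Privilege level: User
Username: sensor
Password: sensor_pass
In practice this approach performed very poorly: almost no data was collected.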

5. Custom IPMI checks with Zabbix External checks

The original plan was to use the Nagios IPMI plugin check_ipmi_sensor, from the archive check_ipmi_sensor_v3-v3.9.tar.gz.
For usage details see: http://www.thomas-krenn.com/en/wiki/IPMI_Sensor_Monitoring_Plugin

5.1. Installing the perl-IPC-Run module

yum -y install perl-IPC-Run perl-Getopt-Long

5.2. Trying check_ipmi_sensor

Running the plugin, however, produces errors.

# ./check_ipmi_sensor -f ipmi.cfg -H 10.103.1.100 -vvv
------------- debug output for sel (-vvv is set): ------------
  /usr/sbin/ipmi-sel was executed with the following parameters:
    /usr/sbin/ipmi-sel -h 10.103.1.100 --config-file ipmi.cfg --driver-type=LAN_2_0 --output-event-state --interpret-oem-data --entity-sensor-names
  output of FreeIPMI:
ID  | Date        | Time     | Name                                        | Type                     | State    | Event
1   | Apr-08-2011 | 06:42:13 | System Board SEL                            | Event Logging Disabled   | Nominal  | Log Area Reset/Cleared
2   | Jan-01-1970 | 08:00:31 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
3   | Jan-01-1970 | 08:00:36 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
4   | Aug-15-2011 | 23:09:53 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
5   | Aug-16-2011 | 11:38:25 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
6   | Aug-16-2011 | 11:38:25 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
7   | Aug-16-2011 | 11:38:55 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
8   | Jun-10-2012 | 22:41:13 | System Board Ambient Temp                   | Temperature              | Warning  | Upper Non-critical - going high ; Sensor Reading = 45.00 C ; Threshold = 45.00 C
9   | Jun-11-2012 | 02:53:53 | System Board Ambient Temp                   | Temperature              | Nominal  | Upper Non-critical - going high ; Sensor Reading = 43.00 C ; Threshold = 45.00 C
10  | Nov-05-2012 | 21:56:42 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
11  | Nov-14-2012 | 21:53:58 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
12  | Nov-14-2012 | 21:53:58 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
13  | Nov-14-2012 | 21:54:19 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
14  | Nov-15-2012 | 16:12:03 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
15  | Nov-17-2012 | 17:14:34 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
16  | Nov-17-2012 | 17:14:34 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
17  | Nov-17-2012 | 17:15:40 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
18  | Nov-19-2012 | 20:47:57 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
19  | Nov-19-2012 | 20:50:04 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
20  | Jan-01-1970 | 08:00:33 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
21  | Jan-01-1970 | 08:00:38 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
22  | Jun-27-2014 | 17:27:38 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
23  | Jun-27-2014 | 17:27:53 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
24  | Jan-01-1970 | 08:00:31 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
25  | Jan-01-1970 | 08:00:36 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
26  | Oct-31-2016 | 05:48:35 | System Board Ambient Temp                   | Temperature              | Warning  | Lower Non-critical - going low ; Sensor Reading = 8.00 C ; Threshold = 8.00 C
27  | Oct-31-2016 | 09:00:38 | System Board Ambient Temp                   | Temperature              | Nominal  | Lower Non-critical - going low ; Sensor Reading = 10.00 C ; Threshold = 8.00 C
------------- debug output for sensors (-vvv is set): ------------
  script was executed with the following parameters:
    ./check_ipmi_sensor -f ipmi.cfg -H 10.103.1.100 -vvv
  check_ipmi_sensor version:
    3.9
  FreeIPMI version:
    ipmi-sensors - 1.2.9
  FreeIPMI was executed with the following parameters:
    /usr/sbin/ipmi-sensors -h 10.103.1.100 --config-file ipmi.cfg --quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors --driver-type=LAN_2_0 --output-sensor-thresholds
  FreeIPMI return code: 0
  output of FreeIPMI:
Record ID | Sensor Name | Sensor Group | Monitoring Status | Sensor Units | Sensor Reading
5 | Ambient Temp | Temperature | Nominal | C | 28.000000
7 | CMOS Battery | Battery | Nominal | N/A | 'OK'
8 | VCORE PG | Voltage | Nominal | N/A | 'State Deasserted'
9 | VCORE PG | Voltage | Nominal | N/A | 'State Deasserted'
10 | 0.75 VTT PG | Voltage | Nominal | N/A | 'State Deasserted'
11 | 0.75 VTT PG | Voltage | Nominal | N/A | 'State Deasserted'
12 | CPU VTT PG | Voltage | Nominal | N/A | 'State Deasserted'
13 | 1.5V PG | Voltage | Nominal | N/A | 'State Deasserted'
14 | 1.8V PG | Voltage | Nominal | N/A | 'State Deasserted'
15 | 5V PG | Voltage | Nominal | N/A | 'State Deasserted'
16 | MEM CPU2 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
17 | 5V Riser PG | Voltage | Nominal | N/A | 'State Deasserted'
18 | MEM CPU1 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
19 | VTT CPU2 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
20 | VTT CPU1 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
21 | 0.9V PG | Voltage | Nominal | N/A | 'State Deasserted'
22 | CPU2 1.8 PLL PG | Voltage | Nominal | N/A | 'State Deasserted'
23 | CPU1 1.8 PLL PG | Voltage | Nominal | N/A | 'State Deasserted'
24 | 1.1 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
25 | 1.0 LOM FAIL | Voltage | Nominal | N/A | 'State Deasserted'
26 | 1.0 AUX FAIL | Voltage | Nominal | N/A | 'State Deasserted'
27 | Heatsink Pres | Entity Presence | Nominal | N/A | 'Entity Present'
28 | iDRAC6 Ent Pres | Entity Presence | Critical | N/A | 'Entity Absent'
29 | USB Cable Pres | Entity Presence | Nominal | N/A | 'Entity Present'
31 | Riser Presence | Entity Presence | Nominal | N/A | 'Entity Present'
32 | FAN MOD 1A RPM | Fan | Nominal | RPM | 3480.000000
34 | FAN MOD 2A RPM | Fan | Nominal | RPM | 3480.000000
36 | FAN MOD 3A RPM | Fan | Nominal | RPM | 3480.000000
39 | FAN MOD 4A RPM | Fan | Nominal | RPM | 3480.000000
40 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
41 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
42 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
43 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
44 | Presence  | Entity Presence | Nominal | N/A | 'Entity Present'
45 | Status | Processor | Nominal | N/A | 'Processor Presence detected'
46 | Status | Processor | Nominal | N/A | 'Processor Presence detected'
47 | Status | Power Supply | Nominal | N/A | 'Presence detected'
48 | Current | Current | Nominal | A | 0.400000
49 | Current | Current | Nominal | A | 0.400000
50 | Voltage | Voltage | Nominal | V | 218.000000
51 | Voltage | Voltage | Nominal | V | 218.000000
52 | Status | Power Supply | Nominal | N/A | 'Presence detected'
53 | Status | Cable/Interconnect | Nominal | N/A | 'Cable/Interconnect is connected'
54 | OS Watchdog | Watchdog 2 | Nominal | N/A | 'OK'
56 | Intrusion | Physical Security | Nominal | N/A | 'OK'
57 | PS Redundancy | Power Supply | Nominal | N/A | 'Fully Redundant'
58 | Fan Redundancy | Fan | Nominal | N/A | 'Fully Redundant'
60 | System Level | Current | Nominal | W | 168.000000
61 | Power Optimized | OEM Reserved | Nominal | N/A | 'Good'
62 | Drive | Drive Slot | Nominal | N/A | 'Drive Presence'
65 | Cable SAS A | Cable/Interconnect | Nominal | N/A | 'Cable/Interconnect is connected'
66 | Cable SAS B | Cable/Interconnect | Nominal | N/A | 'Cable/Interconnect is connected'
67 | DKM Status | OEM Reserved | N/A | N/A | 'OEM Event = 0000h'
119 | FAN MOD 5A RPM | Fan | Nominal | RPM | 3480.000000

--------------------- end of debug output ---------------------
IPMI Status: Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
Critical [iDRAC6 Ent Pres = Critical ('Entity Absent'), System Board Intrusion = Critical (Physical Security), System Board Intrusion = Critical (Physical Security), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), System Board Ambient Temp = Warning (Temperature), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), System Board Intrusion = Critical (Physical Security), System Board Intrusion = Critical (Physical Security), System Board Intrusion = Critical (Physical Security), System Board Intrusion = Critical (Physical Security), System Board Ambient Temp = Warning (Temperature)] | 'Ambient Temp'=28.000000;:;: 'FAN MOD 1A RPM'=3480.000000;:;: 'FAN MOD 2A RPM'=3480.000000;:;: 'FAN MOD 3A RPM'=3480.000000;:;: 'FAN MOD 4A RPM'=3480.000000;:;: 'Current'=0.400000;:;: 'Current'=0.400000;:;: 'Voltage'=218.000000;:;: 'Voltage'=218.000000;:;: 'System Level'=168.000000;:;: 'FAN MOD 5A RPM'=3480.000000;:;:
Ambient Temp = 28.000000 (Status: Nominal)
CMOS Battery = 'OK' (Status: Nominal)
VCORE PG = 'State Deasserted' (Status: Nominal)
VCORE PG = 'State Deasserted' (Status: Nominal)
0.75 VTT PG = 'State Deasserted' (Status: Nominal)
0.75 VTT PG = 'State Deasserted' (Status: Nominal)
CPU VTT PG = 'State Deasserted' (Status: Nominal)
1.5V PG = 'State Deasserted' (Status: Nominal)
1.8V PG = 'State Deasserted' (Status: Nominal)
5V PG = 'State Deasserted' (Status: Nominal)
MEM CPU2 FAIL = 'State Deasserted' (Status: Nominal)
5V Riser PG = 'State Deasserted' (Status: Nominal)
MEM CPU1 FAIL = 'State Deasserted' (Status: Nominal)
VTT CPU2 FAIL = 'State Deasserted' (Status: Nominal)
VTT CPU1 FAIL = 'State Deasserted' (Status: Nominal)
0.9V PG = 'State Deasserted' (Status: Nominal)
CPU2 1.8 PLL PG = 'State Deasserted' (Status: Nominal)
CPU1 1.8 PLL PG = 'State Deasserted' (Status: Nominal)
1.1 FAIL = 'State Deasserted' (Status: Nominal)
1.0 LOM FAIL = 'State Deasserted' (Status: Nominal)
1.0 AUX FAIL = 'State Deasserted' (Status: Nominal)
Heatsink Pres = 'Entity Present' (Status: Nominal)
iDRAC6 Ent Pres = 'Entity Absent' (Status: Critical)
USB Cable Pres = 'Entity Present' (Status: Nominal)
Riser Presence = 'Entity Present' (Status: Nominal)
FAN MOD 1A RPM = 3480.000000 (Status: Nominal)
FAN MOD 2A RPM = 3480.000000 (Status: Nominal)
FAN MOD 3A RPM = 3480.000000 (Status: Nominal)
FAN MOD 4A RPM = 3480.000000 (Status: Nominal)
Presence = 'Entity Present' (Status: Nominal)
Presence = 'Entity Present' (Status: Nominal)
Presence = 'Entity Present' (Status: Nominal)
Presence = 'Entity Present' (Status: Nominal)
Presence = 'Entity Present' (Status: Nominal)
Status = 'Processor Presence detected' (Status: Nominal)
Status = 'Processor Presence detected' (Status: Nominal)
Status = 'Presence detected' (Status: Nominal)
Current = 0.400000 (Status: Nominal)
Current = 0.400000 (Status: Nominal)
Voltage = 218.000000 (Status: Nominal)
Voltage = 218.000000 (Status: Nominal)
Status = 'Presence detected' (Status: Nominal)
Status = 'Cable/Interconnect is connected' (Status: Nominal)
OS Watchdog = 'OK' (Status: Nominal)
Intrusion = 'OK' (Status: Nominal)
PS Redundancy = 'Fully Redundant' (Status: Nominal)
Fan Redundancy = 'Fully Redundant' (Status: Nominal)
System Level = 168.000000 (Status: Nominal)
Power Optimized = 'Good' (Status: Nominal)
Drive = 'Drive Presence' (Status: Nominal)
Cable SAS A = 'Cable/Interconnect is connected' (Status: Nominal)
Cable SAS B = 'Cable/Interconnect is connected' (Status: Nominal)
FAN MOD 5A RPM = 3480.000000 (Status: Nominal)

However, the debug output shows the command that the plugin itself runs, so that command can be used directly:

/usr/sbin/ipmi-sel -h 10.103.1.100 --config-file ipmi.cfg --driver-type=LAN_2_0 --output-event-state --interpret-oem-data --entity-sensor-names

The result is:

# /usr/sbin/ipmi-sel -h 10.103.1.100 --config-file ipmi.cfg --driver-type=LAN_2_0 --output-event-state --interpret-oem-data --entity-sensor-names
ID  | Date        | Time     | Name                                        | Type                     | State    | Event
1   | Apr-08-2011 | 06:42:13 | System Board SEL                            | Event Logging Disabled   | Nominal  | Log Area Reset/Cleared
2   | Jan-01-1970 | 08:00:31 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
3   | Jan-01-1970 | 08:00:36 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
4   | Aug-15-2011 | 23:09:53 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
5   | Aug-16-2011 | 11:38:25 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
6   | Aug-16-2011 | 11:38:25 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
7   | Aug-16-2011 | 11:38:55 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
8   | Jun-10-2012 | 22:41:13 | System Board Ambient Temp                   | Temperature              | Warning  | Upper Non-critical - going high ; Sensor Reading = 45.00 C ; Threshold = 45.00 C
9   | Jun-11-2012 | 02:53:53 | System Board Ambient Temp                   | Temperature              | Nominal  | Upper Non-critical - going high ; Sensor Reading = 43.00 C ; Threshold = 45.00 C
10  | Nov-05-2012 | 21:56:42 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
11  | Nov-14-2012 | 21:53:58 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
12  | Nov-14-2012 | 21:53:58 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
13  | Nov-14-2012 | 21:54:19 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
14  | Nov-15-2012 | 16:12:03 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
15  | Nov-17-2012 | 17:14:34 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
16  | Nov-17-2012 | 17:14:34 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
17  | Nov-17-2012 | 17:15:40 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
18  | Nov-19-2012 | 20:47:57 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
19  | Nov-19-2012 | 20:50:04 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
20  | Jan-01-1970 | 08:00:33 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
21  | Jan-01-1970 | 08:00:38 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
22  | Jun-27-2014 | 17:27:38 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
23  | Jun-27-2014 | 17:27:53 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
24  | Jan-01-1970 | 08:00:31 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
25  | Jan-01-1970 | 08:00:36 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
26  | Oct-31-2016 | 05:48:35 | System Board Ambient Temp                   | Temperature              | Warning  | Lower Non-critical - going low ; Sensor Reading = 8.00 C ; Threshold = 8.00 C
27  | Oct-31-2016 | 09:00:38 | System Board Ambient Temp                   | Temperature              | Nominal  | Lower Non-critical - going low ; Sensor Reading = 10.00 C ; Threshold = 8.00 C
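
Most of the event log is routine, so it can be convenient to list only the Critical entries. Here is a minimal sketch that filters the ipmi-sel table shown above on its State column; the column positions are taken from that output, and the function name `sel_critical` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print "date time | name | event" for every SEL row
# whose State column (6th pipe-separated field) is Critical.
# Reads ipmi-sel table output from stdin; the header row never matches.
sel_critical() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($6) == "Critical" {
            print trim($2) " " trim($3) " | " trim($4) " | " trim($7)
        }'
}

# Example (requires network access to the BMC):
#   /usr/sbin/ipmi-sel -h 10.103.1.100 --config-file ipmi.cfg \
#       --driver-type=LAN_2_0 --output-event-state --interpret-oem-data \
#       --entity-sensor-names | sel_critical
```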

5.3. Writing the Zabbix External checks script

# pwd
/usr/local/zabbix/share/zabbix/externalscripts
# cat check_ipmi

The script contents are as follows:

#!/bin/bash
# Checks IPMI sensor information
# Created on 2016-11-18
#@author: Chinge_Yang

args="$*"
echo $(date +%F-%T) $args >> /tmp/check_ipmi.debug

check_ipmi_dir=/usr/local/zabbix/shell/check_ipmi_sensor
check_ipmi_bin=$check_ipmi_dir/check_ipmi_sensor

ipmi_sensors=/usr/sbin/ipmi-sensors
ipmi_cfg=$check_ipmi_dir/ipmi.cfg

#$check_ipmi_bin -f $ipmi_cfg -v $args
#${ipmi_sel} $args --config-file $ipmi_cfg --driver-type=LAN_2_0 --output-event-state --interpret-oem-data --entity-sensor-names 
options="--quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors --driver-type=LAN_2_0 --output-sensor-thresholds"

function usage(){
    echo "Usage: `basename $0` options (-h HOST|-n NAME)"
}

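# Query all sensors, keep the row(s) matching $name, and print the last
# field of each (the sensor reading) with four decimal places.
function check(){
    result=$("$ipmi_sensors" -h "$host" --config-file "$ipmi_cfg" $options|grep "$name"|awk -F"| " '{print $NF}')
    printf "%.4f\n" $result
}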

if [ $# -lt 4 ]  
then
    usage
    exit 55     
fi  

# Usage: scriptname -options
# Note: options must be prefixed with a dash (-)
# A colon after an option letter means that option requires a value
while getopts ":h:n:" Option;do
  case $Option in
    h)
    host=$OPTARG
    ;;
    n)
    name=$OPTARG
    ;;
    *)
    usage
    exit 55
    ;;   # default case: unknown option or missing value
  esac
done

shift $(($OPTIND - 1))
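#  shift accepts a count: the number of positional parameters to drop.
#  Dropping OPTIND - 1 parameters skips everything getopts has consumed,
#  so $1 now refers to the first non-option argument on the command line,
#+ if such an argument exists.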

check

exit 0

Add execute permission:

chmod a+x check_ipmi
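
One thing to watch in the script above: `grep "$name"` matches anywhere in the row, so a name such as "Current" also matches the sensor group column of the "System Level" row in the ipmi-sensors output shown earlier. A stricter lookup could compare only the name column, as in the following sketch. It assumes the row layout from that debug output (record ID, sensor name, group, status, units, reading), and the function name `reading_by_name` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the reading of the first sensor whose name
# column (2nd field) matches exactly, formatted with four decimals.
# Returns 1 if the sensor is missing or its reading is not numeric.
# Input rows look like: "5 | Ambient Temp | Temperature | Nominal | C | 28.000000"
reading_by_name() {
    local wanted="$1"
    awk -F'|' -v wanted="$wanted" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($2) == wanted {
            value = trim($NF)
            if (value ~ /^-?[0-9]+(\.[0-9]+)?$/) {
                printf "%.4f\n", value
                found = 1
            }
            exit    # stop at the first matching row; END still runs
        }
        END { exit(found ? 0 : 1) }'
}

# Example, reusing the variables defined in check_ipmi:
#   $ipmi_sensors -h "$host" --config-file "$ipmi_cfg" $options \
#       | reading_by_name "Ambient Temp"
```

When several sensors share the same name (the two "Current" rows, for instance), this returns only the first one; distinguishing them would need the record ID as well.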

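5.4. Creating a custom template

The details are omitted here; the custom template is simply a modified version of the templates above. One screenshot covers it:

[Screenshot: custom template]

Two screenshots of the result:

[Two screenshots: monitoring results]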

In the end, even with the custom script, collecting data was still difficult: the ipmi commands run by the script all timed out.
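
The article stops at the timeout. One direction that might help, and which was not tried here, is to take the slow IPMI query out of the Zabbix polling path: a periodic job refreshes a cache file, and the external check only reads that file. The sketch below shows the refresh step under those assumptions. It uses the coreutils `timeout` command; the function name `refresh_cache` and the cache path in the example are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: refresh a cache file by running a slow command under
# a time limit. The cache is replaced only if the command finishes in time
# and succeeds, so a hung BMC never leaves a truncated file behind.
# Usage: refresh_cache <cache-file> <seconds> <command> [args...]
refresh_cache() {
    local cache="$1" limit="$2"
    shift 2
    local tmp
    tmp=$(mktemp "${cache}.XXXXXX") || return 1
    if timeout "$limit" "$@" > "$tmp"; then
        mv "$tmp" "$cache"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example (e.g. from cron; the cache path is an assumption):
#   refresh_cache /tmp/ipmi-10.103.1.100.cache 60 \
#       /usr/sbin/ipmi-sensors -h 10.103.1.100 --config-file ipmi.cfg --driver-type=LAN_2_0
```

The trade-off is that readings can be as old as the refresh interval, which is usually acceptable for temperature and fan speed.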



This article is reposted from ygqygq2's 51CTO blog; original link: http://blog.51cto.com/ygqygq2/1874277. To republish it, please contact the original author.
