Monitoring HP server hardware alerts in icinga2 with check_hpasm:
https://labs.consol.de/nagios/check_hpasm/#download
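Note: this tool can also monitor Windows systems directly; on Windows servers the hp-snmp-agents component is installed by default when the system is installed
The monitored server needs hp-snmp-agents (and snmp) installed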
# dpkg -i hp-snmp-agents_10.40-2909.34_amd64.deb
# /sbin/hpsnmpconfig
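Enter y (i.e. use the existing /etc/snmp/snmpd.conf configuration)
If the error below is reported, the fix is simple (not necessarily universal, but it is what worked for me): run /sbin/hpsnmpconfig again, choose n, enter the same password twice for the first item (the community string), then press Enter to accept the defaults for everything else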
CRITICAL - snmpwalk returns no product name (cpqsinfo-mib), wrong device
On Red Hat, the following commands help with troubleshooting:
# snmpwalk -v 2c -c public 127.0.0.1 1.3.6.1.4.1.232
# /etc/init.d/hp-snmp-agents status (make sure the agents are running)
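The snmpwalk check above can be wrapped in a small helper. This is only a sketch: classify_walk is a hypothetical name, and it assumes the usual net-snmp messages ("No Such Object", "No Such Instance", "Timeout") are what a failed walk prints.

```shell
# classify_walk: read snmpwalk output on stdin and print one of
#   ok       - the agent returned data below the queried OID
#   no-data  - the agent answered but serves nothing there, or timed out
#   empty    - nothing was printed at all
# (hypothetical helper, not part of check_hpasm or net-snmp)
classify_walk() {
    out=$(cat)
    if [ -z "$out" ]; then
        echo "empty"
    elif printf '%s\n' "$out" | grep -Eq 'No Such (Object|Instance)|Timeout'; then
        echo "no-data"
    else
        echo "ok"
    fi
}

# Usage on the monitored host (same community and OID as above):
#   snmpwalk -v 2c -c public 127.0.0.1 1.3.6.1.4.1.232 2>&1 | classify_walk
```

A result other than ok suggests the HP agents are not feeding snmpd, which matches the "snmpwalk returns no product name" error.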
# tar zxfv check_hpasm-4.7.5.4.tar.gz
# cd check_hpasm-4.7.5.4
# ./configure
# make
# make install
# cp -rv /usr/local/nagios/libexec/check_hpasm /usr/lib64/nagios/plugins/
# /usr/lib64/nagios/plugins/check_hpasm -H 10.0.0.3 -C public --perfdata=short
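When testing the plugin by hand, the exit code matters as much as the printed text, because the monitoring core derives the service state from it. A minimal sketch of the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN); state_name is a hypothetical helper name:

```shell
# state_name: translate a plugin exit code into the state name
# used by Nagios-compatible monitoring systems.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Usage:
#   /usr/lib64/nagios/plugins/check_hpasm -H 10.0.0.3 -C public --perfdata=short
#   echo "state: $(state_name $?)"
```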
Configure icinga2
# vi /etc/icinga2/conf.d/templates.conf
object CheckCommand "HP" {
  import "plugin-check-command"
  command = [ PluginDir + "/check_hpasm" ]
  arguments = {
    "-H" = "$address$"
    "-C" = "$snmp$"
    "--perfdata" = "$perf$"
  }
}
:wq
# vi /etc/icinga2/conf.d/services.conf
apply Service "HP" {
  import "generic-service"
  check_command = "HP"
  vars.snmp = "SPD.ubuntusrv#989"
  // value only; the "--perfdata" flag itself comes from the CheckCommand arguments
  vars.perf = "short"
  assign where host.address == "10.29.1.52" || host.address == "10.29.1.53"
}
:wq
# service icinga2 restart
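It may be safer to validate the configuration before restarting, so that a typo in templates.conf or services.conf does not take the daemon down. A sketch, assuming the icinga2 binary is on PATH; safe_restart, CHECK_CMD and RESTART_CMD are hypothetical names introduced here so the logic can be dry-run:

```shell
# safe_restart: run the config check and restart only if it passes.
# CHECK_CMD / RESTART_CMD may be overridden; the defaults are the
# validation command (icinga2 daemon -C) and the restart used above.
safe_restart() {
    check_cmd=${CHECK_CMD:-"icinga2 daemon -C"}
    restart_cmd=${RESTART_CMD:-"service icinga2 restart"}
    # Unquoted on purpose: the strings are split into command + arguments.
    if $check_cmd >/dev/null 2>&1; then
        $restart_cmd
    else
        echo "config validation failed; not restarting" >&2
        return 1
    fi
}

# Usage (as root):
#   safe_restart
```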
Alternatively, run the check through nrpe on the monitored host (less affected by network problems)
Ubuntu:
# vi /etc/nagios/nrpe.cfg
command[check_hpubt]=/usr/lib/nagios/plugins/check_hpasm -H 127.0.0.1 -C public
:wq
# service nagios-nrpe-server restart
Redhat:
# vi /etc/nagios/nrpe.cfg
command[check_hpubt]=sudo /usr/lib64/nagios/plugins/check_hpasm -H 127.0.0.1 -C public
:wq
# service nrpe restart
# vi /etc/sudoers
nagios ALL=(ALL) NOPASSWD:/usr/lib64/nagios/plugins/*
#Defaults requiretty (active by default; comment it out as shown)
:wq
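nrpe runs the command without a terminal, so sudo refuses it while requiretty is still in effect. The following sketch checks a sudoers file for an active requiretty line; requiretty_active is a hypothetical helper, and it only looks at the global "Defaults requiretty" form, not per-user variants:

```shell
# requiretty_active: read sudoers content on stdin and print
# "active" if an uncommented "Defaults requiretty" line is present,
# "inactive" otherwise.
requiretty_active() {
    if grep -Eq '^[[:space:]]*Defaults[[:space:]]+requiretty'; then
        echo "active"
    else
        echo "inactive"
    fi
}

# Usage (as root):
#   requiretty_active < /etc/sudoers
```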
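The icinga2 configuration on the monitoring side is omitted here
Appendix:
-v: show detailed server hardware information
--hpasmcli /sbin/hpasmcli shows disk health
--snmpwalk /usr/bin/snmpwalk gives the same result as --hpasmcli
--blacklist daac excludes the controller accelerator health check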
https://labs.consol.de/nagios/check_hpasm/