Monitoring HP servers through the BMC management port with Zabbix

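Summary: Use Zabbix to monitor hardware health. A local script pulls hardware information from the server and pushes it to Zabbix, so that hardware problems are detected promptly. Monitored components include disks, memory, power supplies, temperatures and so on.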

Overview

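Environment used in this article: Zabbix 3.4, with one Server, one Proxy and one agent. The Proxy actively collects the agent's status and pushes it to the Server with zabbix_sender.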


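First, make sure the server's BMC port is connected to the network, that you have an administrative user name and password for it, and that the Proxy and the agent have network connectivity. This article covers HP servers only; other vendors will be covered in a later update.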


Installation


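First, enable the EPEL repository (some of the packages, such as perl-Monitoring-Plugin, come from EPEL), then install the required packages: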

rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum install perl-IO-Socket-SSL.noarch perl-XML-Simple.noarch perl-Class-Accessor perl-Config-Tiny.noarch perl-Monitoring-Plugin
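Before running the script, it can help to confirm that the Perl modules hp.pl loads (Monitoring::Plugin, IO::Socket::SSL and XML::Simple, per its use lines) are actually installed. A minimal bash sketch; check_perl_modules and the PERL_BIN override are hypothetical additions, not part of the original setup:

```bash
#!/bin/bash
# Try to load each Perl module used by hp.pl and report which ones are missing.
# PERL_BIN is a hypothetical override for the interpreter (defaults to "perl").
check_perl_modules() {
  local perl_bin="${PERL_BIN:-perl}"
  local missing=0
  local mod
  for mod in Monitoring::Plugin IO::Socket::SSL XML::Simple; do
    # "perl -MModule -e 1" exits non-zero if the module cannot be loaded
    if "$perl_bin" -M"$mod" -e 1 >/dev/null 2>&1; then
      echo "ok      $mod"
    else
      echo "missing $mod"
      missing=1
    fi
  done
  return "$missing"
}
```

Run check_perl_modules after the yum step; a non-zero return means at least one package is still missing.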

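The full script is listed at the end of this article. Save it under /root as hp.pl without modifying it; the default permissions -rw-r--r-- are enough, because it is invoked as perl /root/hp.pl.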


Running the script

perl /root/hp.pl -u Admin -p aliyun -3 -v -t 60 -H 192.168.1.1

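Options:

            -u     user name

            -p     password

            -3     query an iLO3 or iLO4 device (omit for iLO2)

            -v     verbose: print the raw XML returned by the iLO; the filtering below relies on it

            -t     timeout in seconds

            -H     BMC (iLO) management IP address of the host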
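Besides the verbose XML, hp.pl also reports an overall result through its exit code: it is built on Monitoring::Plugin, whose nagios_exit() exits with the standard plugin codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). A minimal bash sketch of turning that code into a state name; plugin_state and run_health_check are hypothetical helper names:

```bash
#!/bin/bash
# Map a Nagios-style plugin exit code to its state name.
plugin_state() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;   # 3, or anything unexpected
  esac
}

# Hypothetical wrapper: run hp.pl WITHOUT -v so the output is essentially
# the plugin's summary line, then print "STATE|summary".
# Pass the usual options, e.g.: run_health_check -u USER -p PASS -3 -t 60 -H IP
run_health_check() {
  local summary rc
  summary="$(perl /root/hp.pl "$@" 2>&1)"
  rc=$?
  printf '%s|%s\n' "$(plugin_state "$rc")" "$summary"
}
```

The state string could then be pushed as one more item alongside the per-fan values.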

It returns output like the following (excerpt showing some of the fans):


               <ZONE VALUE = "System"/>
               <LABEL VALUE = "Fan 1"/>
               <STATUS VALUE = "Not Installed"/>
               <SPEED VALUE = "0" UNIT="Percentage"/>
          </FAN>
          <FAN>
               <ZONE VALUE = "System"/>
               <LABEL VALUE = "Fan 2"/>
               <STATUS VALUE = "Not Installed"/>
chunk: 1ff
chunk size: 511
               <SPEED VALUE = "0" UNIT="Percentage"/>
          </FAN>
          <FAN>
               <ZONE VALUE = "System"/>
               <LABEL VALUE = "Fan 3"/>
               <STATUS VALUE = "OK"/>
               <SPEED VALUE = "27" UNIT="Percentage"/>
          </FAN>
          <FAN>
               <ZONE VALUE = "System"/>
               <LABEL VALUE = "Fan 4"/>

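The output shows each fan's label and status; here the statuses are "Not Installed" and "OK". The "chunk:" lines are debug output printed by -v while reading the chunked HTTP reply and are not part of the XML. Use grep and awk to filter out the information you need.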

The following script does this:


#!/bin/bash
# Save the verbose iLO output, then print the STATUS value of each fan
LSI_LOG=/tmp/hp.log
perl /root/hp.pl -u administrator -p aliyun -3 -v -t 60 -H 192.168.1.1 > "$LSI_LOG"
grep -v "^chunk" "$LSI_LOG" | grep -A 2 "Fan [1-9]" | grep "STATUS VALUE" | sed -e 's/"/ /g' | awk -F "/" '{print $1}' | awk -F "=" '{print $2}'
It produces:


  Not Installed 
  Not Installed 
  OK 
  OK 
  Not Installed 
  OK 
  OK 
  OK 
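The grep -A 2 pipeline depends on the STATUS line appearing within two lines of the LABEL line, and it drops the fan name. As an alternative, assuming the LABEL/STATUS layout shown in the excerpt above, a single awk filter can pair each fan label with its status; the "chunk:" lines simply match neither pattern. parse_fan_status is a hypothetical helper name, and this is a sketch rather than the author's script:

```bash
#!/bin/bash
# Read verbose hp.pl output on stdin and print "label<TAB>status" per fan.
parse_fan_status() {
  # Split on double quotes: for <LABEL VALUE = "Fan 3"/> the value is $2
  awk -F'"' '
    /<LABEL VALUE *= *"Fan [0-9]+"/ { label = $2; next }
    label != "" && /<STATUS VALUE *= */ { print label "\t" $2; label = "" }
  '
}
```

For example, parse_fan_status < /tmp/hp.log prints lines such as "Fan 3", a tab, then "OK".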

Next, write a loop that pushes these values to the Server:

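zabbix_sender options used:

        -z   IP address of the Zabbix server (or proxy) that receives the data

        -s   host name as configured in Zabbix

        -k   key of the Zabbix item

        -o   value to send for that key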

#!/bin/bash
LSI_LOG="/tmp/hp.log"
perl /root/hp.pl -u Administrator -p ****** -3 -v -t 60 -H 192.168.1.11 > "$LSI_LOG"
# Get fan states: one status per line, in fan order
n=1
grep -v "^chunk" "$LSI_LOG" | grep -A 2 "Fan [1-9]" | grep "STATUS VALUE" | sed -e 's/"/ /g' | awk -F "/" '{print $1}' | awk -F "=" '{print $2}' > "$LSI_LOG.temp"
# read trims the surrounding spaces from each status
while read -r line
do
  # Push the value as item fan<n>.state on host hp01
  /usr/bin/zabbix_sender -z 192.168.1.10 -s hp01 -k "fan${n}.state" -o "$line"
  ((n++))
  # Wrap the counter back to 1 after fan 12
  if [ "$n" -eq 13 ]; then
    n=1
  fi
done < "$LSI_LOG.temp"
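The loop above starts one zabbix_sender process per fan. zabbix_sender can also read many values from a file with -i, one "<hostname> <key> <value>" line each, with values that contain spaces enclosed in double quotes. A sketch, assuming bash and the one-status-per-line file produced above; build_sender_batch and /tmp/hp.batch are hypothetical names, and unlike the original loop it numbers fans 1..N without wrapping at 13:

```bash
#!/bin/bash
# $1: host name as configured in Zabbix; stdin: one fan status per line.
# Prints zabbix_sender input lines: <hostname> fan<N>.state "<status>"
build_sender_batch() {
  awk -v host="$1" '
    NF == 0 { next }                      # skip blank lines
    {
      gsub(/^[ \t]+|[ \t]+$/, "")         # trim padding left by the sed/awk pipeline
      n++
      printf "%s fan%d.state \"%s\"\n", host, n, $0
    }
  '
}
```

For example: build_sender_batch hp01 < /tmp/hp.log.temp > /tmp/hp.batch, then /usr/bin/zabbix_sender -z 192.168.1.10 -i /tmp/hp.batch sends all values in one run.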


Save the script as /usr/lib/zabbix/externalscripts/hp.sh and make it executable (chmod +x), since cron runs it directly.

Create a cron job with crontab -e that runs it every five minutes:


*/5 * * * * /usr/lib/zabbix/externalscripts/hp.sh >/dev/null 2>&1


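Then add the corresponding items with matching keys (fan1.state, fan2.state, ...) to the host in Zabbix. Because the values are pushed with zabbix_sender, the items must be of type Zabbix trapper, with a string type of information such as Character. Note that the host name passed to zabbix_sender with -s must exactly match the host name configured in Zabbix; otherwise no data will be received.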


Check the latest values:



Finally, create the corresponding triggers.




Script source (hp.pl)


#!/usr/bin/perl
# icinga: -epn

# check_ilo2_health.pl
# based on check_stuff.pl and locfg.pl
#
# Nagios plugin using the Nagios::Plugin module and the
# HP Lights-Out XML PERL Scripting Sample from
# ftp://ftp.hp.com/pub/softlib2/software1/pubsw-linux/p391992567/v60711/linux-LOsamplescripts3.00.0-2.tgz
# checks if all sensors are ok, returns warning on high temperatures and
# fan failures and critical on overall health failure
#
# Alexander Greiner-Baer <alexander.greiner-baer@web.de> 2007 - 2018
# Matthew Stier <Matthew.Stier@us.fujitsu.com> 2011
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
#
# Changelog:
# 1.62    Mon, 14 May 2018 19:05:22 +0200
#   retrieve firmware infos only when using --getinfos
# 1.61    Thu, 01 Jun 2017 20:05:04 +0200
#   fix for iLO4 2.50 link state when using --ignorelinkdown
# 1.60    Wed, 12 Aug 2015 18:20:13 +0200
#   provide --sslopts to override defaults settings
#   fix }; for GET_EVENT_LOG
#   applied patch from Rene Koch <rene.koch@siedl.net>:
#     handle missing values when using "-g"
#   CONTROLLER_STATUS not present on iLO4 anymore, use STATUS instead
#   put SSL_VERIFY_NONE in ''
# 1.59    Wed, 28 Jan 2015 18:56:26 +0100
#   fix chunk size handling
#   corrected HTTP/1.1 HOST Header
#   applied patch from Max Winterstein <winterstein@siriusonline.de>:
#     sslv3 support
#   add retries option
#   catch XMLin() errors
#   applied patch from Rene Koch <rene.koch@siedl.net>:
#     ignore battery not installed status (option "-x")
#     display server name (option "-g")
#     added warning for logical drive status "Degraded (Recovering)"
#     display system details (hardware model, serial number, SystemROM, iLO version)
#     display memory size and part number in case of memory failure
#     display hard disk model number in case of hard disk failure
#     display power supply part number in case of power supply failure
# 1.58    Thu, 08 Aug 2013 18:17:02 +0200
#   ignore network link down status (option "-i")
#   added ENCLOSURE_ADDR to drive bay label (bay numbering was inconsistent)
#   ignore spare drives
# 1.57    Fri, 17 May 2013 19:30:48 +0200
#   SSL_verify_mode SSL_VERIFY_NONE (IO::Socket::SSL changed default)
#   event log support for ilo2
#   disable embedded perl in icinga
# 1.56    Fri, 15 Mar 2013 20:47:13 +0100
#   applied patch from Niklas Edmundsson <Niklas.Edmundsson@hpc2n.umu.se>:
#     check processor and memory details
#   applied patches from Dragan Sekerovic <dragan.sekerovic@onestep2.at>:
#     add location label to temperature (option "-b")
#     support for checking event log (option "-l")
#     add iLO version to output
#   add 2 new values for power supply status
#   --
# 1.55    Sun, 05 Aug 2012 20:18:46 +0200
#   faulty drive (option "-c") exits now with CRITICAL instead of WARNING
#   applied patches from Niklas Edmundsson <Niklas.Edmundsson@hpc2n.umu.se>:
#     iLO4 RAID Controller Status
#     nodriveexit
#   add g6 drive status
#   overall health probes every element now
#   fixed bug with drive bay index
#   supports iLO3 with multiple backplanes
#   supports iLO4 disk check
#   Note: overall health may show drive/storage status, even without "-c"
#   --
# 1.54    Thu, 14 Jun 2012 21:36:40 +0200
#   applied fix for iLO4 from Niklas Edmundsson <Niklas.Edmundsson@hpc2n.umu.se>
#   --
# 1.53    Tue, 14 Feb 2012 19:47:40 +0100
#   added new disk bay variant
#   added power supply NOT APPLICABLE
#   --
# 1.52    Wed, 27 Jul 2011 20:46:14 +0200
#   fixed <LABEL VALUE = "Power Supplies"/> again
#   --
# 1.51    Mon, 25 Jul 2011 19:36:53 +0200
#   fixed bug with chunked replies by Matthew Stier
#   --
# 1.5     Sat, 16 Jul 2011 10:02:10 +0200
#    optimized by Matthew Stier
#   --
# 1.47    Thu, 14 Jul 2011 12:02:01 +0200
#   also print perfdata when temperature output is disabled
#   --
# 1.46    Wed, 06 Jul 2011 08:46:51 +0200
#   fixed bug with nagios embedded perl interpreter
#   --
# 1.45    Wed, 13 Oct 2010 22:17:01 +0200
#   new option "--ilo3"
#
#   "--checkdrives" enhancements
#
#   <LABEL VALUE = "Power Supplies"/> shows always "Failed" even when the power
#   supplies are redundant
#
#   improved "--fanredundancy" and "--powerredundancy"
#   --
# 1.44    Mon, 14 Dec 2009 20:11:37 +0100
#   new option "--checkdrives"
#   --
# 1.43    Mon, 17 Aug 2009 20:50:13 +0200
#   new option "--fanredundancy"
#
#   new option "--powerredundancy"
#   --
# 1.42          Mon, 17 Aug 2009 12:52:23 +0100
#   check power supply and fans redundancy
#               gcivitella@enter.it
#   --
# 1.41          Thu, 26 Jul 2007 17:42:36 +0200
#   perfdata label ist now quoted
#   --
# 1.4           Mon, 25 Jun 2007 09:45:52 +0200
#   check vrm and power supply
#
#   new option "--notemperatures"
#
#   new option "--perfdata"
#
#   some minor changes
#   --
# 1.3beta       Wed, 20 Jun 2007 09:57:46 +0200
#   do some error checking
#
#   new option "--inputfile"
#   read bmc output from file
#   --
# 1.2   Mon, 18 Jun 2007 09:33:17 +0200
#   new option "--skipsyntaxerrors"
#   ignores syntax errors in the xml output, maybe required by older firmwares
#
#   introduce a date to the changelog ;)
#   --
# 1.1   do not return warning if temperature status is n/a
#
#   add "<LOCFG VERSION="2.21" />" to get rid of the
#   "<INFORM>Scripting utility should be updated to the latest version.</INFORM>"
#   message
#   --
# 1     initial release

use strict;
use warnings;
use strict 'refs';

use Monitoring::Plugin;
use Sys::Hostname;
use IO::Socket::SSL;
use XML::Simple;

$Net::SSLeay::slowly = 5;

use vars qw($VERSION $PROGNAME  $verbose $warn $critical $timeout $result);
$VERSION = 1.62;

$PROGNAME = "check_ilo2_health";

# instantiate Monitoring::Plugin (the renamed Nagios::Plugin)
our $p = Monitoring::Plugin->new(
        usage => "Usage: %s [-H <host>] [ -u|--user=<USERNAME> ]
  [ -p|--password=<PASSWORD> ] [ -f|--inputfile=<filename> ]
  [ -a|--fanredundancy ] [ -c|--checkdrives ] [ -d|--perfdata ]
  [ -e|--skipsyntaxerrors ] [ -n|--notemperatures ] [ -3|--ilo3 ]
  [ -o|--powerredundancy ] [ -b|--locationlabel ] [ -l|--eventlogcheck]
  [ -i|--ignorelinkdown ] [ -x|--ignorebatterymissing ] [ -s|--sslv3 ]
  [ -t <timeout> ] [ -r <retries> ] [ -g|--getinfos ] [ --sslopts ]
  [ -v|--verbose ] ",
        version => $VERSION,
        blurb => 'This plugin checks the health status on a remote iLO2|3|4 device
and will return OK, WARNING or CRITICAL. iLO (integrated Lights-Out)
can be found on HP Proliant servers.'
);

$p->add_arg(
  spec => 'host|H=s',
  help =>
  qq{-H, --host=STRING
  Specify the host on the command line.},
);

# add all arguments
$p->add_arg(
  spec => 'user|u=s',
  help =>
  qq{-u, --user=STRING
  Specify the username on the command line.},
);

$p->add_arg(
  spec => 'password|p=s',
  help =>
  qq{-p, --password=STRING
  Specify the password on the command line.},
);

$p->add_arg(
  spec => 'inputfile|f=s',
  help =>
  qq{-f, --inputfile=STRING
  Read input from file.},
);

$p->add_arg(
  spec => 'fanredundancy|a',
  help =>
  qq{-a, --fanredundancy
  Check fan redundancy},
);

$p->add_arg(
  spec => 'checkdrives|c',
  help =>
  qq{-c, --checkdrives
  Check drive bays.},
);

$p->add_arg(
  spec => 'perfdata|d',
  help =>
  qq{-d, --perfdata
  Enable perfdata on output.},
);

$p->add_arg(
  spec => 'locationlabel|b',
  help =>
  qq{-b, --locationlabel
  Show temperature with location.},
);

$p->add_arg(
  spec => 'eventlogcheck|l',
  help =>
  qq{-l, --eventlogcheck
  Parse ILO eventlog for interesting events (f.e. broken memory).},
);


$p->add_arg(
  spec => 'skipsyntaxerrors|e',
  help =>
  qq{-e, --skipsyntaxerrors
  Skip syntax errors on older firmwares.},
);

$p->add_arg(
  spec => 'ignorebatterymissing|x',
  help =>
  qq{-x, --ignorebatterymissing
  Ignore Battery missing status.},
);

$p->add_arg(
  spec => 'ignorelinkdown|i',
  help =>
  qq{-i, --ignorelinkdown
  Ignore NIC Link Down status (iLO4).},
);

$p->add_arg(
  spec => 'notemperatures|n',
  help =>
  qq{-n, --notemperatures
  Disable temperature listing.},
);

$p->add_arg(
  spec => 'powerredundancy|o',
  help =>
  qq{-o, --powerredundancy
  Check power redundancy.},
);

$p->add_arg(
  spec => 'getinfos|g',
  help =>
  qq{-g, --getinfos
  Display additional infos like firmware version and servername. May need increased timeout.},
);

$p->add_arg(
  spec => 'ilo3|3',
  help =>
  qq{-3, --ilo3
  Check iLO3|4 device.},
);

$p->add_arg(
  spec => 'retries|r=i',
  help => 
  qq{-r, --retries=INTEGER
  Number of retries.},
);

$p->add_arg(
  spec => 'sslv3|s',
  help => 
  qq{-s, --sslv3
  Use sslv3 for connection.},
);

$p->add_arg(
  spec => 'sslopts=s',
  help => 
  qq{--sslopts
  Sets IO::Socket::SSL Options, defaults to 'SSL_verify_mode => SSL_VERIFY_NONE'.
  Some firmware may need --sslopts 'SSL_verify_mode => SSL_VERIFY_NONE, SSL_version => "TLSv1"'.},
);

# parse arguments
$p->getopts;

my $return = "OK";
my $message = "";
our $xmlinput = "";
our $isinput = 0;
our $drive_input = "";
our $is_drive_input = 0;
our $drive_xml_broken = 0;
our $client;
our $is_event_input = 0;
our $event_severity = "";
our $event_class = "";
our $event_description = "";
our %event_status;
my $host = $p->opts->host;
my $hostname = $p->opts->host;
my $username = $p->opts->user;
my $password = $p->opts->password;
my $inputfile = $p->opts->inputfile;
our $skipsyntaxerrors = defined($p->opts->skipsyntaxerrors) ? 1 : 0;
my $optfanredundancy = defined($p->opts->fanredundancy) ? 1 : 0;
my $optpowerredundancy = defined($p->opts->powerredundancy) ? 1 : 0;
my $notemperatures = defined($p->opts->notemperatures) ? 1 : 0;
our $optcheckdrives = defined($p->opts->checkdrives) ? 1 : 0;
my $optilo3 = defined($p->opts->ilo3) ? 1 : 0;
my $iloboardversion = defined($p->opts->ilo3) ? "ILO>=3" : "ILO2";
my $perfdata = defined($p->opts->perfdata) ? 1 : 0;
my $locationlabel = defined($p->opts->locationlabel) ? 1 : 0;
my $eventlogcheck = defined($p->opts->eventlogcheck) ? 1 : 0;
my $ignorelinkdown = defined($p->opts->ignorelinkdown) ? 1 : 0;
my $ignorebatterymissing = defined($p->opts->ignorebatterymissing) ? 1 : 0;
my $getinfos = defined($p->opts->getinfos) ? 1 : 0;
our %drives;
our $drive;
our $drivestatus;
our $product_name = "";
our $serial_number = "";
our $server_name = "";
my $retries=0;
my $xml;
my $sslv3 = defined($p->opts->sslv3) ? 1 : 0;
my $sslopts = 'SSL_verify_mode => SSL_VERIFY_NONE';
our @product;
our @serial;
our @sname;

$message = "(Board-Version: $iloboardversion) ";

unless ( ( defined($inputfile) ) ||
         ( defined($host) && defined($username) && defined($password) ) ) {
  $p->nagios_die("ERROR: Missing host, password and user.");
}

if ( defined ( $p->opts->retries ) ) {
  $retries = $p->opts->retries;
}

if ( defined ( $p->opts->sslopts ) ) {
  $sslopts = $p->opts->sslopts;
}

alarm $p->opts->timeout;

my $boundary;
our $sendsize;
my $localhost = hostname() || 'localhost';
print "hostname is $localhost\n" if ( $p->opts->verbose );

for (my $i=0;$i<=$retries;$i++) {
  print "retry: $i\n" if ( $p->opts->verbose );
  unless ( defined($inputfile) ) {
    # query code from locfg.pl
    # Set the default SSL port number if no port is specified
    $host .= ":443" unless ($host =~ m/:/);
    #
    # Open the SSL connection and the input file
    $client = new IO::Socket::SSL->new(PeerAddr => $host, eval $sslopts, $sslv3 ? 
    ( SSL_version => 'SSLv3' ) : () );
    unless ( $client ) {
      $p->nagios_exit(
        return_code => "UNKNOWN",
        message => "ERROR: Failed to establish SSL connection with $host $! $SSL_ERROR."
      );
    }

    if ( $optilo3 ) {
      print "sending ilo3\n" if ( $p->opts->verbose );
      my $cmd = '<?xml version="1.0"?>';
      $cmd .= '<LOCFG VERSION="2.21" />';
      $cmd .= '<RIBCL VERSION="2.21">';
      $cmd .= '<LOGIN USER_LOGIN="'.$username.'" PASSWORD="'.$password.'">';
      $cmd .= '<SERVER_INFO MODE="read">';
      $cmd .= '<GET_EMBEDDED_HEALTH />';
      if ( $eventlogcheck ) { 
        $cmd .= '<GET_EVENT_LOG />';
      }
      if ( $getinfos ) {
        $cmd .= '<GET_HOST_DATA />';
        $cmd .= '<GET_PRODUCT_NAME />';
        $cmd .= '<GET_SERVER_NAME />';
      }
      $cmd .= '</SERVER_INFO>';
      $cmd .= '</LOGIN>';
      $cmd .= '</RIBCL>';
      $cmd .= "\r\n";
      send_or_calculate(0,$cmd);

      send_to_client(0, "POST /ribcl HTTP/1.1\r\n");
      send_to_client(0, "HOST: $hostname\r\n");          # Mandatory for http 1.1
      send_to_client(0, "TE: chunked\r\n");
      send_to_client(0, "Connection: Close\r\n");         # Required
      send_to_client(0, "Content-length: $sendsize\r\n"); # Mandatory for http 1.1
      send_to_client(0, "\r\n");
      send_or_calculate(1,$cmd);  #Send it to iLO
    }
    else {
      # send xml to BMC
      print $client '<?xml version="1.0"?>' . "\r\n";
      print $client '<LOCFG VERSION="2.21" />' . "\r\n";
      print $client '<RIBCL VERSION="2.21">' . "\r\n";
      print $client '<LOGIN USER_LOGIN="'.$username.'" PASSWORD="'.$password.'">' . "\r\n";
      print $client '<SERVER_INFO MODE="read">' . "\r\n";
      print $client '<GET_EMBEDDED_HEALTH />' . "\r\n";
      if ( $eventlogcheck ) { 
        print $client '<GET_EVENT_LOG />' . "\r\n"; 
      }
      if ( $getinfos ) {
        print $client '<GET_HOST_DATA />' . "\r\n";
        print $client '<GET_PRODUCT_NAME />' . "\r\n";
        print $client '<GET_SERVER_NAME />' . "\r\n";
      }
      print $client '</SERVER_INFO>' . "\r\n";
      print $client '</LOGIN>' . "\r\n";
      print $client '</RIBCL>' . "\r\n";
    }
  }
  else {
    open($client,$inputfile) or $p->nagios_die("ERROR: $inputfile not found");
  }

# retrieve data
  if ( $optilo3 && !$inputfile ) {
    read_chunked_reply();
  }
  else {
    while (my $ln = <$client>) {
      parse_reply($ln);
    }
    close $client;
  }

# parse with XML::Simple
  if ( $xmlinput && $isinput == 0 ) {
    $xml = eval { XMLin($xmlinput, ForceArray => 1) };
    if ( $@ ) {
      if ( $i < $retries ) { 
        next;
      }
      $p->nagios_exit(
        return_code => "UNKNOWN",
        message => "ERROR: $@"
      );
    }
    else {
      last;
    }
  }
  else {
    $p->nagios_exit(
      return_code => "UNKNOWN",
      message => "ERROR: No parseable output."
    );
  }
}

if ( $getinfos ) {
  $serial_number = "";
  if ( defined $serial[3] ) {
    $serial[3] =~ tr/ //ds ;
    $serial_number = "Serial: $serial[3]";
  }
  $server_name = "";
  $server_name = " - Servername: $sname[1]" if defined $sname[1];
  my $system_rom = undef;
  my $firmware_name = undef;
  my $firmware_version = undef;

  # loop through firmware hash
  foreach my $index (keys %{ $xml->{'FIRMWARE_INFORMATION'}[0] }) {
    if (defined $xml->{'FIRMWARE_INFORMATION'}[0]->{$index}[0]->{'FIRMWARE_NAME'}[0]->{'VALUE'}) {
      if ($xml->{'FIRMWARE_INFORMATION'}[0]->{$index}[0]->{'FIRMWARE_NAME'}[0]->{'VALUE'} eq "iLO") {
        $firmware_name    = 'iLO';
        $firmware_version = $xml->{'FIRMWARE_INFORMATION'}[0]->{$index}[0]->{'FIRMWARE_VERSION'}[0]->{'VALUE'};
      } elsif ($xml->{'FIRMWARE_INFORMATION'}[0]->{$index}[0]->{'FIRMWARE_NAME'}[0]->{'VALUE'} eq "HP ProLiant System ROM") {
        $system_rom = $xml->{'FIRMWARE_INFORMATION'}[0]->{$index}[0]->{'FIRMWARE_VERSION'}[0]->{'VALUE'};
      }
    }
  }

  my $product_name = undef;
  if (! defined $product[1]) {
    $product_name = "Unknown product";
  } else {
    $product_name = $product[1]
  }
  $serial_number = "Unknown serial" if ! defined $serial_number;
  $firmware_name = "iLO" if ! defined $firmware_name;
  $firmware_version = "Unknown firmware version" if ! defined $firmware_version;
  $server_name = "Unknown server name" if ! defined $server_name;

  if (defined $system_rom) {
    $message = "($product_name - SystemROM: $system_rom - $serial_number - $firmware_name FW $firmware_version" . "$server_name) ";
  } else {
    $message = "($product_name - $serial_number - $firmware_name FW $firmware_version" . "$server_name) ";
  }
}

my $drive_xml;
if ( $optcheckdrives && !$drive_xml_broken ) {
  if ( $drive_input && $is_drive_input == 0 ) {
    $drive_xml = eval { XMLin($drive_input, ForceArray => 1) };
    if ( $@ ) {
      $p->nagios_exit(
        return_code => "UNKNOWN",
        message => "ERROR: $@"
      );
    }
  }
  elsif ( ref $xml->{'STORAGE'}[0]->{'CONTROLLER'} ) {
    # iLO4 specific, no need for $drive_input
  }
  else {
    # No need to error out if host uncapable of checking drive status
    warn "No drive_input found" if ( $p->opts->verbose );
  }
}

my $temperatures = $xml->{'TEMPERATURE'}[0]->{'TEMP'};
my $backplanes = $drive_xml->{'BACKPLANE'};
my $raidcontroller = $xml->{'STORAGE'}[0]->{'CONTROLLER'};
my @checks;
push(@checks,$xml->{'FANS'}[0]->{'FAN'});
push(@checks,$xml->{'VRM'}[0]->{'MODULE'});
push(@checks,$xml->{'POWER_SUPPLIES'}[0]->{'SUPPLY'});
if($xml->{'PROCESSORS'}) {
        push(@checks,$xml->{'PROCESSORS'}[0]->{'PROCESSOR'});
}
my $memdetails;
if($xml->{'MEMORY'}) {
        $memdetails = $xml->{'MEMORY'}[0]->{'MEMORY_DETAILS'}[0];
}
my $health = $xml->{'HEALTH_AT_A_GLANCE'}[0];
my $label;
my $status;
my $temperature;
my $cautiontemp;
my $criticaltemp;

## check overall health status

my $componentstate;
foreach (keys %{$health}) {
  $componentstate = $health->{$_}[0]->{'STATUS'};
  if ( defined($componentstate) && ( $componentstate !~ m/^Ok$|^OTHER$|^NOT APPLICABLE$/i ) ) {
    if ($_ eq 'STORAGE') {
      if ( ref($raidcontroller) ) {
       # For iLO4 we can look at the raid controller to get a more detailed
       # status, so just log a WARNING unless we find something CRITICAL
       # later on.
       $return = "WARNING" unless ( $return eq "CRITICAL" );
      }
      else {
       $return = "CRITICAL";
      }
    }
    elsif ( ( $_ eq 'BATTERY' ) && $ignorebatterymissing && ( $componentstate =~ m/^Not Installed$/i ) ) {
      next;
    }
    elsif ( ( $_ eq 'NETWORK' ) && $ignorelinkdown && ( $componentstate =~ m/^Link Down$/i || $componentstate =~ m/^Degraded$/i ) ) {
      next;
    }
    else {
      $return = "CRITICAL";
    }
    $message .= "$_ $componentstate, ";
  }
}

if ( $optpowerredundancy ) {
  my $powerredundancy = $health->{'POWER_SUPPLIES'}[1]->{'REDUNDANCY'};
  if ( defined($powerredundancy) &&
    ( $powerredundancy !~ m/^Fully Redundant$|^REDUNDANT$|^NOT APPLICABLE$/i ) ) {
    $return = "CRITICAL";
    $message .= "Power supply $powerredundancy, ";
  }
}

if ( $optfanredundancy ) {
  my $fanredundancy = $health->{'FANS'}[1]->{'REDUNDANCY'};
  if ( defined($fanredundancy) &&
    ( $fanredundancy !~ m/^Fully Redundant$|^REDUNDANT$|^NOT APPLICABLE$/i ) ) {
    $return = "CRITICAL";
    $message .= "Fans $fanredundancy, ";
  }
}

# check fans, vrm and power supplies
foreach my $check ( @checks ) {
  if ( ref($check) ) {
    foreach my $item ( @$check ) {
      $label=$item->{'LABEL'}[0]->{'VALUE'};
      $status=$item->{'STATUS'}[0]->{'VALUE'};
      if ( defined($label) && defined($status) ) {
        # misleading output on some iLO3 shows always failed, skip it
        if ($label =~ m/^Power Supplies$/) {
          next;
        }
        if ($label =~ m/^Power Supply/) {
          # get details for power supplies
          $label =~ s/ /_/g;
          if ( ( $status !~ m"^Ok$|^Good|^n/a$|^Not Installed$|^Unknown$"i ) ) {
            $return = "WARNING" unless ( $return eq "CRITICAL" );
            if ( defined($item->{'MODEL'}[0]->{'VALUE'}) ) {
              $message .= "$label is $status (ModelNumber: $item->{'MODEL'}[0]->{'VALUE'}) ";
            }
            else {
              $message .= "$label is $status ";
            }
          }
          next;
        }
        $label =~ s/ /_/g;
        if ( ( $status !~ m"^Ok$|^Good|^n/a$|^Not Installed$|^Unknown$"i ) ) {
          $return = "WARNING" unless ( $return eq "CRITICAL" );
          $message .= "$label: $status, ";
        }
      }
    }
  }
}

# check memory status (iLO4 only?)
if ( ref($memdetails) ) {
  foreach my $loc ( sort keys %{$memdetails} ) {
    foreach ( @{$memdetails->{$loc}} ) {
      $status = $_->{'STATUS'}[0]->{'VALUE'};
      if ( ( $status !~ m"^Ok$|^Good|^n/a$|^Not Present$"i ) ) {
        $return = "WARNING" unless ( $return eq "CRITICAL" );
        my $socket = $_->{'SOCKET'}[0]->{'VALUE'};
        my $size = $_->{'SIZE'}[0]->{'VALUE'};
        my $part = $_->{'PART'}[0]->{'NUMBER'};
        if ( defined $part ) {
          # works only with new iLO4 firmware
          $message .= "Mem $loc $socket: $status (Size: $size, PartNumber: $part), ";
        }
        else {
          $message .= "Mem $loc $socket: $status (Size: $size), ";
        }
      }
    }
  }
}

# check newer drive bays (iLO3)
if ( ref($backplanes) ) {
  my $backplane = 0;
  foreach ( @{$backplanes} ) {
    if ( defined($_->{'ENCLOSURE_ADDR'}[0]->{'VALUE'} ) ) {
      $backplane = $_->{'ENCLOSURE_ADDR'}[0]->{'VALUE'};
    }
    else {
      $backplane++;
    }
    if ( $_->{'DRIVE_BAY'} ) {
      for ( my $i=0; $i<= $#{$_->{'DRIVE_BAY'}}; $i++ ) {
        $label=$backplane." ".$_->{'DRIVE_BAY'}[$i]->{'VALUE'};
        $status=$_->{'STATUS'}[$i]->{'VALUE'};
        $drives{$label}{'status'} = $status;
      }
    }
    if ( $_->{'DRIVE'} ) {
      for ( my $i=0; $i<= $#{$_->{'DRIVE'}}; $i++ ) {
        $label=$backplane." ".$_->{'DRIVE'}[$i]->{'BAY'};
        $status=$_->{'DRIVE_STATUS'}[$i]->{'VALUE'};
        $drives{$label}{'status'} = $status;
      }
    }
  }
}

# seems that iLO4 reads the state from the RAID controller, nice
if ( ref($raidcontroller) ) {
  foreach ( @{$raidcontroller} ) {
    my $ctrllabel = $_->{'LABEL'}[0]->{'VALUE'};
    my $ctrlstatus = $_->{'CONTROLLER_STATUS'}[0]->{'VALUE'} || $_->{'STATUS'}[0]->{'VALUE'};
    if($ctrlstatus ne 'OK') {
      $return = "CRITICAL";
      $message .= "SmartArray $ctrllabel Status: $ctrlstatus, ";
    }
    my $cachestatus = $_->{'CACHE_MODULE_STATUS'}[0]->{'VALUE'};
    if($cachestatus && $cachestatus ne 'OK') {
      # FIXME: There are probably other valid cache module states that
      #        needs to be excluded.
      $return = "CRITICAL";
      $message .= "SmartArray $ctrllabel Cache Status: $cachestatus, ";
    }
    foreach ( @{$_->{'DRIVE_ENCLOSURE'}} ) {
      my $enclabel = $_->{'LABEL'}[0]->{'VALUE'};
      my $encstatus = $_->{'STATUS'}[0]->{'VALUE'};
      my $encmodel = $_->{'MODEL_NUMBER'}[0]->{'VALUE'};
      if($encstatus ne 'OK') {
              $message .= "SmartArray $ctrllabel Enclosure $enclabel: $encstatus (ModelNumber: $encmodel) - check hardware status in OS, ";
        $return = "CRITICAL";
      }
    }
    foreach ( @{$_->{'LOGICAL_DRIVE'}} ) {
      my $ldlabel = $_->{'LABEL'}[0]->{'VALUE'};
      my $ldstatus = $_->{'STATUS'}[0]->{'VALUE'};
      if($ldstatus ne 'OK') {
              $message .= "SmartArray $ctrllabel LD $ldlabel: $ldstatus, ";
              if($ldstatus eq 'Degraded (Rebuilding)' || $ldstatus eq 'Degraded (Recovering)') {
                $return = "WARNING" unless ( $return eq "CRITICAL" );
              }
              else {
                $return = "CRITICAL";
              }
      }
      foreach ( @{$_->{'PHYSICAL_DRIVE'}} ) {
        $label = "$ctrllabel $_->{'LABEL'}[0]->{'VALUE'}";
        $status = $_->{'STATUS'}[0]->{'VALUE'};
        my $model = $_->{'MODEL'}[0]->{'VALUE'};
        $drives{$label}{'status'} = $status;
        $drives{$label}{'model'} = $model;
      }
    }
  }
}

# check drive bays
if ( $optcheckdrives ) {
  foreach ( sort keys(%drives) ) {
    if ( ( $drives{$_}{'status'} !~ m"^(Ok)$|^(n/a)$|^(Spare)$|^(Not Installed)|^(Not Present/Not Installed)$|^(spun down)$"i ) ) {
      $return = "CRITICAL";
      $message .= "$_: ".$drives{$_}{'status'};
      if (defined $drives{$_}{'model'}){
        $message .= " (Drive ModelNumber: " . $drives{$_}{'model'} ."), ";
      }
    }
  }
}

# check event logs
if ( $eventlogcheck ) {
  foreach ( keys %event_status ) {
    next if ( $event_status{$_} =~ m/Repaired/ );
    $message .= " $_:$event_status{$_} ";
    $return = "WARNING" unless ( $return eq "CRITICAL" );
  }
}

unless ( $message ) {
  $message .= "No faults detected, ";
}

# check temperatures
if ( ref($temperatures) ) {
  unless ( $notemperatures ) {
    $message .= "Temperatures: ";
  }
  foreach my $temp ( @$temperatures ) {
    $label=$temp->{'LABEL'}[0]->{'VALUE'};
    if ( $locationlabel && defined($temp->{'LOCATION'}[0]->{'VALUE'}) ) {
      $label .= " (" . $temp->{'LOCATION'}[0]->{'VALUE'} . ")";
    }
    $status=$temp->{'STATUS'}[0]->{'VALUE'};
    $temperature=$temp->{'CURRENTREADING'}[0]->{'VALUE'};
    if ( defined($label) && defined($status) && defined($temperature) ) {
      $label =~ s/ /_/g;
      unless ( ( $status =~ m"^Ok$|^n/a$|^Not Installed$"i ) ) {
        $return = "WARNING" unless ( $return eq "CRITICAL" );
        $message .= "$label ($status): $temperature, "
          if ( $notemperatures );
      }
      unless ( ( $status =~ m"^n/a$|^Not Installed$"i ) )  {
        $message .= "$label ($status): $temperature, "
          unless ( $notemperatures );
        if ( $perfdata ) {
          $cautiontemp=$temp->{'CAUTION'}[0]->{'VALUE'};
          $criticaltemp=$temp->{'CRITICAL'}[0]->{'VALUE'};
          # Returned value can be 'N/A', enforce this being a number
          if($cautiontemp && $cautiontemp !~ /^[0-9]+/) {
                $cautiontemp=undef;
          }
          if($criticaltemp && $criticaltemp !~ /^[0-9]+/) {
                $criticaltemp=undef;
          }
          if ( defined($cautiontemp) && defined($criticaltemp) ) {
            $p->set_thresholds(
              warning  => $cautiontemp,
              critical => $criticaltemp,
            );
            my $threshold = $p->threshold;
            # add perfdata
            $p->add_perfdata(
              label   => $label,
              value   => $temperature,
              uom     => "",
              threshold => $threshold,
            );
          }
        }
      }
    }
    else {
      $message .= "no reading, ";
    }
  }
}


# strip trailing ","
$message =~ s/, $//;

$p->nagios_exit(
  return_code => $return,
  message => $message
);


# send_to_client, send_or_calculate and read_chunked_reply
# are adapted from locfg.pl

sub send_to_client
{
  my ($send, $cmd) = @_;
  print $cmd if ( $p->opts->verbose && length($cmd) < 1024 );
  print $client $cmd;
  $sendsize -= length($cmd) if ( $send );
}

sub send_or_calculate    # used for iLO 3 only
{
  $sendsize = 0;
  my ($send, $cmd) = @_;
  if ($send) {
    print $client $cmd;
  }
  $sendsize += length($cmd);
  print "size $sendsize\n" if ( $p->opts->verbose );
}


sub read_chunked_reply    # used for iLO 3 only
{
  my $ln = "";
  my $lp = "";
  my $hide = 1;
  my $chunk = 1;
  my $chunkSize;

  while( 1 ) {
    # Read a line
    $ln = <$client>;
    # Get length of line
    my $length =  length($ln);
    # Exit loop if zero
    if ( $length == 0 ) {
      if ( $p->opts->verbose ) {
        print "read_chunked_reply: read a zero-length line. Continue...\n";
      }
      last;
    }
    # Skip HTTP headers and first line of chunked responses
    if ( $hide ) {
        $hide = 0 if ( $ln =~ m/^\r\n$/ );
        print "Head: " . $ln if ( $p->opts->verbose );
        next;
    }
    # Get size of chunk
    if ( $chunk ) {
      print "chunk: " . $ln if ( $p->opts->verbose );
      $ln =~ s/\r|\n//g;
      $chunkSize = hex($ln);
      $chunk = 0;
      print "chunk size: $chunkSize\n" if ( $p->opts->verbose );
      next;
    }
    # Last Chunk
    if ( $chunkSize == 0 ) {
      print "read_chunked_reply: reach end of responses.\n" if ($p->opts->verbose);
      last;
    }
    # End of chunk, process incomplete line
    if ( $chunkSize < $length ) {
      $chunk = 1; # Next line, new chunk
      $hide = 0;  # Skip hide
      $lp .= substr($ln, 0, $chunkSize); # Truncate and append
    }
    # End of chunk, process complete line
    elsif ( $chunkSize == $length ) {
      $chunk = 1; # Next line, new chunk
      $hide = 1;  # Hide new chunk's first line
      $lp .= $ln; # Append line as-is
    }
    # Process line
    else {
      $chunkSize -= $length; # Decrement chunk size
      $lp .= $ln; # Append line as-is
    }
    # Skip incomplete line
    next unless ( $lp =~ m/\n$/ );
    # Parse complete line
    parse_reply($lp);
    # Line parsed, clear line
    $lp = "";
  }
  if ($client->error()) {
     print "Error: connection error " . $client->error() . "\n";
  }
}

sub parse_reply
{
  my ($line) = @_;
  $line =~ s/\r\n$/\n/;
  print $line if ( $p->opts->verbose );

  if ( $getinfos ) {
    # Prune all unnecessary lines
    $isinput = 1 if ( $line =~ m"<GET_EMBEDDED_HEALTH_DATA>|</DRIVES>" );
    $xmlinput .= $line if ( $isinput );
    $isinput = 0 if ( $line =~ m"</GET_EMBEDDED_HEALTH_DATA>|<DRIVES>" );
    $product_name = $line if ( $line =~ m"<PRODUCT_NAME VALUE" );
    $serial_number = $line if ( $line =~ m/FIELD NAME="Serial Number"/ );
    $server_name = $line if ( $line =~ m"<SERVER_NAME" );
    @product = split (/"/, $product_name)  if defined $product_name;
    @serial  = split (/"/, $serial_number) if defined $serial_number;
    @sname   = split (/"/, $server_name )  if defined $server_name;
  }
  else {
    # Prune all unnecessary lines
    $isinput = 1 if ( $line =~ m"<GET_EMBEDDED_HEALTH_DATA>|</DRIVES>|</FIRMWARE_INFORMATION>" );
    $xmlinput .= $line if ( $isinput );
    $isinput = 0 if ( $line =~ m"</GET_EMBEDDED_HEALTH_DATA>|<DRIVES>|<FIRMWARE_INFORMATION>" );
  }

  # drive check needs special handling
  # <DRIVES>
  #    <BACKPLANE>
  #       <FIRMWARE_VERSION VALUE="1.18"/>
  #       <ENCLOSURE_ADDR VALUE="224"/>
  #     <DRIVE_BAY VALUE = "1"/>
  #       <PRODUCT_ID VALUE = "EH0300FBQDD    "/>
  #       <STATUS VALUE = "Ok"/>
  #       <UID_LED VALUE = "Off"/>
  #     <DRIVE_BAY VALUE = "2"/>
  #       <PRODUCT_ID VALUE = "EH0300FBQDD    "/>
  #       <STATUS VALUE = "Fault"/>
  #       <UID_LED VALUE = "Off"/>
  #    </BACKPLANE>
  # </DRIVES>

  $is_drive_input = 1 if ( $line =~ m"<DRIVES>" );
  $drive_input .= $line if ( $is_drive_input );
  $is_drive_input = 0 if ( $line =~ m"</DRIVES>" );

  # because on many (older?) iLOs drive status is not XML
  if ($optcheckdrives) {
    if ( $line =~ m/<Drive Bay: / ) {
      $drive_xml_broken = 1;
      # <Drive Bay: "3"; status: "Smart Error"; uid led="Off"/>
      # accept both 'uid led="..."' (as in the example above) and 'uid led: "..."'
      ( $drive, $drivestatus ) = ( $line =~
        m/Drive Bay: "(.*)"; status: "(.*)"; uid led(?::\s*|=)".*"/ );
      if ( defined($drive) && defined($drivestatus) ) {
        $drives{$drive} = $drivestatus;
      }
    }
    if ( $line =~ m/<DRIVE BAY=".*" PRODUCT_ID="/ ) {
      $drive_xml_broken = 1;
      # <DRIVE BAY="3" PRODUCT_ID="N/A"STATUS="Smart Error" UID_LED="Off"/>
      ( $drive, $drivestatus ) = ( $line =~
        m/DRIVE BAY="(.*)" PRODUCT_ID=".*"STATUS="(.*)" UID_LED=".*"/ );
      if ( defined($drive) && defined($drivestatus) ) {
        $drives{$drive} = $drivestatus;
      }
    }
  }

  if ( $eventlogcheck ) {
    $is_event_input = 1 if ( $line =~ m"<EVENT" );
    if ( $is_event_input ) {
      if ( $line =~ m/SEVERITY="(.*?)"/ ) {
        $event_severity = $1;
        #print "SEV: $event_severity\n";
      }
      if ( $line =~ m/CLASS="(.*?)"/ ) {
        $event_class = $1;
        #print "CLASS: $event_class\n";
      }
      if ( $line =~ m/DESCRIPTION="(.*?)"/ ) {
        $event_description = $1;
        #print "DESCRIPTION: $event_description\n";
      }
    }
    $is_event_input = 0 if ( $is_event_input && $line =~ m"/>" );
    if ( $is_event_input == 0 && $event_class ) {
      if ( ($event_class !~ m/POST|Maintenance/) && ( $event_severity !~ m/Informational/) ) {
        $event_status{$event_description} = $event_severity;
        $event_class = "";
      }
    }
  }

  if ( $line =~ m/MESSAGE='(.*)'/ ) {
    my $msg = $1;

    if ( $msg =~ m/No error/i ) {
      # Skip
    }
    elsif ( $msg =~ m/Syntax error/i && $skipsyntaxerrors ) {
      # Skip
    }
    else {
      close $client;
      $p->nagios_exit(
        return_code => "UNKNOWN",
        message => "ERROR: $msg."
      );
    }
  }
}
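The drive section of parse_reply has to cope with three layouts: the XML <DRIVES>/<DRIVE_BAY>/<STATUS> structure shown in its comment, and two older non-XML line formats. Because the script prints every reply line in verbose (-v) mode, the same grep/awk filtering used above for fans can also be applied to disks. The following is a minimal sketch, assuming the three layouts look exactly like the examples in the script's comments (real iLO firmware output may differ). The function name drive_status is hypothetical and is not part of check_ilo2_health.pl.

```bash
#!/bin/bash
# drive_status (hypothetical helper, not part of check_ilo2_health.pl):
# reads the verbose output of check_ilo2_health.pl on stdin and prints one
# "<bay> <status>" pair per drive. The layouts handled below are taken from
# the examples in parse_reply's comments; real iLO output may differ.
drive_status() {
  awk -F'"' '
    # legacy line: <Drive Bay: "3"; status: "Smart Error"; uid led="Off"/>
    /<Drive Bay: /          { print $2, $4; next }
    # legacy line: <DRIVE BAY="3" PRODUCT_ID="N/A"STATUS="Smart Error" UID_LED="Off"/>
    /<DRIVE BAY="/          { print $2, $6; next }
    # XML layout: only look at lines between <DRIVES> and </DRIVES>
    /<DRIVES>/              { in_drives = 1; next }
    /<\/DRIVES>/            { in_drives = 0; bay = ""; next }
    !in_drives              { next }
    # remember the bay number, then pair it with the next STATUS line
    /<DRIVE_BAY /           { bay = $2; next }
    /<STATUS / && bay != "" { print bay, $2; bay = "" }
  '
}
```

Usage, assuming the verbose output was saved to /tmp/hp.log by the earlier script:

drive_status < /tmp/hp.log
drive_status < /tmp/hp.log | grep -vic ' ok$'

The first command lists each bay with its status; the second prints the number of drives whose status is not Ok, a single value that could be sent to a Zabbix item. The match is case-insensitive because the fan output above reports "OK" while the drive example in the script uses "Ok".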




