Monitoring Linux Hosts with Nagios and NRPE

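I. Introduction

1. About NRPE

NRPE is an add-on for Nagios that executes plugins on remote Linux/Unix hosts. With the NRPE daemon and the Nagios plugins installed on a remote server, that server can report its local state (CPU load, memory usage, disk usage, and so on) to the Nagios monitoring platform. In this article the monitoring side is called the Nagios server, and the remote monitored host is called the Nagios client.

Nagios can monitor remote hosts in several ways, including SNMP, NRPE, SSH, and NSCA. This article covers monitoring a remote Linux host through NRPE.

NRPE (Nagios Remote Plugin Executor) is a daemon that runs check commands on the remote server. It lets the Nagios server trigger check commands installed on the remote host and returns the results to the server. Its overhead is much lower than that of SSH-based checks, and because a check needs no system account credentials for the remote host, it is also more secure than SSH-based checks.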

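2. How NRPE works

NRPE consists of two parts:

check_nrpe plugin: runs on the monitoring host

nrpe daemon: runs on the remote host and acts as the agent on the monitored side

Note: the nrpe daemon depends on the nagios-plugins package; without it the daemon cannot perform any checks.

In more detail, when Nagios needs to check a service or resource on a remote Linux host:

First, Nagios runs the check_nrpe plugin and tells it what to check.

Second, check_nrpe connects to the remote NRPE daemon over SSL.

Then, the NRPE daemon runs the corresponding Nagios plugin to perform the check.

Finally, the NRPE daemon returns the result to check_nrpe, which passes it to Nagios for processing.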
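To make the third step more concrete, the following shell sketch imitates how a command name received from check_nrpe could be resolved to a plugin command line. It is an illustration only: the function name `lookup_nrpe_command` and the temporary file are assumptions made for this example, and the real daemon performs this lookup internally when it reads nrpe.cfg. The two `command[...]` entries are copied from the nrpe.cfg used in this article.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; not the NRPE daemon's actual implementation.

# Print the command line configured for a given NRPE command name.
# Returns 1 if the name is not defined in the config file.
lookup_nrpe_command() {
    local name=$1 cfg=$2 line
    local prefix="command[$name]="
    while IFS= read -r line; do
        if [[ $line == "$prefix"* ]]; then
            # Strip the "command[<name>]=" prefix and print the rest.
            printf '%s\n' "${line#"$prefix"}"
            return 0
        fi
    done < "$cfg"
    return 1
}

# Demo against a temporary stand-in for nrpe.cfg.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
EOF

lookup_nrpe_command check_users "$cfg"
rm -f "$cfg"
```

A name with no matching entry makes the function return 1, which corresponds to the situation where the server asks for a command the client does not define.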

II. Installing nagios-plugins and NRPE on the monitored host

1. Add the nagios user

        [root@ClientNrpe ~]# useradd -s /sbin/nologin nagios

2. Install nagios-plugins, which NRPE depends on

        [root@ClientNrpe ~]# yum -y install gcc gcc-c++ make openssl openssl-devel
        [root@ClientNrpe ~]# tar xf nagios-plugins-2.0.3.tar.gz
        [root@ClientNrpe ~]# cd nagios-plugins-2.0.3
        [root@ClientNrpe nagios-plugins-2.0.3]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
        [root@ClientNrpe nagios-plugins-2.0.3]# make && make install
        # Note: to monitor MySQL, add --with-mysql to the ./configure options

3. Install NRPE

        [root@ClientNrpe ~]# tar xf nrpe-2.15.tar.gz
        [root@ClientNrpe ~]# cd nrpe-2.15
        [root@ClientNrpe nrpe-2.15]# ./configure --with-nrpe-user=nagios \
        > --with-nrpe-group=nagios \
        > --with-nagios-user=nagios \
        > --with-nagios-group=nagios \
        > --enable-command-args \
        > --enable-ssl
        [root@ClientNrpe nrpe-2.15]# make all
        [root@ClientNrpe nrpe-2.15]# make install-plugin
        [root@ClientNrpe nrpe-2.15]# make install-daemon
        [root@ClientNrpe nrpe-2.15]# make install-daemon-config

4. Configure NRPE

        [root@ClientNrpe ~]# grep -v '^#' /usr/local/nagios/etc/nrpe.cfg | sed '/^$/d'
        log_facility=daemon
        pid_file=/var/run/nrpe.pid
        server_port=5666             # listening port
        nrpe_user=nagios
        nrpe_group=nagios
        allowed_hosts=192.168.0.105   # allowed address, usually the Nagios server
        dont_blame_nrpe=0
        allow_bash_command_substitution=0
        debug=0
        command_timeout=60
        connection_timeout=300
        command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
        command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
        command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
        command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
        command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
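The -w and -c options in these commands set the warning and critical thresholds. Nagios plugins report their result through the exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The sketch below shows that convention with a hypothetical helper, `check_threshold`, which is not part of nagios-plugins. It assumes non-negative integer values where higher is worse, as with check_users; a check such as check_disk with percentage-free thresholds works in the opposite direction.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the Nagios plugin exit-code convention.
# Usage: check_threshold <value> <warn> <crit>
check_threshold() {
    local value=$1 warn=$2 crit=$3
    # Anything that is not a non-negative integer is reported as UNKNOWN.
    if ! [[ $value =~ ^[0-9]+$ ]]; then
        echo "UNKNOWN - invalid value '$value'"
        return 3
    fi
    if (( value >= crit )); then
        echo "CRITICAL - value=$value"
        return 2
    elif (( value >= warn )); then
        echo "WARNING - value=$value"
        return 1
    fi
    echo "OK - value=$value"
    return 0
}

# With the check_users thresholds from nrpe.cfg (-w 5 -c 10),
# three logged-in users would be reported as OK.
check_threshold 3 5 10
```

NRPE passes both the printed line and the exit code back to check_nrpe, so Nagios sees the same state it would have seen if the plugin had run locally.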

5. Start NRPE

        # Start as a standalone daemon
        [root@ClientNrpe ~]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
        [root@ClientNrpe ~]# netstat -tulpn | grep nrpe
        tcp        0      0 0.0.0.0:5666                0.0.0.0:*                   LISTEN      22597/nrpe
        tcp        0      0 :::5666                     :::*                        LISTEN      22597/nrpe

NRPE can run in one of two modes:

        -i        # Run as a service under inetd or xinetd
        -d        # Run as a standalone daemon

You can write an init script so that nrpe runs in standalone mode:

        [root@ClientNrpe ~]# cat /etc/init.d/nrped
        #!/bin/bash
        # chkconfig: 2345 88 12
        # description: NRPE DAEMON

        NRPE=/usr/local/nagios/bin/nrpe
        NRPECONF=/usr/local/nagios/etc/nrpe.cfg

        case "$1" in
            start)
                echo -n "Starting NRPE daemon..."
                "$NRPE" -c "$NRPECONF" -d
                echo " done."
                ;;
            stop)
                echo -n "Stopping NRPE daemon..."
                # Kill nrpe processes owned by the nagios user
                pkill -u nagios nrpe
                echo " done."
                ;;
            restart)
                "$0" stop
                sleep 2
                "$0" start
                ;;
            *)
                echo "Usage: $0 start|stop|restart"
                ;;
        esac
        exit 0
       
        [root@ClientNrpe ~]# chmod +x /etc/init.d/nrped
        [root@ClientNrpe ~]# chkconfig --add nrped
        [root@ClientNrpe ~]# chkconfig nrped on
        [root@ClientNrpe ~]# service nrped start
        Starting NRPE daemon... done.
        [root@ClientNrpe ~]# netstat -tnlp
        Active Internet connections (only servers)
        Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
        tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      1031/sshd
        tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      1108/master
        tcp        0      0 0.0.0.0:5666                0.0.0.0:*                   LISTEN      22597/nrpe
        tcp        0      0 :::22                       :::*                        LISTEN      1031/sshd
        tcp        0      0 ::1:25                      :::*                        LISTEN      1108/master
        tcp        0      0 :::5666                     :::*                        LISTEN      22597/nrpe

III. Installing NRPE on the monitoring server

1. Install NRPE

        [root@Nagios ~]# tar xf nrpe-2.15.tar.gz
        [root@Nagios ~]# cd nrpe-2.15
        [root@Nagios nrpe-2.15]# ./configure \
        > --with-nrpe-user=nagios \
        > --with-nrpe-group=nagios \
        > --with-nagios-user=nagios \
        > --with-nagios-group=nagios \
        > --enable-command-args \
        > --enable-ssl
        [root@Nagios nrpe-2.15]# make all
        [root@Nagios nrpe-2.15]# make install-plugin
        # After installation, the check_nrpe plugin appears under libexec in the Nagios install directory
        [root@Nagios ~]# cd /usr/local/nagios/libexec/
        [root@Nagios libexec]# ll -d check_nrpe
        -rwxrwxr-x. 1 nagios nagios 76769 Sep 28 08:07 check_nrpe

2. Using check_nrpe

        [root@Nagios libexec]# ./check_nrpe -h
        NRPE Plugin for Nagios
        Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
        Version: 2.15
        Last Modified: 09-06-2013
        License: GPL v2 with exemptions (-l for more info)
        SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required

        Usage: check_nrpe -H <host> [ -b <bindaddr> ] [-4] [-6] [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]

        Options:
         -n         = Do no use SSL
         -u         = Make socket timeouts return an UNKNOWN state instead of CRITICAL
         <host>     = The address of the host running the NRPE daemon
         <bindaddr> = bind to local address
         -4         = user ipv4 only
         -6         = user ipv6 only
         [port]     = The port on which the daemon is running (default=5666)
         [timeout]  = Number of seconds before connection times out (default=10)
         [command]  = The name of the command that the remote daemon should run
         [arglist]  = Optional arguments that should be passed to the command.  Multiple
                      arguments should be separated by a space.  If provided, this must be
                      the last option supplied on the command line.

        Note:
        This plugin requires that you have the NRPE daemon running on the remote host.
        You must also have configured the daemon to associate a specific plugin command
        with the [command] option you are specifying here.  Upon receipt of the
        [command] argument, the NRPE daemon will run the appropriate plugin command and
        send the plugin output and return code back to *this* plugin.  This allows you
        to execute plugins on remote hosts and 'fake' the results to make Nagios think
        the plugin is being run locally.

 
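Monitoring a remote Linux host through NRPE is done with the check_nrpe plugin. Its syntax is:

        check_nrpe -H <host> [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]

Running it with only -H returns the version of the remote daemon, which confirms that the connection works:

        [root@Nagios libexec]# ./check_nrpe -H 192.168.0.81
        NRPE v2.15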

3. Define the command

        [root@Nagios ~]# cd /usr/local/nagios/etc/objects/
        [root@Nagios objects]# vim commands.cfg
        # Append at the end of the file
        define command{
                command_name    check_nrpe
                command_line    $USER1$/check_nrpe -H "$HOSTADDRESS$" -c "$ARG1$"
        }
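As a rough illustration of how this definition is used, the sketch below imitates the macro substitution for a service whose check_command is `check_nrpe!check_users`: the text after `!` becomes $ARG1$, $HOSTADDRESS$ comes from the host definition, and $USER1$ is the plugin directory (assumed here to be /usr/local/nagios/libexec, where check_nrpe was installed). `expand_check_command` is a hypothetical function written for this example; Nagios performs the substitution itself.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; Nagios expands these macros internally.
# Usage: expand_check_command <check_command> <host_address> <plugin_dir>
expand_check_command() {
    local check_command=$1 host_address=$2 user1=$3
    # The command_line template from commands.cfg.
    local line='$USER1$/check_nrpe -H "$HOSTADDRESS$" -c "$ARG1$"'
    # Everything after the first "!" is the first argument ($ARG1$).
    local arg1=${check_command#*!}
    # Replace each macro with its value.
    line=${line//\$USER1\$/$user1}
    line=${line//\$HOSTADDRESS\$/$host_address}
    line=${line//\$ARG1\$/$arg1}
    printf '%s\n' "$line"
}

expand_check_command 'check_nrpe!check_users' 192.168.0.81 /usr/local/nagios/libexec
# prints: /usr/local/nagios/libexec/check_nrpe -H "192.168.0.81" -c "check_users"
```

The resulting line is the same kind of invocation as the manual `./check_nrpe -H 192.168.0.81` test, with `-c` naming the command the remote daemon should run.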

4. Define the services

        [root@Nagios objects]# cp windows.cfg linhost.cfg
        [root@Nagios objects]# grep -v '^#' linhost.cfg | sed '/^$/d'
        define host{
                use                     linux-server
                host_name               linhost
                alias                   My Linux Server
                address                 192.168.0.81
        }
        define service{
                use                     generic-service
                host_name               linhost
                service_description     CHECK USER
                check_command           check_nrpe!check_users
        }
        define service{
                use                     generic-service
                host_name               linhost
                service_description     Load
                check_command           check_nrpe!check_load
        }
        define service{
                use                     generic-service
                host_name               linhost
                service_description     SDA1
                check_command           check_nrpe!check_hda1
        }
        define service{
                use                     generic-service
                host_name               linhost
                service_description     Zombie
                check_command           check_nrpe!check_zombie_procs
        }
        define service{
                use                     generic-service
                host_name               linhost
                service_description     Total procs
                check_command           check_nrpe!check_total_procs
        }

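An important point: the command names used in the service definitions on the Nagios server must match the commands defined in nrpe.cfg on the monitored host exactly.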
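One way to catch a mismatch early is to compare the two files. The sketch below is a hypothetical helper, `missing_nrpe_commands`, that lists every name used as `check_nrpe!<name>` in a service file but not defined as `command[<name>]` in an nrpe.cfg. It assumes a copy of the client's nrpe.cfg is available locally, since the real file lives on the monitored host; the demo uses temporary files with entries modeled on this article.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print NRPE command names that services reference
# but that have no command[<name>]= entry in the given nrpe.cfg copy.
missing_nrpe_commands() {
    local services_cfg=$1 nrpe_cfg=$2 name
    # Extract the name that follows "check_nrpe!" on each check_command line.
    sed -n 's/^[[:space:]]*check_command[[:space:]]*check_nrpe!\([^[:space:]]*\).*/\1/p' "$services_cfg" |
    sort -u |
    while read -r name; do
        # -F treats the brackets literally instead of as a regex.
        grep -qF "command[$name]=" "$nrpe_cfg" || echo "$name"
    done
}

# Demo with temporary stand-ins for linhost.cfg and nrpe.cfg.
services_cfg=$(mktemp)
nrpe_cfg=$(mktemp)
cat > "$services_cfg" <<'EOF'
define service{
        check_command           check_nrpe!check_users
}
define service{
        check_command           check_nrpe!check_sda1
}
EOF
cat > "$nrpe_cfg" <<'EOF'
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
EOF

missing_nrpe_commands "$services_cfg" "$nrpe_cfg"   # prints: check_sda1
rm -f "$services_cfg" "$nrpe_cfg"
```

An empty result means every referenced name has a definition on the client side.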

5. Enable the new command and service definitions

        [root@Nagios ~]# vim /usr/local/nagios/etc/nagios.cfg
        # Add this line
        cfg_file=/usr/local/nagios/etc/objects/linhost.cfg

6. Check the configuration syntax

        [root@Nagios ~]# service nagios configtest
        Nagios Core 4.0.7
        Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
        Copyright (c) 1999-2009 Ethan Galstad
        Last Modified: 06-03-2014
        License: GPL
        Website: http://www.nagios.org
        Reading configuration data...
           Read main config file okay...
           Read object config files okay...
        Running pre-flight check on configuration data...
        Checking objects...
            Checked 20 services.
            Checked 3 hosts.
            Checked 2 host groups.
            Checked 0 service groups.
            Checked 1 contacts.
            Checked 1 contact groups.
            Checked 26 commands.
            Checked 5 time periods.
            Checked 0 host escalations.
            Checked 0 service escalations.
        Checking for circular paths...
            Checked 3 hosts
            Checked 0 service dependencies
            Checked 0 host dependencies
            Checked 5 timeperiods
        Checking global event handlers...
        Checking obsessive compulsive processor commands...
        Checking misc settings...
        Total Warnings: 0
        Total Errors:   0
        Things look okay - No serious problems were detected during the pre-flight check
        Object precache file created:
        /usr/local/nagios/var/objects.precache

7. Restart the nagios service

        [root@Nagios ~]# service nagios restart
        Running configuration check...
        Stopping nagios: done.
        Starting nagios: done.

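8. Open the Nagios web interface

1) First click [Hosts] and check that the status of the monitored host is UP.

2) Then click [Services] and check that the status of each monitored service is OK.

Note: after the new host linhost was added, one service showed CRITICAL with the message "No such file or directory". The fix is as follows.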

 
        ### On the monitored host: edit the NRPE configuration and restart NRPE
        [root@ClientNrpe etc]# vim nrpe.cfg
        command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
        [root@ClientNrpe etc]# service nrped restart

        ### On the monitoring server: edit linhost.cfg and restart nagios and httpd
        [root@Nagios objects]# vim linhost.cfg
        # Note: this was check_hda1 before; it is now check_sda1
        define service{
                use                     generic-service
                host_name               linhost
                service_description     SDA1
                check_command           check_nrpe!check_sda1
        }
        [root@Nagios ~]# service nagios restart
        Running configuration check...
        Stopping nagios: done.
        Starting nagios: done.
        [root@Nagios ~]# service httpd restart
        Stopping httpd:                                            [  OK  ]
        Starting httpd:                                            [  OK  ]

Click [Services] again to refresh the page and check the result.

[Screenshot: Services page after the fix]
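The CRITICAL state appears to have come from pointing check_disk at /dev/hda1, a device that does not exist on this host. Before writing a device path into nrpe.cfg, it can be verified on the monitored host. A minimal sketch, where `is_block_device` is a hypothetical helper built on the shell's standard `-b` test; what it prints depends on the machine it runs on:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the path is an existing block device.
is_block_device() {
    [ -b "$1" ]
}

# Check both the old and the new device path from this article.
for dev in /dev/hda1 /dev/sda1; do
    if is_block_device "$dev"; then
        echo "$dev: present"
    else
        echo "$dev: missing"
    fi
done
```

Only a path reported as present is a sensible argument for the -p option of check_disk.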

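Date: 2014-12-26

Update: an error caused by monitoring the httpd service

While reading logs today, I found many identical entries in the nginx error log. At first I suspected a bug in one of the PHP programs, but after discussing it with the developers we ruled that out. A Google search then showed that the cause was the monitoring configuration.

[Screenshot: nginx error log]

For the fix, see this post:

http://forum.joomla.org/viewtopic.php?t=666220

This article is reposted from zys467754239's 51CTO blog. Original link: http://blog.51cto.com/467754239/1558897. To republish it, please contact the original author.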
