Installing and configuring the Nagios server:
Preparation before installation
yum -y install httpd gcc glibc glibc-common gd gd-devel php php-mysql mysql mysql-devel mysql-server
groupadd nagcmd
useradd -G nagcmd nagios
passwd nagios
usermod -a -G nagcmd apache
vim /etc/httpd/conf/httpd.conf
DirectoryIndex index.php index.html index.html.var    # find this line and add index.php
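The manual edit above can also be scripted. The following is a minimal sketch, assuming GNU sed and a single uncommented DirectoryIndex line; the function name add_index_php is hypothetical, and it takes the file path as an argument so that it can be tried on a copy first.

```shell
#!/bin/bash
# Hypothetical helper: put index.php first on the DirectoryIndex line
# unless that line already mentions it. Assumes GNU sed (-i, -E).
add_index_php() {
    local conf="$1"
    # Nothing to do if index.php is already listed on the directive.
    if grep -Eq '^[[:space:]]*DirectoryIndex[[:space:]].*index\.php' "$conf"; then
        return 0
    fi
    sed -i -E 's/^([[:space:]]*DirectoryIndex)[[:space:]]+/\1 index.php /' "$conf"
}

# Try it on a copy before touching the real file:
# cp /etc/httpd/conf/httpd.conf /tmp/httpd.conf && add_index_php /tmp/httpd.conf
```

Running the function twice leaves the file unchanged the second time, which makes it safe to reuse in a provisioning script.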
Compiling and installing Nagios:
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-4.0.5.tar.gz
tar zxf nagios-4.0.5.tar.gz
cd nagios-4.0.5
./configure --with-command-group=nagcmd --enable-event-broker
make all
make install
make install-init
make install-commandmode
make install-config
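Note: if tar prints messages like the following while extracting, the system clock is wrong (it is behind the timestamps stored in the archive):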
tar: nagios-4.0.5/xdata/xsddefault.c: time stamp 2014-04-12 02:37:42 is 250653.223481153 s in the future
tar: nagios-4.0.5/xdata/xsddefault.h: time stamp 2014-04-12 02:37:42 is 250653.223419364 s in the future
tar: nagios-4.0.5/xdata: time stamp 2014-04-12 02:37:42 is 250653.223359922 s in the future
Fix the system time:
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
service ntpd stop
ntpdate asia.pool.ntp.org ; hwclock -w
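After the clock has been synchronized, the extracted files can be checked again. The sketch below is only an illustration, assuming GNU stat; the helper name mtime_in_future is hypothetical. It compares a file's modification time against the current time (or against a reference epoch passed as a second argument).

```shell
#!/bin/bash
# Hypothetical helper: succeed if a file's mtime is later than "now".
# Assumes GNU stat (-c %Y prints the mtime as seconds since the epoch).
mtime_in_future() {
    local file="$1"
    local now="${2:-$(date +%s)}"   # optional second argument: reference epoch
    local mtime
    mtime=$(stat -c %Y "$file") || return 2
    [ "$mtime" -gt "$now" ]
}

# Example: list extracted files that still look newer than the system clock.
# find nagios-4.0.5 -type f | while read -r f; do
#     mtime_in_future "$f" && echo "future timestamp: $f"
# done
```

If the loop prints nothing, the clock is no longer behind the archive and tar will stop warning.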
The configuration file for the Nagios web interface has to be generated in httpd's configuration directory (conf.d). Still in the Nagios source directory, run:
# make install-webconf
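Create a user for logging in to the Nagios web interface; this account is used for authentication whenever you log in to Nagios through the web:
# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin    # enter the password for the Nagios login when prompted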
# service httpd restart
Compiling and installing nagios-plugins
All of Nagios's monitoring work is done by plugins, so the official plugins must be installed before Nagios is started. Plugin directory: http://exchange.nagios.org/directory/Plugins
wget http://nagios-plugins.org/download/nagios-plugins-2.0.tar.gz
tar zxf nagios-plugins-2.0.tar.gz
cd nagios-plugins-2.0
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
Configuring and starting Nagios
# chkconfig --add nagios
# chkconfig nagios on
Check the main configuration file, then start the service:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
# service nagios start
Disable SELinux:
vim /etc/sysconfig/selinux    # change SELINUX=enforcing to SELINUX=disabled (takes effect after a reboot)
Alternatively, switch SELinux to permissive mode temporarily:
# setenforce 0
# getenforce
Permissive
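The persistent change to /etc/sysconfig/selinux can be scripted as well. This is a sketch under assumptions: it relies on GNU sed, the helper name set_selinux_mode is hypothetical, and --follow-symlinks is used because on many systems /etc/sysconfig/selinux is a symbolic link that a plain sed -i would replace with a regular file.

```shell
#!/bin/bash
# Hypothetical helper: set the SELINUX= line of a config file to a given mode.
# Assumes GNU sed. Valid modes are enforcing, permissive and disabled.
set_selinux_mode() {
    local conf="$1" mode="$2"
    case "$mode" in
        enforcing|permissive|disabled) ;;
        *) echo "invalid mode: $mode" >&2; return 1 ;;
    esac
    # Only the SELINUX= line is rewritten; SELINUXTYPE= is left alone.
    sed -i -E --follow-symlinks "s/^SELINUX=.*/SELINUX=$mode/" "$conf"
}

# On the real system (as root; takes effect after a reboot):
# set_selinux_mode /etc/sysconfig/selinux disabled
```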
Or, instead of disabling SELinux, run the Nagios CGI programs under SELinux enforcing/targeted mode:
# chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin
# chcon -R -t httpd_sys_content_t /usr/local/nagios/share
View Nagios through the web interface:
http://your_nagios_IP/nagios
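Enter the user name and password.
You should now see the service status of the local host. If the page does not load, check whether port 80 is open in iptables.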
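If the port turns out to be closed, a rule has to be inserted into the INPUT chain. The sketch below is an illustration only: the helper name open_tcp_port and the APPLY variable are hypothetical, and by default it just prints the iptables command so it can be reviewed before being run as root.

```shell
#!/bin/bash
# Hypothetical helper: print (or, with APPLY=1, run) an iptables rule that
# accepts inbound TCP connections on the given port. Running it needs root.
open_tcp_port() {
    local port="$1"
    case "$port" in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    if [ "${APPLY:-0}" = "1" ]; then
        iptables -I INPUT -p tcp --dport "$port" -j ACCEPT
    else
        echo "iptables -I INPUT -p tcp --dport $port -j ACCEPT"
    fi
}

# open_tcp_port 80            # dry run: only prints the rule
# APPLY=1 open_tcp_port 80    # inserts the rule
# service iptables save       # keeps the rule across restarts of the iptables service
```

The same helper applies to TCP port 5666, which NRPE uses on the monitored host.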
If the following message appears, SELinux is the cause; running setenforce 0 resolves it:
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, root@localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
Install NRPE. The server needs its own check_nrpe plugin to communicate with the monitored hosts.
Download page:
http://downloads.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.15/nrpe-2.15.tar.gz?r=&ts=1363788540&use_mirror=hivelocity
tar -zxvf nrpe-2.15.tar.gz
cd nrpe-2.15
./configure --with-nrpe-user=nagios \
    --with-nrpe-group=nagios \
    --with-nagios-user=nagios \
    --with-nagios-group=nagios \
    --enable-command-args \
    --enable-ssl
make all
make install-plugin
make install-daemon
make install-daemon-config
If ./configure fails with the following error:
checking for SSL headers... SSL headers found in /usr/local/ssl
checking for SSL libraries... configure: error: Cannot find ssl libraries
then locate libssl.so and pass its directory to configure with --with-ssl-lib:
# find /usr/ -name libssl.so
/usr/local/ssl/lib/libssl.so
# ./configure --with-nrpe-user=nagios --with-nrpe-group=nagios --with-nagios-user=nagios --with-nagios-group=nagios --enable-command-args --enable-ssl --with-ssl-lib=/usr/local/ssl/lib
Installing and configuring the monitored host:
yum -y install gcc glibc glibc-common gd gd-devel
or
yum grouplist    # check which package groups are available
yum -y groupinstall "Development Tools" "Development Libraries"
Add the nagios user, then build nagios-plugins:
useradd -s /sbin/nologin nagios
tar zxf nagios-plugins-2.0.tar.gz
cd nagios-plugins-2.0
./configure --with-nagios-user=nagios --with-nagios-group=nagios --with-mysql=/usr/local/mysql
make all
make install
Installing NRPE
Download page:
http://downloads.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.15/nrpe-2.15.tar.gz?r=&ts=1363788540&use_mirror=hivelocity
tar -zxvf nrpe-2.15.tar.gz
cd nrpe-2.15
./configure --with-nrpe-user=nagios \
    --with-nrpe-group=nagios \
    --with-nagios-user=nagios \
    --with-nagios-group=nagios \
    --enable-command-args \
    --enable-ssl
make all
make install-plugin
make install-daemon
make install-daemon-config
Note: if ./configure prints the following, run yum install openssl-devel:
checking for SSL headers... SSL headers found in /usr/local/ssl
checking for SSL libraries... configure: error: Cannot find ssl libraries
Configuring NRPE
# vim /usr/local/nagios/etc/nrpe.cfg    # find the following settings and adjust them
log_facility=daemon
pid_file=/var/run/nrpe.pid
# IP of this host. This host provides the NRPE service, so it acts as the NRPE server. This line must be added by hand.
server_address=172.16.100.11
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
# IP of the monitoring (Nagios) server
allowed_hosts=172.16.100.1
command_timeout=60
connection_timeout=300
debug=0
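A common mistake at this point is an allowed_hosts line that does not contain the monitoring server's address. The following is a rough sketch of a sanity check, not part of NRPE itself: the helper names cfg_value and host_allowed are hypothetical, and the comparison is an exact string match against a comma-separated list, so it does not understand network ranges or host names.

```shell
#!/bin/bash
# Hypothetical helper: print the value of a key=value directive from an
# nrpe.cfg-style file, ignoring comment lines. If the key appears more
# than once, the last occurrence is printed.
cfg_value() {
    local conf="$1" key="$2"
    awk -F= -v k="$key" '
        /^[[:space:]]*#/ { next }
        $1 == k { v = substr($0, index($0, "=") + 1) }
        END { if (v != "") print v }
    ' "$conf"
}

# Hypothetical helper: succeed if an address is listed in allowed_hosts.
host_allowed() {
    local conf="$1" ip="$2" hosts
    hosts=$(cfg_value "$conf" allowed_hosts)
    case ",$hosts," in
        *",$ip,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, using the addresses from this article:
# host_allowed /usr/local/nagios/etc/nrpe.cfg 172.16.100.1 && echo "monitoring server is allowed"
```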
Starting NRPE
# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
To make NRPE easier to start, save the following as the script /etc/init.d/nrped; the service can then be started with service nrped start:
vim /etc/init.d/nrped
#!/bin/bash
# chkconfig: 2345 88 12
# description: NRPE DAEMON
NRPE=/usr/local/nagios/bin/nrpe
NRPECONF=/usr/local/nagios/etc/nrpe.cfg
case "$1" in
    start)
        echo -n "Starting NRPE daemon..."
        $NRPE -c $NRPECONF -d
        echo " done."
        ;;
    stop)
        echo -n "Stopping NRPE daemon..."
        pkill -u nagios nrpe
        echo " done."
        ;;
    restart)
        $0 stop
        sleep 2
        $0 start
        ;;
    *)
        echo "Usage: $0 start|stop|restart"
        ;;
esac
exit 0
chmod +x /etc/init.d/nrped
NRPE-based monitoring
Define the checks that remote hosts are allowed to run:
vim /usr/local/nagios/etc/nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
The commands above ship with the sample configuration; the ones below were added:
command[check_rootdisk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 40% -c 20%
command[check_sensors]=/usr/local/nagios/libexec/check_sensors
command[check_zombies]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
command[check_sda3]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda3
# MySQL monitoring:
command[check_mysql]=/usr/local/nagios/libexec/check_mysql -utest -ptest -s /var/lib/mysql/mysql.sock -H localhost
# To monitor MySQL, grant access on the monitored MySQL server first:
#create database test;
#grant select on test.* to test@localhost identified by "test";
#flush privileges;
My disks are sda1 and sda3; add disk checks for your own partitions instead.
# service nrped restart
Back on the Nagios server, define the configuration
cd /usr/local/nagios/libexec
./check_nrpe -H 172.16.100.11    # quick test
If this prints "CHECK_NRPE: Socket timeout after 10 seconds.", open port 5666 in iptables.
/usr/local/nagios/libexec/check_nrpe -H 172.16.100.11 -c check_load
OK - load average: 1.55, 1.31, 1.30|load1=1.550;15.000;30.000;0; load5=1.310;10.000;25.000;0; load15=1.300;5.000;20.000;0;
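The output above follows the usual plugin format: human-readable text, then a `|`, then performance data as space-separated label=value;warn;crit;min items. As a small sketch of how a single value could be pulled out of such a line in a script (the function name perf_value is hypothetical, and labels containing spaces are not handled):

```shell
#!/bin/bash
# Hypothetical helper: print the value of one performance-data label from a
# plugin output line of the form "text|label=value;warn;crit;min; ...".
perf_value() {
    local output="$1" label="$2" perf item
    # No "|" means the plugin printed no performance data.
    case "$output" in
        *'|'*) ;;
        *) return 1 ;;
    esac
    perf=${output#*|}                 # keep everything after the first "|"
    for item in $perf; do             # items are separated by spaces
        case "$item" in
            "$label"=*)
                item=${item#*=}       # drop "label="
                printf '%s\n' "${item%%;*}"   # drop ";warn;crit;..."
                return 0
                ;;
        esac
    done
    return 1
}

# Example against the monitored host used in this article:
# out=$(/usr/local/nagios/libexec/check_nrpe -H 172.16.100.11 -c check_load)
# perf_value "$out" load5
```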
Configuring the server
First define the command in commands.cfg:
vim /usr/local/nagios/etc/objects/commands.cfg
Append at the end:
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H "$HOSTADDRESS$" -c $ARG1$
}
Then define the host and its services:
vim /usr/local/nagios/etc/objects/linuxhost.cfg
define host{
    use                  linux-server
    host_name            my linux server
    alias                mylinux server
    address              172.16.100.11    ; IP of the monitored host
}
define service{
    use                  generic-service
    host_name            my linux server
    service_description  check users
    check_command        check_nrpe!check_users
}
define service{
    use                  generic-service
    host_name            my linux server
    service_description  check load
    check_command        check_nrpe!check_load
}
define service{
    use                  generic-service
    host_name            my linux server
    service_description  check zombie proce
    check_command        check_nrpe!check_zombie_procs
}
define service{
    use                  generic-service
    host_name            my linux server
    service_description  check total proce
    check_command        check_nrpe!check_total_procs
}
define service{
    use                  generic-service
    host_name            my linux server
    service_description  check rootdisk
    check_command        check_nrpe!check_rootdisk
}
define service{
    use                  generic-service
    host_name            my linux server
    service_description  check swap
    check_command        check_nrpe!check_swap
}
define service{
    use                  generic-service
    host_name            my linux server
    service_description  check sda1
    check_command        check_nrpe!check_sda1
}
define service{
    use                  generic-service
    host_name            my linux server
    service_description  check sda3
    check_command        check_nrpe!check_sda3
}
define service{
    use                  generic-service
    host_name            my linux server
    service_description  check mysql
    check_command        check_nrpe!check_mysql
}
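Because the service definitions differ only in the NRPE command name, they can also be generated rather than typed by hand. This is a minimal sketch with a hypothetical function name, gen_services; it derives the description from the command name, so the descriptions will not match hand-written ones exactly.

```shell
#!/bin/bash
# Hypothetical generator: print one service definition per NRPE command.
# The description is derived from the command name (check_load -> "check load").
gen_services() {
    local host="$1" cmd desc
    shift
    for cmd in "$@"; do
        desc=$(printf '%s' "$cmd" | tr '_' ' ')
        printf 'define service{\n'
        printf '    use                  generic-service\n'
        printf '    host_name            %s\n' "$host"
        printf '    service_description  %s\n' "$desc"
        printf '    check_command        check_nrpe!%s\n' "$cmd"
        printf '}\n'
    done
}

# Example: print definitions for three checks, to be reviewed and then
# appended to the object file by hand.
# gen_services "my linux server" check_users check_load check_swap
```

Each command name passed in must also exist as a command[...] entry in nrpe.cfg on the monitored host, otherwise the check cannot run.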
vim /usr/local/nagios/etc/nagios.cfg    # add the configuration file created above
cfg_file=/usr/local/nagios/etc/objects/linuxhost.cfg
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg    # verify that the configuration is valid
service nagios restart
Check Nagios in the web interface.
If the following warning appears, iptables is enabled but port 5666 was opened for UDP rather than TCP; TCP port 5666 needs to be opened.
[1416991726] Warning: Return code of 255 for check of service 'check total proce' on host 'my linux server' was out of bounds
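The warning reports an exit status of 255, which lies outside the range Nagios accepts from a plugin: 0 to 3, meaning OK, WARNING, CRITICAL and UNKNOWN. As an aid when running a check by hand, a small sketch (the function name state_name is hypothetical) can translate the exit status into the corresponding state:

```shell
#!/bin/bash
# Map a plugin exit status to the state name Nagios associates with it.
# 0-3 are the defined plugin return codes; anything else is out of bounds.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT OF BOUNDS ($1)" ;;
    esac
}

# Example on the Nagios server (address taken from this article):
# /usr/local/nagios/libexec/check_nrpe -H 172.16.100.11 -c check_total_procs
# state_name $?
```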