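Nagios is an enterprise-grade open-source monitoring tool. It checks whether hosts and the services running on them are healthy and sends alerts by email or SMS. It does not draw graphs; each check is implemented by a plugin.
Nagios            the monitoring platform itself (main program)
Nagios-plugins    the required set of check plugins
NRPE              monitors host resources on remote servers
NSClient++        monitors Windows hosts
NDOUtils          writes Nagios data to a database
I. Install RHEL 7.2
Do a minimal install, then configure the IP address, time synchronization and a local yum repository, and install vim (personal preference) and bash-completion (command completion).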
# hostnamectl set-hostname nagios_cacti
# yum install vim
# yum install bash-completion
# yum install chrony
# systemctl enable chronyd
# systemctl start chronyd
# vim /etc/chrony.conf
server 10.100.2.5 iburst //add this line to set the time source
# yum install ntpdate
# ntpdate 10.100.2.5 //sync the time once by hand
Configure the CentOS 163 yum repository
# yum install wget
# wget http://mirrors.163.com/centos/7.2.1511/os/x86_64/Packages/yum-3.4.3-132.el7.centos.0.1.noarch.rpm
# wget http://mirrors.163.com/centos/7.2.1511/os/x86_64/Packages/yum-metadata-parser-1.1.4-10.el7.x86_64.rpm
# wget http://mirrors.163.com/centos/7.2.1511/os/x86_64/Packages/yum-plugin-fastestmirror-1.1.31-34.el7.noarch.rpm
# wget http://mirrors.163.com/.help/CentOS7-Base-163.repo
# rpm -qa|grep yum //check whether yum is installed on Red Hat and which yum packages are present
# rpm -qa|grep yum|xargs rpm -e --nodeps //remove the yum packages shipped with Red Hat
# rpm -ivh yum-3.4.3-132.el7.centos.0.1.noarch.rpm yum-metadata-parser-1.1.4-10.el7.x86_64.rpm yum-plugin-fastestmirror-1.1.31-34.el7.noarch.rpm
# mv CentOS7-Base-163.repo /etc/yum.repos.d/
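# vim /etc/yum.repos.d/CentOS7-Base-163.repo //in vim, run ":1,$s/$releasever/7/gc" and ":1,$s/$basearch/x86_64/gc" to replace both variables
# yum clean all //clear the yum cache
# yum makecache //rebuild the cache to speed up package searches
# yum update //update the system (output omitted)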
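The interactive vim substitution above can also be done non-interactively. The following is a minimal sketch, not part of the original procedure: `pin_repo_vars` is a hypothetical helper name, and the defaults `7` and `x86_64` are simply the values used in the vim commands.

```shell
#!/bin/sh
# Hypothetical helper: hard-code $releasever and $basearch in a yum repo file.
# Usage: pin_repo_vars FILE [RELEASE] [ARCH]
pin_repo_vars() {
    repo_file=$1
    release=${2:-7}
    arch=${3:-x86_64}
    # Single quotes stop the shell from expanding $releasever / $basearch;
    # inside the sed pattern, "\$" matches a literal dollar sign.
    # -i.bak edits in place and keeps a backup copy next to the file.
    sed -i.bak \
        -e 's/\$releasever/'"$release"'/g' \
        -e 's/\$basearch/'"$arch"'/g' \
        "$repo_file"
}
```

For the file used here, the call would be `pin_repo_vars /etc/yum.repos.d/CentOS7-Base-163.repo`. Unlike the `gc` flag in vim, this replaces every occurrence without asking for confirmation.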
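Worked example:
1 Quick deployment of the monitoring server
The monitoring server needs the httpd, php, nagios, nagios-plugins and NRPE packages.
yum install -y gd gd-devel openssl openssl-devel httpd php gcc glibc glibc-common make wget net-snmp
setenforce 0
iptables -F
Download and install Nagios from source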
wget -O nagios-3.5.0.tar.gz http://sourceforge.net/projects/nagios/files/nagios-3.x/nagios-3.5.0/nagios-3.5.0.tar.gz/download
groupadd nagios
useradd -g nagios nagios
tar -zxf nagios-3.5.0.tar.gz -C /usr/src/
cd /usr/src/nagios
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make all
make install
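make install-init #install the init script
make install-commandmode #set permissions on the external command directory
make install-config #install the sample configuration files
make install-webconf #install the Apache configuration for the web interface
Install nagios-plugins and NRPE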
wget http://nchc.dl.sourceforge.net/project/nagiosplug/nagiosplug/1.4.16/nagios-plugins-1.4.16.tar.gz
tar -zxf nagios-plugins-1.4.16.tar.gz -C /usr/src/
cd /usr/src/nagios-plugins-1.4.16
./configure --prefix=/usr/local/nagios/
make && make install
wget http://nchc.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz
tar -zxf nrpe-2.14.tar.gz -C /usr/src/
cd /usr/src/nrpe-2.14
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
chown -R nagios:nagios /usr/local/nagios
Create the web login account
htpasswd -c /usr/local/nagios/etc/htpasswd.users tomcat
iptables -I INPUT -p tcp --dport 80 -j ACCEPT
service iptables save
Start the services
service httpd start
/etc/init.d/nagios start
chkconfig httpd on
chkconfig --add nagios
chkconfig nagios on
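2 Edit the configuration files
Nagios has many configuration files, most of them under /usr/local/nagios/etc:
nagios.cfg         main configuration file
nrpe.cfg           remote monitoring (NRPE) configuration file
cgi.cfg            CGI configuration file
commands.cfg       command definitions
contacts.cfg       contact definitions
timeperiods.cfg    time period definitions
templates.cfg      reference templates for object definitions
localhost.cfg      template for monitoring the local host
printer.cfg        template for monitoring printers
switch.cfg         template for monitoring switches
windows.cfg        template for monitoring Windows hosts
Many of these files can be used as they are.
Edit the main configuration file nagios.cfg; its main job is to load the other files through cfg_file directives.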
vim /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_file=/usr/local/nagios/etc/web1.cfg
cfg_file=/usr/local/nagios/etc/web2.cfg
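When host files such as web1.cfg and web2.cfg are added over time, it helps to register them without creating duplicate cfg_file lines. A minimal sketch of that idea follows; `add_cfg_file` is a hypothetical helper, not something shipped with Nagios.

```shell
#!/bin/sh
# Hypothetical helper: append "cfg_file=PATH" to the main configuration file
# only if that exact line is not already there.
# Usage: add_cfg_file MAIN_CFG OBJECT_CFG
add_cfg_file() {
    main_cfg=$1
    object_cfg=$2
    entry="cfg_file=$object_cfg"
    # -x matches the whole line, -F treats the entry as a fixed string,
    # -q suppresses output so only the exit status is used.
    if ! grep -qxF "$entry" "$main_cfg"; then
        printf '%s\n' "$entry" >> "$main_cfg"
    fi
}
```

For example, `add_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/web1.cfg` can be run repeatedly and leaves a single entry.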
Edit the CGI configuration file cgi.cfg and add the tomcat account
vim /usr/local/nagios/etc/cgi.cfg
default_user_name=tomcat
authorized_for_system_information=nagiosadmin,tomcat
authorized_for_configuration_information=nagiosadmin,tomcat
authorized_for_system_commands=nagiosadmin,tomcat
authorized_for_all_services=nagiosadmin,tomcat
authorized_for_all_hosts=nagiosadmin,tomcat
authorized_for_all_service_commands=nagiosadmin,tomcat
authorized_for_all_host_commands=nagiosadmin,tomcat
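Edit the command configuration file commands.cfg. It defines how each command is carried out, for example email alerts, the tool used and the message format.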
vim /usr/local/nagios/etc/objects/commands.cfg
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
}
define command{
command_name check_nrpe_args
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ -a $ARG2$
}
Edit the contact configuration file contacts.cfg, which lists who is alerted and how
define contact{
contact_name nagiosadmin
use generic-contact
alias Nagios Admin
email yourname@domain.com
}
Edit the notification time periods in timeperiods.cfg
vim /usr/local/nagios/etc/objects/timeperiods.cfg
define timeperiod{
timeperiod_name 24x7 ; all hours, every day (7x24)
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
Edit the local host configuration localhost.cfg
define host{
use linux-server
host_name duangr-1
alias duangr-1
address 192.168.56.10
}
define service{
use local-service
host_name duangr-1
service_description Host Alive
check_command check-host-alive
}
define service{
use local-service
host_name duangr-1
service_description Users
check_command check_local_users!20!50
}
define service{
use local-service
host_name duangr-1
service_description CPU
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
define service{
use local-service
host_name duangr-1
service_description Disk Root
check_command check_local_disk!20%!10%!/
}
define service{
use local-service
host_name duangr-1
service_description Disk Home
check_command check_local_disk!20%!10%!/export/home
}
define service{
use local-service
host_name duangr-1
service_description Zombie Procs
check_command check_local_procs!5!10!Z
}
define service{
use local-service
host_name duangr-1
service_description Total Procs
check_command check_local_procs!250!400!RSZDT
}
define service{
use local-service
host_name duangr-1
service_description Swap Usage
check_command check_local_swap!20!10
}
Edit the template file templates.cfg
vi /usr/local/nagios/etc/objects/templates.cfg
# Contact template generic-contact
define contact{
name generic-contact
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r,f,s
host_notification_options d,u,r,f,s
service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email
register 0
}
# Host template generic-host
define host{
name generic-host
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_period 24x7
register 0
}
# Linux host template
define host{
name linux-server
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period workhours
notification_interval 120
notification_options d,u,r
contact_groups admins
register 0
}
Create web1.cfg for the remote host
vim /usr/local/nagios/etc/web1.cfg
define host{
use linux-server
host_name duangr-2
alias duangr-2
address 192.168.56.11
}
define service{
use local-service
host_name duangr-2
service_description Host Alive
check_command check-host-alive
}
define service{
use local-service
host_name duangr-2
service_description Users
check_command check_nrpe_args!check_users!5 10
}
define service{
use local-service
host_name duangr-2
service_description CPU
check_command check_nrpe_args!check_load!15,10,5 30,25,20
}
define service{
use local-service
host_name duangr-2
service_description Disk Root
check_command check_nrpe_args!check_disk!20% 10% /
}
define service{
use local-service
host_name duangr-2
service_description Disk /export/home
check_command check_nrpe_args!check_disk!20% 10% /export/home
}
define service{
use local-service
host_name duangr-2
service_description Procs Zombie
check_command check_nrpe_args!check_procs!5 10 Z
}
define service{
use local-service
host_name duangr-2
service_description Procs Total
check_command check_nrpe_args!check_procs_args!"-w400 -c600"
}
define service{
use local-service
host_name duangr-2
service_description Swap Usage
check_command check_nrpe_args!check_swap!20% 10%
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Checks for common processes, mostly cloud platform daemons
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Monitor the crond process
define service{
use local-service
host_name duangr-2
service_description PS: crond
check_command check_nrpe_args!check_procs_args!"-c1:1 -Ccrond"
}
;; Monitor the ZooKeeper process
define service{
use local-service
host_name duangr-2
service_description PS: QuorumPeerMain
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.quorum.QuorumPeerMain"
}
;; Monitor the Storm worker-node process (supervisor)
define service{
use local-service
host_name duangr-2
service_description PS: supervisor
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.supervisor"
}
;; Monitor the Storm master-node process (nimbus)
define service{
use local-service
host_name duangr-2
service_description PS: nimbus
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.nimbus"
}
;; Monitor the MetaQ process
define service{
use local-service
host_name duangr-2
service_description PS: MetaQ
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -ametamorphosis-server-w"
}
;; Monitor the Redis process
define service{
use local-service
host_name duangr-2
service_description PS: redis-server
check_command check_nrpe_args!check_procs_args!"-c1:1 -Credis-server"
}
;; Monitor the NameNode process on the Hadoop master node
define service{
use local-service
host_name duangr-2
service_description PS: NameNode
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.NameNode"
}
;; Monitor the SecondaryNameNode process on the Hadoop master node
define service{
use local-service
host_name duangr-2
service_description PS: SecondaryNameNode
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.SecondaryNameNode"
}
;; Monitor the ResourceManager process on the Hadoop master node
define service{
use local-service
host_name duangr-2
service_description PS: ResourceManager
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.resourcemanager.ResourceManager"
}
;; Monitor the DataNode process on the Hadoop worker nodes
define service{
use local-service
host_name duangr-2
service_description PS: DataNode
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.datanode.DataNode"
}
;; Monitor the NodeManager process on the Hadoop worker nodes
define service{
use local-service
host_name duangr-2
service_description PS: NodeManager
check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.nodemanager.NodeManager"
}
duangr-2 is a remote host, so its resources are checked through the check_nrpe_args command.
/etc/init.d/nagios restart
To locate configuration errors quickly, verify the configuration with:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
3 Install nagios-plugins and NRPE on the monitored host
yum install -y openssl openssl-devel
groupadd nagios
useradd -g nagios -s /sbin/nologin nagios
tar -zxf nagios-plugins-2.1.6.tar.gz -C /usr/src/
cd /usr/src/nagios-plugins-2.1.6
./configure --prefix=/usr/local/nagios/ --with-nagios-user=nagios --with-nagios-group=nagios
make && make install
tar -zxf nrpe-2.14.tar.gz -C /usr/src/
cd /usr/src/nrpe-2.14
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
Edit the NRPE configuration file on the client
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda2]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda2
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_home]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/mapper/VolGroup00-LogVol00
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 200 -c 300
command[check_ping81]=/usr/local/nagios/libexec/check_ping -H 10.155.0.1 -w 100.0,20% -c 500.0,60%
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.local
netstat -lnupt |grep 5666
iptables -I INPUT -p tcp --dport 5666 -j ACCEPT
service iptables save
Check that the monitoring commands work
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users -a 5 10
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_load -a 15,10,5 30,25,20
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_disk -a 20% 10% /
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_procs -a 200 400 RSZDT
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_swap -a 20% 10%
If every check returns a sensible result, open the Nagios web interface in a browser.
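When running checks by hand as above, the plugin's exit status carries the result: by the Nagios plugin convention 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The sketch below wraps any check command and prints the state name next to its output. `nagios_state_name` and `run_check` are hypothetical helper names used only for illustration.

```shell
#!/bin/sh
# Map a plugin exit status to the state name Nagios would show.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3 and anything unexpected
    esac
}

# Run a check command, print "STATE: plugin output", and return the
# plugin's own exit status so callers can still act on it.
run_check() {
    output=$("$@")
    status=$?
    printf '%s: %s\n' "$(nagios_state_name "$status")" "$output"
    return "$status"
}
```

A call would look like `run_check /usr/local/nagios/libexec/check_nrpe -H localhost -c check_users`.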
II. Install Nagios
1. Download the packages and install Nagios
Nagios-4.2.1:
http://nchc.dl.sourceforge.net/project/nagios/nagios-4.x/nagios-4.2.1/nagios-4.2.1.tar.gz
Nagios-plugins-2.1.3:
https://nagios-plugins.org/download/nagios-plugins-2.1.3.tar.gz
NRPE-3.0.1:
https://codeload.github.com/NagiosEnterprises/nrpe/tar.gz/3.0.1
Official installation guide: Nagios Quickstart Installation Guides
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/quickstart.html
# yum install httpd php gcc glibc glibc-common gd gd-devel
# yum install unzip //needed during the build, which otherwise fails
# useradd -M -s /sbin/nologin nagios
# usermod -aG nagios apache
# tar zxvf nagios-4.2.1.tar.gz
# cd nagios-4.2.1/
# ./configure //generate the Makefiles first; "make all" fails without this step
# make all
# make install
# make install-init
# make install-config
# make install-commandmode
# make install-webconf
# vim /usr/local/nagios/etc/objects/contacts.cfg
email xxx@xxx.com //address that receives Nagios alert mail
# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagios //set the web login user and password
2. Install the nagios-plugins package
# tar zxvf nagios-plugins-2.1.3.tar.gz
# cd nagios-plugins-2.1.3/
# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
# make
# make install
# chown -R nagios.nagios /usr/local/nagios/
# systemctl enable httpd
# systemctl start httpd
# systemctl enable nagios
# systemctl start nagios
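# /etc/init.d/nagios checkconfig //check the Nagios configuration for errors, or use the following command instead:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
# firewall-cmd --zone=public --add-service=http --permanent
# firewall-cmd --reload
# systemctl restart firewalld
Open http://10.100.2.158/nagios and log in with the account and password configured above.
Note: if the web administrator is not the default nagiosadmin, cgi.cfg has to be edited.
# vim /usr/local/nagios/etc/cgi.cfg
//Replace every nagiosadmin with the custom user name; otherwise the Services page reports insufficient permissions.
By default the HTTP check raises a warning. To clear it, create an empty index.html in /var/www/html.
# touch /var/www/html/index.html
Restart nagios and httpd; the check recovers after a few minutes.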
3. Install NRPE
# tar zxvf nrpe-3.0.1.tar.gz
# cd nrpe-3.0.1/
# yum install openssl-devel //fixes "checking for SSL headers... configure: error: Cannot find ssl headers"
# ./configure --with-nrpe-user=nagios --with-nrpe-group=nagios --with-nagios-user=nagios --with-nagios-group=nagios --enable-command-args --enable-ssl
# make all //build NRPE
# make install-plugin
# make install-daemon
# make install-config //note: for NRPE versions below 3.0 use "make install-daemon-config" instead
# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d //start the NRPE daemon
# yum install net-tools
# netstat -tnpl //port 5666 in LISTEN state means the NRPE daemon is running
# echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.local
# chmod +x /etc/rc.d/rc.local //start at boot; to restart by hand:
# pkill nrpe && /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
# vim /usr/local/nagios/etc/objects/commands.cfg //append the following
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
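//This definition lets services use check_nrpe. The $ARG1$ after -c is the check command handed to the NRPE daemon, and it must be one of the commands defined in nrpe.cfg.
//In the per-server cfg files, pass this argument after check_nrpe with "!".
//Run # /usr/local/nagios/libexec/check_nrpe -h to list the plugin's options.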
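To make the "!" convention concrete, the sketch below mimics how a service's check_command value such as check_nrpe!check_cpu turns into the command line from the check_nrpe definition. It is a deliberately simplified illustration, not Nagios' real macro engine: `expand_check_command` is a hypothetical function that only models this one definition and only $ARG1$.

```shell
#!/bin/sh
# Hypothetical illustration of macro expansion for:
#   command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
# Usage: expand_check_command 'check_nrpe!check_cpu' HOST_ADDRESS [PLUGIN_DIR]
expand_check_command() {
    check_command=$1
    host_address=$2
    plugin_dir=${3:-/usr/local/nagios/libexec}   # stands in for $USER1$
    # Everything before the first "!" is the command name.
    name=${check_command%%!*}
    # What follows is "!arg1!arg2..."; field 2 of that string is $ARG1$.
    args=${check_command#"$name"}
    arg1=$(printf '%s' "$args" | cut -d'!' -f2)
    printf '%s/%s -H %s -c %s\n' "$plugin_dir" "$name" "$host_address" "$arg1"
}
```

Calling `expand_check_command 'check_nrpe!check_cpu' 10.100.2.200` prints the command Nagios would run for that service under these assumptions. The single quotes matter in an interactive bash session, where an unquoted "!" triggers history expansion.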
# mkdir /usr/local/nagios/etc/servers //directory that holds all per-server monitoring configuration
# vim /usr/local/nagios/etc/nagios.cfg //edit the main configuration
cfg_dir=/usr/local/nagios/etc/servers //uncomment this line so every file in the directory is loaded
4. Add a client (the monitored host)
1>. Install NRPE and nagios-plugins on the client
Download the required packages
nagios-plugins-2.1.3.tar.gz
nrpe-3.0.1.tar.gz
Create the user
# useradd -M -s /sbin/nologin nagios
Install nagios-plugins first (NRPE depends on it)
# tar zxvf nagios-plugins-2.1.3.tar.gz
# cd nagios-plugins-2.1.3
# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
# make all
# make install
Then install NRPE
# yum install openssl-devel
# tar zxvf nrpe-3.0.1.tar.gz
# cd nrpe-3.0.1
# ./configure --with-nrpe-user=nagios --with-nrpe-group=nagios --with-nagios-user=nagios --with-nagios-group=nagios --enable-command-args --enable-ssl
# make all
# make install-plugin
# make install-daemon
# make install-config
# ls /usr/local/nagios/libexec/ //check_nrpe in the listing means the NRPE plugin was installed
# vim /usr/local/nagios/etc/nrpe.cfg //configure NRPE
allowed_hosts=127.0.0.1,10.100.2.158 //add the monitoring server's IP
dont_blame_nrpe=1 //change 0 to 1 to allow command arguments
# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d //start the NRPE daemon
To make the NRPE daemon easier to manage, define an /etc/init.d/nrpe script
# vim /etc/init.d/nrpe //with the following content:
#!/bin/bash
# chkconfig: 2345 88 12
# description: NRPE DAEMON
NRPE=/usr/local/nagios/bin/nrpe
NRPECONF=/usr/local/nagios/etc/nrpe.cfg
case "$1" in
start)
echo -n "Starting NRPE daemon..."
$NRPE -c $NRPECONF -d
echo " done."
;;
stop)
echo -n "Stopping NRPE daemon..."
pkill -u nagios nrpe
echo " done."
;;
restart)
$0 stop
sleep 2
$0 start
;;
*)
echo "Usage: $0 start|stop|restart"
;;
esac
exit 0
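# chmod a+x /etc/init.d/nrpe //make the script executable so it can be started and stopped through systemctl or service
# service nrpe start //start NRPE
# echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.local
or # chkconfig nrpe on //enable start at boot
# netstat -tnlp //confirm that port 5666 is listening
Test connectivity between the monitoring server and the monitored host (run on the server):
# /usr/local/nagios/libexec/check_nrpe -H 10.100.2.200
NRPE v3.0.1 //communication works
2>. Create the client's configuration file on the monitoring server
# vim /usr/local/nagios/etc/servers/test.cfg //new client configuration file on the monitoring server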
define host{
use linux-server
host_name commission
alias commission
address 10.100.2.200
max_check_attempts 5
check_period 24x7
notification_interval 30
notification_period 24x7
}
define service{
use generic-service
host_name commission
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use generic-service
host_name commission
service_description SSH
check_command check_ssh
notifications_enabled 0 ;disable notification
}
define service{
use generic-service
host_name commission
service_description CPU
check_command check_nrpe!check_cpu
notifications_enabled 1
}
define service{
use generic-service
host_name commission
service_description Physical Memory
check_command check_nrpe!check_mem
notifications_enabled 1
}
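//The definitions in templates.cfg can be used as a starting point.
Using the custom check_cpu and check_mem plugins (the plugins are provided as attachments):
2.1 Download the plugins from the official site, then make them executable (+x) and set their ownership
2.2 On the client, add the following to nrpe.cfg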
command[check_mem]=/usr/local/nagios/libexec/check_mem -w 10 -c 5
command[check_cpu]=/usr/local/nagios/libexec/check_cpu -w 80 -c 90
2.3 Restart the NRPE daemon
# pkill nrpe && /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
2.4 On the server, edit test.cfg and set check_command in each define service block
check_command check_nrpe!check_mem
check_command check_nrpe!check_cpu
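The attached check_mem plugin itself is not reproduced in this guide. To show what a custom plugin of this kind has to do, here is a minimal sketch written under a loud assumption: that -w 10 -c 5 are thresholds on the percentage of free memory (warning at or below 10, critical at or below 5). `free_percent` and `mem_state` are hypothetical names, and the real plugin may measure memory differently.

```shell
#!/bin/sh
# Percentage of free memory from a meminfo-style file (Linux /proc/meminfo
# by default). Uses MemFree only; buffers and cache are ignored (assumption).
free_percent() {
    awk '/^MemTotal:/ {t=$2} /^MemFree:/ {f=$2}
         END {if (t > 0) printf "%d\n", f*100/t}' "${1:-/proc/meminfo}"
}

# Compare the free percentage with the thresholds and follow the plugin
# convention: one line of output, exit status 0 OK, 1 WARNING, 2 CRITICAL.
# Usage: mem_state FREE_PCT WARN CRIT
mem_state() {
    free_pct=$1; warn=$2; crit=$3
    if [ "$free_pct" -le "$crit" ]; then
        echo "CRITICAL - ${free_pct}% memory free"; return 2
    elif [ "$free_pct" -le "$warn" ]; then
        echo "WARNING - ${free_pct}% memory free"; return 1
    fi
    echo "OK - ${free_pct}% memory free"; return 0
}
```

A plugin body built from these pieces would be `mem_state "$(free_percent)" 10 5` followed by `exit $?`, which NRPE then reports back to the server.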
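3>. Monitor remote Windows systems with NSClient++
Download the installer NSCP-0.4.4.19-x64.msi
Install it on the Windows client:
Check that the service is running, and on its Log On tab tick "Allow service to interact with desktop"
After installation check the listening ports: 5666 is NRPE and 12489 is NSClient++
On the monitoring server, change the following part of commands.cfg to add -s with the password: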
# vim /usr/local/nagios/etc/objects/commands.cfg
# 'check_nt' command definition
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s 123456 -v $ARG1$ $ARG2$
}
Add the monitored client:
# vim /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/windows.cfg //enables Windows monitoring; if cfg_dir=/usr/local/nagios/etc/servers is enabled, comment this line out, otherwise the definitions conflict
Use windows.cfg as a template for each new Windows server
# cp /usr/local/nagios/etc/objects/windows.cfg /usr/local/nagios/etc/servers/wintest.cfg
//Change host_name, the IP address and so on in the copy.
# /usr/local/nagios/libexec/check_nt -H 10.100.2.189 -p 12489 -s 123456 -v UPTIME
//Tests connectivity to the client (put single quotes around any special characters). Output like the following means the connection works.
System Uptime - 20 day(s) 4 hour(s) 11 minute(s) |uptime=29051
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg //verify the configuration
# systemctl restart nagios //restart Nagios
By default check_nt!MEMUSE!-w 80 -c 90 monitors physical plus virtual memory combined. To monitor physical memory alone:
1) Change three options in the client's nsclient.ini:
Under [/settings/NRPE/server] set insecure = true, verify mode = none, allow arguments = true
Then test connectivity with # /usr/local/nagios/libexec/check_nrpe -H 10.100.2.189.
I (0.4.4.19 2015-12-08) seem to be doing fine... means the connection works. If you see
CHECK_NRPE: Error - Could not complete SSL handshake. the options were not changed correctly.
Check the result of the memory check:
# /usr/local/nagios/libexec/check_nrpe -H 10.100.2.189 -p 5666 -c CheckMEM -a MaxWarn=80% MaxCrit=90% type=physical ShowAll
2) On the server, define a physical memory check command in commands.cfg
define command{
command_name check_winmem
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckMEM -a MaxWarn=$ARG1$% MaxCrit=$ARG2$% ShowAll=long type=physical
}
3) On the server, define the service in the client's configuration file xenapp.cfg
define service{
use generic-service
host_name xenapp
service_description PhysicalMemory
check_command check_winmem!80!90
}
4) Verify the configuration: # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
5) Restart Nagios: # systemctl restart nagios
III. Install Cacti
1. Download the packages and install
Cacti-0.8.8h:
http://www.cacti.net/downloads/cacti-0.8.8h.tar.gz
Cacti-spine-0.8.8h:
http://www.cacti.net/downloads/spine/cacti-spine-0.8.8h.tar.gz
Official installation manual:
http://docs.cacti.net/manual:088:1_installation.1_install_unix
Prepare the environment:
# yum install httpd php php-mysql php-snmp php-xml mariadb mariadb-server
2. Install RRDtool
# yum install rrdtool
# rrdtool -h
3. Install the SNMP service
# yum install net-snmp net-snmp-utils
# systemctl enable snmpd
# systemctl start snmpd
4. Install cacti-spine (the fast poller)
# yum install net-snmp-devel mariadb-devel openssl-devel
# yum install autoconf automake binutils dos2unix gcc cpp libtool glibc-devel glibc-headers kernel-headers
# yum install wget patch
# tar zxvf cacti-spine-0.8.8h.tar.gz
# cd cacti-spine-0.8.8h/
# aclocal
# libtoolize --force
# autoheader
# autoconf
# automake
# ./configure
# make
# make install
# cp /usr/local/spine/bin/spine /usr/bin/spine
# cp /usr/local/spine/etc/spine.conf.dist /etc/spine.conf
# chown nagios.nagios /etc/spine.conf
# vim /etc/spine.conf
DB_Host localhost
DB_Database cacti
DB_User cactiuser
DB_Pass 123456
DB_Port 3306
# /usr/bin/spine //checks for errors; run it only after Cacti is installed
5. Create the cacti database
Start the database:
# systemctl enable mariadb
# systemctl start mariadb
# mysqladmin -uroot password 'rootpasswd' //the MariaDB root password is empty by default, so set one first
# mysql -uroot -p //log in as root
create database cacti; //create the database
grant all on cacti.* to cactiuser@'localhost' identified by '123456'; //grant access for local logins
6. Install Cacti
# tar zxvf cacti-0.8.8h.tar.gz
# mv cacti-0.8.8h /var/www/html/cacti
# mysql -u cactiuser -p cacti < /var/www/html/cacti/cacti.sql //import the database schema
# chmod -R 777 /var/www/html/cacti/rra //give the rra and log directories mode 777
# chmod -R 777 /var/www/html/cacti/log
# /usr/bin/spine //output like the following means the database connection works
SPINE:Using spine config file [/etc/spine.conf]
SPINE:Version 0.8.8h starting
SPINE:Time: 0.0455 s, Threads: 5, Hosts: 2
7. Edit the global Cacti configuration
# vim /var/www/html/cacti/include/config.php
Set the database name and the user name and password used to connect
$database_default = "cacti";
$database_username = "cactiuser";
$database_password = "123456";
Set the Cacti time zone, otherwise PHP writes warnings to the log
# vim /var/www/html/cacti/include/global.php //add one line
date_default_timezone_set('Asia/Shanghai');
8. Schedule the poller that collects data for the RRDtool graphs
# crontab -e
Add the following entry:
*/5 * * * * /usr/bin/php /var/www/html/cacti/poller.php >> /tmp/cacti_rrdtool.log 2>&1
9. Configure SELinux
Check that the PHP module works: http://10.100.2.158/phpinfo.php
# vim phpinfo.php //in the web root (html) directory
<?php phpinfo();?> //delete the file once the test is done
-------------------------------------------------------------------------------------------
Check that the MySQL connection works: http://10.100.2.158/mysqltest.php
# vim mysqltest.php //in the web root (html) directory; any file name will do
<?php $link=mysql_connect('127.0.0.1','cactiuser','123456');
if($link) echo "connect success!";
else echo "connect fail!";?>
Before testing the database connection, change an SELinux boolean: with SELinux enabled, the PHP to MySQL test fails
# getsebool -a |grep httpd_can_network_connect //shows whether httpd may open network connections; off by default
# setsebool -P httpd_can_network_connect=1 //once enabled, the test succeeds
------------------------------------------------------------------------------------------
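Set the SELinux context, otherwise access to Cacti is forbidden:
# yum install policycoreutils-python //provides semanage, which is not installed by default
# ls -Zd cacti/ //shows the current context of the cacti directory: admin_home_t
# semanage fcontext -a -t httpd_sys_content_t '/var/www/html/cacti(/.*)?' //define the context rule for the cacti directory
# restorecon -RFvv cacti/ //apply the context to the cacti directory
Restart httpd when done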
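IV. Integrate Nagios and Cacti
1. Install NDOUtils
NDOUtils is a Nagios add-on that writes the data Nagios collects into a MySQL database; it also makes integration with the Cacti plugin NPC possible.
Note: Nagios 4.2.1 needs NDOUtils 2.1.0 or later; earlier versions are incompatible.
Download NDOUtils
http://nchc.dl.sourceforge.net/project/nagios/ndoutils-2.x/ndoutils-2.1.1/ndoutils-2.1.1.tar.gz
Install NDOUtils: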
# tar zxvf ndoutils-2.1.1.tar.gz
# cd ndoutils-2.1.1/
# ./configure --prefix=/usr/local/nagios/ --enable-mysql --with-ndo2db-user=nagios --with-ndo2db-group=nagios
# make all
# make install
Prepare the configuration files:
# cd db
# ./installdb -ucactiuser -p123456 -h localhost -d cacti
//Imports mysql.sql. This step is optional, because the Cacti NPC plugin creates the tables itself.
# cd ..
# cp src/{ndomod-4x.o,ndo2db-4x,log2ndo,file2sock} /usr/local/nagios/bin/
//Use the -4x files for Nagios 4.x; for 3.x copy the matching -3x files.
# cp config/ndo2db.cfg-sample /usr/local/nagios/etc/ndo2db.cfg
# cp config/ndomod.cfg-sample /usr/local/nagios/etc/ndomod.cfg
# cd /usr/local/nagios/etc
# chown nagios.nagios ndo2db.cfg ndomod.cfg
# chmod 664 ndo2db.cfg ndomod.cfg
# cd /usr/local/nagios/bin/
# mv ndo2db-4x ndo2db
# mv ndomod-4x.o ndomod.o
# chown nagios.nagios *
Edit the configuration files:
# vim /usr/local/nagios/etc/ndo2db.cfg
socket_type=tcp
tcp_port=5668
db_servertype=mysql
db_host=localhost
db_port=3306
db_name=cacti
db_prefix=npc_
db_user=cactiuser
db_pass=123456 //do not put quotes around the password
debug_level=1
# vim /usr/local/nagios/etc/ndomod.cfg
output_type=tcpsocket
output=127.0.0.1
# vim /usr/local/nagios/etc/nagios.cfg
Append the following at the end:
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
//broker_module and config_file go on one line, separated by a space.
Change the following options:
event_broker_options=-1 //the default, no change needed; enables the event broker
process_performance_data=1
Start the ndo2db daemon and Nagios:
# /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
# netstat -tnlp //port 5668 should be in LISTEN state
tcp 0 0 0.0.0.0:5668 0.0.0.0:* LISTEN 30411/ndo2db
# tail -20 /var/log/messages //check for errors
Create a startup script so the daemon starts at boot:
# vim /etc/rc.d/init.d/ndo2db //new script with the following content:
#!/bin/bash
#Start ndo2db service
/usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
# chmod +x /etc/rc.d/init.d/ndo2db
# ln -s /etc/rc.d/init.d/ndo2db /etc/rc.d/rc2.d/S98ndo2db
# ln -s /etc/rc.d/init.d/ndo2db /etc/rc.d/rc3.d/S98ndo2db
# ln -s /etc/rc.d/init.d/ndo2db /etc/rc.d/rc5.d/S98ndo2db
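//Start automatically in runlevels 2, 3 and 5 (start order 98; nagios uses 99)
If /var/log/messages contains the following message
Error: Sep 23 13:38:10 nagios_cacti nagios: ndomod: Still unable to connect to data sink. 0 items lost, 368 queued items to flush.
Cause: ndo2db must be running before Nagios starts, otherwise Nagios reports this error.
Fix: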
# systemctl stop nagios
# pkill ndo2db
#/usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
# systemctl start nagios
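2. Install the NPC plugin
NPC stands for Nagios Plugin for Cacti. ndo2db writes the Nagios data into MySQL (the tables with the npc_ prefix configured above), and Cacti reads those tables and presents the Nagios results through NPC.
Download the plugin: NPC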
http://jaist.dl.sourceforge.net/project/gibtmirdas/npc-2.0.4.tar.gz
# tar zxvf npc-2.0.4.tar.gz
# mv npc /var/www/html/cacti/plugins/
# chown -R nagios.nagios /var/www/html/cacti/plugins/
# cd /var/www/html/cacti/plugins/
Set the SELinux context, otherwise the plugin management page shows "no plugins found":
# ls -Zd npc/ //shows the current context of npc: admin_home_t
# semanage fcontext -a -t httpd_sys_content_t '/var/www/html/cacti/plugins/npc(/.*)?' //define the context rule for npc
# restorecon -RFvv npc/ //apply the context to npc
Log in to the Cacti console, open the Plugin Management page, find the Npc plugin and click Install Plugin -> Enable Plugin. An npc tab then appears next to graphs.
Configure NPC under Configuration -> Settings -> NPC:
Tick Remote Commands, set Nagios Command File Path to /usr/local/nagios/var/rw/nagios.cmd and Nagios URL to http://10.100.2.158/nagios/, tick Host Icons and Service Icons, then click Save.
Restart the related services:
# systemctl stop nagios
# systemctl restart mariadb
# systemctl restart httpd
# /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
# systemctl start nagios
# tail -20 /var/log/messages //the log still shows errors at this point
Fix the MySQL tables:
# mysql -ucactiuser -p
use cacti;
SQL script for Nagios 4.0 and later (text in blue is the script for versions below 4.0):
CREATE TABLE IF NOT EXISTS `npc_service_parentservices` (
`service_parentservice_id` int(11) NOT NULL auto_increment,
`instance_id` smallint(6) NOT NULL default '0',
`service_id` int(11) NOT NULL default '0',
`parent_service_object_id` int(11) NOT NULL default '0',
PRIMARY KEY (`service_parentservice_id`),
UNIQUE KEY `instance_id` (`service_id`,`parent_service_object_id`)
) ENGINE=MyISAM COMMENT='Parent services';
ALTER TABLE `npc_hostchecks` ADD COLUMN `long_output` varchar(8192) NOT NULL default '' AFTER `output`;
ALTER TABLE `npc_hoststatus` ADD COLUMN `long_output` varchar(8192) NOT NULL default '' AFTER `output`;
ALTER TABLE `npc_servicechecks` ADD COLUMN `long_output` varchar(8192) NOT NULL default '' AFTER `output`;
ALTER TABLE `npc_servicestatus` ADD COLUMN `long_output` varchar(8192) NOT NULL default '' AFTER `output`;
ALTER TABLE `npc_statehistory` ADD COLUMN `long_output` varchar(8192) NOT NULL default '' AFTER `output`;
ALTER TABLE `npc_eventhandlers` ADD COLUMN `long_output` varchar(8192) NOT NULL default '' AFTER `output`;
ALTER TABLE `npc_systemcommands` ADD COLUMN `long_output` varchar(8192) NOT NULL default '' AFTER `output`;
ALTER TABLE `npc_notifications` ADD COLUMN `long_output` varchar(8192) NOT NULL default '' AFTER `output`;
ALTER TABLE `npc_services` ADD COLUMN `importance` varchar(8192) NOT NULL default '' AFTER `icon_image_alt`;
ALTER TABLE `npc_hosts` ADD COLUMN `importance` varchar(8192) NOT NULL default '' AFTER `z_3d`;
ALTER TABLE `npc_contacts` ADD COLUMN `minimum_importance` varchar(8192) NOT NULL default '' AFTER `notify_host_downtime`;
3. Install the Settings, Thold, Monitor, realtime and other plugins
Download: http://docs.cacti.net/plugins
Unpack each archive and move it into the plugins directory, for example:
# tar zxvf settings-v0.71-1.tgz
# mv settings /var/www/html/cacti/plugins/ //same for the other plugins
# chown -R nagios.nagios /var/www/html/cacti/plugins
# restorecon -RFvv /var/www/html/cacti/plugins
Open the Plugin Management page and click Install Plugin -> Enable Plugin for each plugin
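Configuring and using the plugins:
1. The monitor plugin
a. Open "Console → Settings → Misc" to adjust the Monitor options.
For example, tick "Show Icon Legend" to show a legend on the monitoring page, and set "View" to Tiles to show device status as a grid.
b. When adding a device to Cacti, tick "Monitor Host". For devices that already exist, change it under "Management → Devices".
c. Click the "monitor" tab at the top of the page to see the status of every device/host.
2. The realtime plugin
Once realtime is installed, every graph gets a small icon next to it. Clicking the icon does not show real-time data yet; it reports "The Image Cache Directory directory does not exist. Please first create it and set permissions and then attempt to open another realtime graph".
The cache directory is missing or has the wrong permissions, so more configuration is needed before data appears.
a. Create the cache directory and set its permissions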
# cd /var/www/html/cacti/
# mkdir cache
# chown -R nagios.nagios cache
# chmod -R 777 cache
# semanage fcontext -a -t httpd_cache_t '/var/www/html/cacti/cache(/.*)?'
# restorecon -RFvv cache/
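b. Log in to the Cacti web interface
Open "Console → Settings → Misc" and set "Cache Directory" to "/var/www/html/cacti/cache"; after saving, [OK: DIR FOUND] is shown.
Clicking the small icon now shows data. If SELinux is enabled, make sure its settings are adjusted, otherwise the directory is reported as not writable.
3. The thold plugin
Configure the email settings before using thold
a. Configure email
# yum install mailx sendmail
# systemctl mask postfix
# systemctl enable sendmail
# systemctl start sendmail
b. Set up the thold template
c. Create a notification list (optional)
d. Create an alert template
e. Items to alert on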
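4. Graph Nagios data
Basic workflow: define a data input method, define a data template, define a graph template.
Open the detail page of the service to be graphed:
Click "Data Input Method" and choose Yes to import a data input script into Cacti.
A new entry now appears on the Data Input Methods page of the Cacti console.
The Output Fields section lists the output field names this input method defines. The data template defined next refers to these fields (if the fields are missing, add them using Unix - Get Load Average as a reference).
On the Data Templates page of the console, create a data template (an existing template can be duplicated):
Enter a name for the template, change Data Input Method to the NPC - Perfdata - commission: CPU entry added above, tick Hourly (1 Minute Average) and set Step to 60. Also check that each Output Field matches its Data Source Item name.
On the Graph Templates page of the console, create a graph template in the same way: duplicate a similar template and then edit it.
Graph Item Inputs in the middle lists three data sources, the names of the three inputs to the graph. Each input refers to one of the entries under Graph Item above.
Open Item # 1 to # 6 in turn and change Data Source to the new data source for the matching time range.
Add the device: as for any Cacti device, click Add on the Devices page.
Then add the host to Graph Trees.
The new graph then shows up in the graphs console.
A few minutes after the graph is added to Graph Trees the chart appears. If it does not, check the permissions on the rra directory, or run the NPC script directly to see whether it returns data: open your data input method under Data Input Methods to find the NPC id of the service, then run php -q <path>/perfdata.php --type=service --id=<service id> and see whether data comes back.
After the integration, graphs may fail to display in NPC. The image URL shows that the path is wrong: http://10.100.2.158/graph_image.php?action=view&local_graph_id=17&rra_id=1
The correct path is /cacti/graph_image.php
Fix: edit the following js files under /var/www/html/cacti/plugins/npc/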
js/src/monitoring/services/serviceDetail.js
js/src/monitoring/services/services.js
js/src/monitoring/hosts/hosts.js
js/src/monitoring/hosts/hostDetail.js
js/src/npc.js
js/npc-all-min.js
Change every /graph_image.php? to /cacti/graph_image.php?, that is, add the /cacti/ directory.
The following vim search-and-replace command does this:
:1,$s/\/graph_image.php?/\/cacti\/graph_image.php?/gc
When the files are changed, clear the browser cache and reload the page; the graphs then appear.
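The same path fix can be applied to all six files in one pass instead of opening each in vim. This is a minimal sketch rather than part of the original procedure; `fix_graph_image_path` is a hypothetical helper name, and it is written so that running it twice does not produce /cacti/cacti/.

```shell
#!/bin/sh
# Hypothetical helper: prefix /graph_image.php? with /cacti in the given files.
# Usage: fix_graph_image_path FILE...
fix_graph_image_path() {
    for js_file in "$@"; do
        # The first expression undoes an earlier fix so the prefix is never
        # doubled; the second adds /cacti to every /graph_image.php? reference.
        # "?" is a literal character in sed's basic regular expressions.
        sed -i.bak \
            -e 's#/cacti/graph_image\.php?#/graph_image.php?#g' \
            -e 's#/graph_image\.php?#/cacti/graph_image.php?#g' \
            "$js_file"
    done
}
```

Run from /var/www/html/cacti/plugins/npc/, the call would be `fix_graph_image_path js/src/monitoring/services/serviceDetail.js js/src/monitoring/services/services.js js/src/monitoring/hosts/hosts.js js/src/monitoring/hosts/hostDetail.js js/src/npc.js js/npc-all-min.js`. Each file keeps a .bak copy of its previous content.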