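I recently tried having Nagios call the check_oracle_health plugin, and it works quite well. One oddity, though: after installation the plugin seems to monitor only about half of the tablespaces the database actually has. I asked the DBA to check, and the monitored host has more than 70 tablespaces in total, yet only 30-odd show up in monitoring. From what I can see, the plugin starts with tablespaces whose names begin with the letter i, so every tablespace that sorts before i is not monitored. I can't tell whether this is a problem in the plugin or in my configuration, so here are the rough steps I followed. The monitoring screen is shown in the figure below:
Environment: 192.168.1.1 (monitoring host)
192.168.1.2 (monitored host), which runs the Oracle database.
1. Check whether Perl is installed on the monitored host, and install DBI on it
Run perl -v; output like the following means Perl is installed: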
This is perl, v5.8.8 built for x86_64-linux-thread-multi
Copyright 1987-2006, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
Download DBI, then unpack, build and install it:
tar zxvf DBI-1.609.tar.gz
cd DBI-1.609
perl Makefile.PL
make all
make install
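Before moving on, it can help to confirm that Perl really can load the freshly installed module. The following is only a minimal sketch; the function name perl_module_installed is my own invention and not part of DBI:

```shell
# Hypothetical helper: succeeds only if the named Perl module can be loaded.
perl_module_installed() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Example usage after "make install":
#   perl_module_installed DBI && perl -MDBI -e 'print "DBI $DBI::VERSION\n"'
```

The same helper can be reused after the next step with DBD::Oracle as the argument.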
2. If there were no errors, move on to installing DBD-Oracle
tar zxvf DBD-Oracle-1.52.tar.gz
cd DBD-Oracle-1.52
perl Makefile.PL
Running the commands above will almost certainly produce the following error:
Using DBI 1.605 (for perl 5.008005 on i386-linux-thread-multi) installed in /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi/auto/DBI/
Configuring DBD::Oracle for perl 5.008005 on linux (i386-linux-thread-multi)
Remember to actually *READ* the README file! Especially if you have any problems.
Trying to find an ORACLE_HOME
Your LD_LIBRARY_PATH env var is set to ''
The ORACLE_HOME environment variable is not set and I couldn't guess it.
It must be set to hold the path to an Oracle installation directory
on this machine (or a machine with a compatible architecture).
See the appropriate README file for your OS for more information.
ABORTED!
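You then need to set a temporary ORACLE_HOME variable. Look at the environment variables of your oracle user and paste in a statement like the following: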
export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
Run perl Makefile.PL again and it will succeed. Then build and install:
make
make install
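The ORACLE_HOME step above is easy to get wrong, so here is a small sanity check that could be run before perl Makefile.PL. It is a sketch under my own assumptions: the function name check_oracle_home is hypothetical, and it only checks that a lib directory exists under ORACLE_HOME (the error log later in this article shows libraries being loaded from $ORACLE_HOME/lib):

```shell
# Hypothetical helper: check that ORACLE_HOME points at a plausible Oracle
# home before running "perl Makefile.PL" for DBD-Oracle.
check_oracle_home() {
    if [ -z "${ORACLE_HOME:-}" ]; then
        echo "ORACLE_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$ORACLE_HOME/lib" ]; then
        echo "no lib directory under $ORACLE_HOME" >&2
        return 1
    fi
    echo "ORACLE_HOME looks usable: $ORACLE_HOME"
}

# Example usage (the path is the one from this article; use your own):
#   export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
#   check_oracle_home && perl Makefile.PL
```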
3. The last step on the monitored host: installing the star of the show, check_oracle_health
wget http://labs.consol.de/wp-content/uploads/2009/09/check_oracle_health-1.6.3.tar.gz
tar zxvf check_oracle_health-1.6.3.tar.gz
cd check_oracle_health-1.6.3
./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-mymodules-dir=/usr/local/nagios/libexec --with-mymodules-dyndir=/usr/local/nagios/libexec
make all
make install
In the steps above, be sure to use your own Nagios installation path.
Then check that the check_oracle_health plugin now exists in /usr/local/nagios/libexec on the monitored host.
4. Switch to the oracle user and give the plugin a trial run:
/usr/local/nagios/libexec/check_oracle_health --connect=your_oracle_SID --user=oracle_user --password=oracle_password --mode=tnsping
Output like the following means everything is fine:
OK - connection established to your_oracle_SID.
Alternatively, replace the trailing --mode=tnsping with --mode=tablespace-usage and see whether all tablespaces are listed.
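Besides the text it prints, a Nagios plugin reports its result through its exit code, following the usual convention of 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. When testing by hand, a tiny helper like the one below makes that code readable. The name nagios_state is hypothetical, and treating every other code as UNKNOWN is my own simplification:

```shell
# Hypothetical helper: map a plugin exit code to its Nagios state name.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3, and anything unexpected
    esac
}

# Example usage right after a manual plugin run:
#   /usr/local/nagios/libexec/check_oracle_health ... --mode=tnsping
#   echo "state: $(nagios_state $?)"
```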
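5. The run above works fine as the oracle user, but we run the plugin as root, so all of the oracle user's environment variables have to be added to root's environment. Then repeat step 4 as root. If it works, you are done; if it fails, the environment variables have not been added correctly.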
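One way to avoid copying variables between user profiles is to set them right where the plugin is started. The sketch below is an assumption of mine rather than anything shipped with the plugin: the function name with_oracle_env is hypothetical, the default path is the ORACLE_HOME used earlier in this article, and your oracle user may need more variables than these three:

```shell
# Hypothetical wrapper: run a command with the Oracle environment set,
# regardless of which user (root, nagios, ...) starts it.
with_oracle_env() {
    (
        # Default is the path used earlier in this article; override as needed.
        ORACLE_HOME="${ORACLE_HOME:-/u01/app/oracle/product/10.2.0/db_1}"
        # Prepend the Oracle library and binary directories.
        LD_LIBRARY_PATH="$ORACLE_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
        PATH="$ORACLE_HOME/bin:$PATH"
        export ORACLE_HOME LD_LIBRARY_PATH PATH
        "$@"
    )
}

# Example usage:
#   with_oracle_env /usr/local/nagios/libexec/check_oracle_health ... --mode=tnsping
```

The subshell keeps the changes from leaking into the calling shell, so the helper can be tried out without touching root's profile.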
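6. The monitored host now passes its own test, so how do we get the monitoring host to call this script? Add the following to nrpe.cfg on the monitored host:
vi /usr/local/nagios/etc/nrpe.cfg
command[check_oracle_health]=/usr/local/nagios/libexec/check_oracle_health --connect=your_oracle_SID --user=oracle_user --password=oracle_password --mode=tablespace-usage
Save and exit, then restart the nrpe service on the monitored host: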
killall nrpe
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
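If the monitoring host later complains that the command is not defined, the first thing I would look at is whether the entry really made it into nrpe.cfg as an uncommented line. A minimal sketch, where the function name nrpe_has_command is hypothetical:

```shell
# Hypothetical helper: succeed if the given nrpe.cfg defines the named
# command on an uncommented line.
#   $1 = path to nrpe.cfg, $2 = command name
nrpe_has_command() {
    grep -q "^[[:space:]]*command\[$2]=" "$1"
}

# Example usage:
#   nrpe_has_command /usr/local/nagios/etc/nrpe.cfg check_oracle_health \
#       && echo "command is defined"
```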
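7. Now go to the monitoring host and test the plugin from there:
/usr/local/nagios/libexec/check_nrpe -H monitored_host_IP -c check_oracle_health
If everything is working, this prints the details of all your tablespaces; I won't list mine here.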
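Since the plugin reported fewer tablespaces than the DBA counted, it may be worth counting what actually arrives. The sketch below rests on an assumption about the output format: it expects one performance-data label of the form tbs_<name>_usage_pct per tablespace, which is how I understand the tablespace-usage mode to report, so check it against your own output first. The function name count_tablespaces is hypothetical:

```shell
# Hypothetical helper: read plugin output on stdin and print how many
# distinct tbs_<name>_usage_pct labels it contains.
# Assumes one such perfdata label per tablespace.
count_tablespaces() {
    grep -o 'tbs_[^= ]*_usage_pct' | sort -u | wc -l | tr -d ' '
}

# Example usage: count on the monitored host, then again through NRPE.
#   /usr/local/nagios/libexec/check_oracle_health ... --mode=tablespace-usage | count_tablespaces
#   /usr/local/nagios/libexec/check_nrpe -H monitored_host_IP -c check_oracle_health | count_tablespaces
```

If the local count matches the DBA's number but the count through check_nrpe is lower, that would point at the transport rather than at the plugin itself.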
8. On the monitoring host, edit /usr/local/nagios/etc/objects/services.cfg and add the following:
define service{
host_name database_IP_address
service_description check-oracle-tablespace
check_command check_nrpe!check_oracle_health
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
Save the file and restart nagios; the monitored host in your web interface should now look like the figure below.
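And with that, the job is basically done!
The figure above shows only one tablespace, but click into it and you will see the usage of every tablespace.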
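A typo in the service definition will stop Nagios from starting, so it is safer to validate the configuration before restarting, which the nagios binary can do with its -v option. A small sketch follows; the function name verify_nagios_config is hypothetical, and the default paths simply follow the /usr/local/nagios prefix used in this article:

```shell
# Hypothetical helper: validate the Nagios configuration before a restart.
#   $1 = nagios binary (optional), $2 = main config file (optional)
verify_nagios_config() {
    nagios_bin="${1:-/usr/local/nagios/bin/nagios}"
    nagios_cfg="${2:-/usr/local/nagios/etc/nagios.cfg}"
    if [ ! -x "$nagios_bin" ]; then
        echo "nagios binary not found: $nagios_bin" >&2
        return 1
    fi
    # -v asks nagios to check the configuration and exit.
    "$nagios_bin" -v "$nagios_cfg"
}

# Example usage:
#   verify_nagios_config && echo "config OK, safe to restart nagios"
```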
Problems encountered:
I also helped a good friend install this plugin. When he ran it as the oracle user (step 4 above), the following appeared:
CRITICAL - cannot connect to orcl. install_driver(Oracle) failed: Can't load '/usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/auto/DBD/Oracle/Oracle.so' for module DBD::Oracle: /u01/app/oracle/product/10.2.0/db_1/lib/libnnz10.so: cannot restore segment prot after reloc: Permission denied at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 230.
at (eval 13) line 3
Compilation failed in require at (eval 13) line 3.
Perhaps a required shared library or dll isn't installed where expected
at ./check_oracle_health line 4098
After a long search, the cause turned out to be SELinux being enabled.
Command to turn off SELinux enforcement (this switches to permissive mode until the next reboot):
setenforce 0
After that, everything worked!
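To see whether SELinux is what you are fighting with before changing anything, the current mode can be queried with getenforce. The helper below is only a sketch; the name selinux_mode is hypothetical, and the comments describe options I have not tested on the host from this article:

```shell
# Hypothetical helper: print the current SELinux mode, or "unavailable"
# when the getenforce tool is not installed on this host.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo unavailable
    fi
}

# Example usage:
#   selinux_mode        # prints Enforcing, Permissive or Disabled
#
# Notes:
# - "setenforce 0" only lasts until the next reboot. The persistent setting
#   is the SELINUX= line in /etc/selinux/config.
# - A narrower fix that is commonly suggested for "cannot restore segment
#   prot after reloc" is relabelling the library named in the error, e.g.
#   chcon -t textrel_shlib_t <path to the .so>, instead of relaxing SELinux
#   for the whole system.
```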