

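These are scripts we used in the past to health-check HBase and HDFS and to alert on low remaining HDFS capacity. They are simple and easy to use.

1. Script for Hadoop 2: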

#!/bin/bash

# Resolve the directory this script lives in
bin=`dirname $0`
bin=`cd $bin; pwd`


STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4


source /etc/profile


DFS_REMAINING_WARNING=15
DFS_REMAINING_CRITICAL=5
ABNORMAL_QUERY="INCONSISTENT|CORRUPT|FAILED|Exception"


HADOOP_WEB_INTERFACE=h001.hadoop
HBASE_WEB_INTERFACE=h008.hadoop
# hbck and fsck report
output=/var/log/cluster-status
hbase hbck >> $output
hadoop fsck /apps/hbase >> $output


# check report
count=`egrep -c "$ABNORMAL_QUERY" $output`
if [ $count -eq 0 ]; then
    echo "[OK] Cluster is healthy." >> $output
else
    echo "[ABNORMAL] Cluster is abnormal!" >> $output

    # Get the last matching entry in the report file
    last_entry=`egrep "$ABNORMAL_QUERY" $output | tail -1`
    echo "($count) $last_entry"
    exit $STATE_CRITICAL
fi



# HDFS usage
dfs_remaining=`curl -s "http://${HADOOP_WEB_INTERFACE}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo" | egrep -o "PercentRemaining.*" | egrep -o "[0-9]*\.[0-9]*"`
dfs_remaining_word="DFS Remaining ${dfs_remaining}%"

echo "$dfs_remaining_word" >> $output

# check HDFS usage
# Keep only the integer part so that [ -lt ] can compare it
dfs_remaining=`echo $dfs_remaining | awk -F'.' '{print $1}'`
if [ $dfs_remaining -lt $DFS_REMAINING_CRITICAL ]; then
    echo "Low DFS space. $dfs_remaining_word"
    exit_status=$STATE_CRITICAL
elif [ $dfs_remaining -lt $DFS_REMAINING_WARNING ]; then
    echo "Low DFS space. $dfs_remaining_word"
    exit_status=$STATE_WARNING
else
    echo "HBase check OK - DFS and HBase healthy. $dfs_remaining_word"
    exit_status=$STATE_OK
fi
exit $exit_status
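
The threshold logic in the script above can be pulled into small functions, which makes it possible to try it out without a live cluster. The following is only a minimal sketch under stated assumptions: the function names `parse_percent_remaining` and `classify_dfs_remaining` and the sample JMX fragment are illustrative and not part of the original script. Only the `PercentRemaining` field name, the 15/5 thresholds, and the 0/1/2/3 state codes are taken from it.

```shell
#!/bin/bash
# Sketch only: function names and the sample input below are hypothetical.

DFS_REMAINING_WARNING=15
DFS_REMAINING_CRITICAL=5

# Read JMX output on stdin and print the integer part of PercentRemaining.
parse_percent_remaining() {
    grep -E -o '"PercentRemaining"[^,}]*' \
        | grep -E -o '[0-9]+(\.[0-9]+)?' \
        | head -1 \
        | awk -F'.' '{print $1}'
}

# Map a whole-number percentage to a state code:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN (input is not a number).
classify_dfs_remaining() {
    case $1 in
        ''|*[!0-9]*) return 3 ;;
    esac
    if [ "$1" -lt "$DFS_REMAINING_CRITICAL" ]; then
        return 2
    elif [ "$1" -lt "$DFS_REMAINING_WARNING" ]; then
        return 1
    fi
    return 0
}

# Usage with a made-up JMX fragment instead of a real curl call
sample='{ "name" : "Hadoop:service=NameNode,name=NameNodeInfo", "PercentRemaining" : 12.75, "Total" : 100 }'
pct=$(echo "$sample" | parse_percent_remaining)
status=0
classify_dfs_remaining "$pct" || status=$?
echo "DFS Remaining ${pct}% -> status $status"
```

Returning UNKNOWN for a non-numeric value guards against the case where `curl` fails or the field is missing, which would otherwise make the `-lt` comparison error out.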







2. Script for Hadoop 1:


#!/bin/bash

# Resolve the directory this script lives in
bin=`dirname $0`
bin=`cd $bin; pwd`


STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

source /etc/profile

DFS_REMAINING_WARNING=15
DFS_REMAINING_CRITICAL=5
ABNORMAL_QUERY="INCONSISTENT|CORRUPT|FAILED|Exception"

# Set this to the externally reachable IP of the Hadoop NameNode
HADOOP_WEB_INTERFACE=""

# hbck and fsck report
output=/data/logs/cluster-status
$HBASE_HOME/bin/hbase hbck >> $output
$HADOOP_HOME/bin/hadoop fsck /hbase >> $output


# check report
count=`egrep -c "$ABNORMAL_QUERY" $output`
if [ $count -eq 0 ]; then
    echo "[OK] Cluster is healthy." >> $output
else
    echo "[ABNORMAL] Cluster is abnormal!" >> $output

    # Get the last matching entry in the report file
    last_entry=`egrep "$ABNORMAL_QUERY" $output | tail -1`
    echo "($count) $last_entry"

    exit $STATE_CRITICAL
fi
  
  
# Check RegionServer Status
dead_region_servers=`curl -s "http://${HADOOP_WEB_INTERFACE}:60010/master-status" | grep "Dead Region Servers" -A 500 | grep "Regions in Transition" -B 500 | egrep -o 'target="_blank">.*</a>' | awk -F">" '{print $2}' | awk -F"<" '{print $1}'`
if [ -z "$dead_region_servers" ]; then
    echo "[OK] All RegionServers are healthy."
    echo "[OK] All RegionServers are healthy." >> $output
else
    echo "[ABNORMAL] the dead regionserver list:" >> $output
    echo $dead_region_servers >> $output
    exit $STATE_CRITICAL
fi


# HDFS usage
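# NOTE: the extraction pattern on the next line was truncated in the source.
# The pattern used here is an assumption modeled on the Hadoop 2 script above;
# verify it against the actual dfshealth.jsp output of your cluster.
dfs_remaining=`curl -s "http://${HADOOP_WEB_INTERFACE}:50070/dfshealth.jsp" | egrep -o "DFS Remaining%.*" | egrep -o "[0-9]*\.[0-9]*"`
dfs_remaining_word="DFS Remaining ${dfs_remaining}%"

echo "$dfs_remaining_word" >> $output

# check HDFS usage
# Keep only the integer part so that [ -lt ] can compare it
dfs_remaining=`echo $dfs_remaining | awk -F'.' '{print $1}'`
if [ $dfs_remaining -lt $DFS_REMAINING_CRITICAL ]; then
    echo "Low DFS space. $dfs_remaining_word"
    exit_status=$STATE_CRITICAL
elif [ $dfs_remaining -lt $DFS_REMAINING_WARNING ]; then
    echo "Low DFS space. $dfs_remaining_word"
    exit_status=$STATE_WARNING
else
    echo "HBase check OK - DFS and HBase healthy. $dfs_remaining_word"
    exit_status=$STATE_OK
fi
exit $exit_status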
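
The report check shared by both scripts counts the lines that match `ABNORMAL_QUERY` and prints the last one. Below is a minimal sketch that can be run without HBase or Hadoop installed. The function name `scan_report` and the sample report lines are made up for illustration, and a temporary file stands in for the real `cluster-status` log.

```shell
#!/bin/bash
# Sketch only: scan_report and the sample report content are hypothetical.

ABNORMAL_QUERY="INCONSISTENT|CORRUPT|FAILED|Exception"

# Print a one-line summary of the report file passed as $1.
# Returns 0 when no abnormal line is found, 2 (CRITICAL) otherwise.
scan_report() {
    file=$1
    # grep -c exits non-zero when nothing matches, so ignore its exit status
    count=$(grep -E -c "$ABNORMAL_QUERY" "$file") || true
    if [ "$count" -eq 0 ]; then
        echo "[OK] Cluster is healthy."
        return 0
    fi
    last_entry=$(grep -E "$ABNORMAL_QUERY" "$file" | tail -1)
    echo "($count) $last_entry"
    return 2
}

# Usage with a fabricated report instead of real hbck/fsck output
report=$(mktemp)
printf '%s\n' 'Status: HEALTHY' 'ERROR: region is INCONSISTENT' 'block blk_1 is CORRUPT' > "$report"
scan_status=0
result=$(scan_report "$report") || scan_status=$?
echo "$result (exit $scan_status)"
rm -f "$report"
```

Note that both original scripts append to the report file with `>>`, so matches from earlier runs are still counted. Truncating or rotating the file before each run may be worth considering if only the current state should trigger an alert.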