Some of our production real-time jobs are developed on Storm. To monitor data latency, the jobs insert each log line's timestamp into Redis as they process it, and Zabbix then monitors the delay. Since new jobs go live all the time, configuring the monitoring items by hand has become a chore, so to free up some manpower it needed to be automated.
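For context, here is a minimal sketch of the producing side, assuming the jobs record timestamps roughly like this (the host, port, and the topology/component names are placeholders for illustration, not the real ones):

import time
import redis

# sketch: each job stores the event time of the last log line it processed,
# so delay = now - stored value; host/port here are placeholders
r = redis.StrictRedis(host='xxx', port=6379, db=0)

def record_log_time(topology, component, log_ts):
    # one hash per job, named so the discovery script below can find it by
    # prefix; one field per component
    r.hset("stormdelay_%s" % topology, component, log_ts)

record_log_time("order_topic", "bolt_parse", int(time.time()))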
When adding NIC and partition monitoring earlier I used the LLD (low-level discovery) feature with its built-in macros; newer Zabbix releases also support custom LLD. The steps are as follows:
1. In the template, set up a discovery rule (a UserParameter key) that calls a script and returns JSON in the format Zabbix requires (returning the custom macros), and configure the discovery correctly (filters and so on).
The required data format can be seen from the official documentation combined with the agent log on a production host:
143085:20141127:000548.967 Requested [vfs.fs.discovery]
143085:20141127:000548.967 Sending back [{
  "data": [
    {"{#FSNAME}": "\/", "{#FSTYPE}": "rootfs"},
    {"{#FSNAME}": "\/proc\/sys\/fs\/binfmt_misc", "{#FSTYPE}": "binfmt_misc"},
    {"{#FSNAME}": "\/data", "{#FSTYPE}": "ext4"}
  ]}]
For example, the UserParameter key that returns the JSON data in our production setup:
UserParameter=storm.delay.discovery,python2.6 /apps/sh/zabbix_scripts/storm/storm_delay_discovery.py
Then verify that the returned data is correct with:

zabbix_get -s 127.0.0.1 -k storm.delay.discovery
The contents of storm_delay_discovery.py:
#!/usr/bin/python2.6
#for storm job delay monitor auto discovery
#edit by ericni.ni
#2014-11-27
import sys
import redis
import exceptions
import traceback

_hashtables = []   # redis hashes whose names mark them as storm delay tables
_continue = True   # loop flag for the SCAN iteration
_alllist = []      # one discovery entry per (hash, field) pair


class RedisException(Exception):
    def __init__(self, errorlog):
        self.errorlog = errorlog

    def __str__(self):
        return "error log is %s" % (self.errorlog)


def scan_one(cursor, conn):
    # run a single SCAN step, remember matching hash names, return next cursor
    try:
        cursor_v = conn.scan(cursor)
        cursor_next = cursor_v[0]
        cursor_value = cursor_v[1]
        for line in cursor_value:
            if line.startswith("com-vip-storm") or line.startswith("stormdelay_"):
                _hashtables.append(line)
        return cursor_next
    except Exception, e:
        raise RedisException(str(e))


def scan_all(conn):
    # keep scanning until the cursor wraps back around to 0
    try:
        cursor1 = scan_one('0', conn)
        global _continue
        while _continue:
            cursor2 = scan_one(cursor1, conn)
            if int(cursor2) == 0:
                _continue = False
            else:
                cursor1 = cursor2
                _continue = True
    except Exception, e:
        raise RedisException(str(e))


def hget_fields(conn, hashname):
    # emit one {#STORMHASHNAME}/{#STORMHASHFIELD} entry per field of the hash
    fields = conn.hkeys(hashname)
    for field in fields:
        aline = """{"{#STORMHASHNAME}": "%s", "{#STORMHASHFIELD}": "%s"}""" % (hashname, field)
        _alllist.append(aline)


if __name__ == '__main__':
    re = ""
    try:
        r = redis.StrictRedis(host='xxx', port=xxx, db=0)
        scan_all(r)
        for hashtable in _hashtables:
            hget_fields(r, hashtable)
        # str() of the list looks like ['{...}', '{...}']; strip the single
        # quotes to get the {"data": [...]} document zabbix expects
        re += """{"data": """
        re += str(_alllist).replace("'", '')
        re += "}"
        print re.replace("'", '"')
    except Exception, e:
        print -1
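Run by hand, the script prints a document shaped like the one below (the hash and field names here are made up for illustration):

{"data": [{"{#STORMHASHNAME}": "stormdelay_order_topic", "{#STORMHASHFIELD}": "bolt_parse"}, {"{#STORMHASHNAME}": "stormdelay_order_topic", "{#STORMHASHFIELD}": "bolt_write"}]}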
2. Set up the item/graph/trigger prototypes:
Taking items as the example, define the item prototypes (a key is required here as well), with the macros as the key's parameters.
For instance: Free inodes on {#FSNAME} (percentage) ---> vfs.fs.inode[{#FSNAME},pfree]
In this case, just use the macros returned above in the item key:
storm_delay[hget,{#STORMHASHNAME},{#STORMHASHFIELD}]
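The original post does not show the agent-side definition behind this key; presumably it is a flexible UserParameter along these lines, where storm_delay.py is a hypothetical script that runs HGET against the given hash/field and prints the stored timestamp (or the delay derived from it):

UserParameter=storm_delay[*],python2.6 /apps/sh/zabbix_scripts/storm/storm_delay.py $1 $2 $3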
Finally, link the template containing the LLD rule to the hosts.
On top of that, pairing this with the screen.create/screenitem.update APIs automates adding monitors and creating/updating screens as well.
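As a rough sketch of that API-driven part (the URL, credentials, and graph id below are placeholders; screen.create and screenitem.update are the standard Zabbix JSON-RPC methods of this Zabbix generation):

#!/usr/bin/python2.6
# minimal sketch of driving the Zabbix JSON-RPC API to create a screen
import json
import urllib2

ZABBIX_URL = "http://zabbix.example.com/api_jsonrpc.php"

def api_call(method, params, auth=None, reqid=1):
    # every Zabbix API request is a JSON-RPC 2.0 POST to api_jsonrpc.php
    payload = json.dumps({"jsonrpc": "2.0", "method": method,
                          "params": params, "auth": auth, "id": reqid})
    req = urllib2.Request(ZABBIX_URL, payload,
                          {"Content-Type": "application/json"})
    return json.loads(urllib2.urlopen(req).read())["result"]

# log in first; the returned token authenticates later calls
token = api_call("user.login", {"user": "apiuser", "password": "xxx"})

# create a 1x1 screen holding one graph (resourcetype 0 = graph)
api_call("screen.create", {
    "name": "storm_delay_screen",
    "hsize": 1,
    "vsize": 1,
    "screenitems": [{"resourcetype": 0, "resourceid": "12345",
                     "x": 0, "y": 0}],
}, auth=token, reqid=2)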
This post is reproduced from caiguangguang's 51CTO blog. Original link: http://blog.51cto.com/caiguangguang/1583536; please contact the original author for reprint permission.