Using Telegraf + Graylog to automatically push DingTalk alerts when a Linux business service becomes abnormal
The implementation is based on the official Telegraf documentation:
https://docs.influxdata.com/telegraf/v1.24/get_started/
https://github.com/influxdata/telegraf/blob/release-1.24/plugins/inputs/exec/README.md
https://sbcode.net/grafana/telegraf-inputs-exec-monitor-ssh-sessions/
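I. Configure a GELF UDP Input and a Stream on Graylog for Telegraf
The steps are fairly simple, so only some screenshots of the configuration are shown below.
Remember to open the port configured for the input on the Graylog server:

    firewall-cmd --permanent --zone=public --add-port=12201/udp
    firewall-cmd --reload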
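II. Install Telegraf on the business server and configure telegraf.conf
1. Create a check script for the XX service on the business server

    vim /opt/service_check.sh

Script contents:

    #!/bin/sh
    # Take the "Active:" line of sshd's status and cut it off before "since",
    # e.g. "Active: active (running)"; Telegraf reads the output as one string value.
    status=$(/usr/bin/systemctl status sshd | grep Active | awk -F "since" '{print $1}')
    echo $status

Set the script permissions:

    # 755, not 777: Telegraf executes this script, so only its owner should be able to modify it.
    chmod 755 /opt/service_check.sh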
2. Generate and edit the telegraf.conf configuration file

    rpm -ivh telegraf-1.24.3-1.x86_64.rpm
    telegraf --sample-config --input-filter exec --output-filter graylog > telegraf.conf
    vim telegraf.conf
The final telegraf.conf is as follows:

    #cat telegraf.conf | grep -v ^# | grep -v ^$ | grep -v ^.*## | grep -v ^.*#
    [global_tags]
    [agent]
      interval = "10s"
      round_interval = true
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      collection_jitter = "0s"
      flush_interval = "10s"
      flush_jitter = "0s"
      precision = "0s"
      hostname = ""
      omit_hostname = false
    [[outputs.graylog]]
      servers = ["udp://192.168.31.170:12201"]
    [[inputs.exec]]
      commands = [
        "sh /opt/service_check.sh"
      ]
      timeout = "10s"
      name_override = "sshd_service_status_check"
      data_format = "value"
      data_type = "string"
      interval = "45s"

    cd /etc/telegraf/
    mv telegraf.conf telegraf.conf_default
    cp /root/telegraf.conf ./
    chmod 644 telegraf.conf
    systemctl start telegraf

Startup reported an error; the cause was the permissions on telegraf.conf.
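Telegraf's exec input runs /opt/service_check.sh on every collection interval and loads /etc/telegraf/telegraf.conf at startup, so neither file nor its directory should be writable by group or other users. The POSIX shell audit below is an added example, not part of the original post; the function names and the script file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Added example: flag files Telegraf runs or loads that non-owners can modify.

# Print the 10-character mode string of a path, e.g. -rwxr-xr-x.
mode_of() {
    ls -ld "$1" | awk '{print substr($1, 1, 10)}'
}

# Succeed when a mode string has the group or other write bit set.
is_loosely_writable() {
    case "$1" in
        ?????w*|????????w*) return 0 ;;
        *) return 1 ;;
    esac
}

# audit_path FILE: returns 0 if clean, 1 if FILE or its directory is
# group/world-writable, 2 if FILE does not exist.
audit_path() {
    file=$1
    rc=0
    if [ ! -e "$file" ]; then
        echo "MISSING $file"
        return 2
    fi
    fmode=$(mode_of "$file")
    if is_loosely_writable "$fmode"; then
        echo "WRITABLE $fmode $file"
        rc=1
    fi
    # A writable directory lets another user swap the file out entirely.
    # Sticky-bit directories are flagged too (conservative).
    dir=$(dirname "$file")
    dmode=$(mode_of "$dir")
    if is_loosely_writable "$dmode"; then
        echo "WRITABLE-DIR $dmode $dir"
        rc=1
    fi
    [ "$rc" -eq 0 ] && echo "OK $fmode $file"
    return "$rc"
}

# Usage: sh audit_telegraf_perms.sh /opt/service_check.sh /etc/telegraf/telegraf.conf
for f in "$@"; do
    audit_path "$f" || true
done
```

A script left at mode 777 is reported as WRITABLE; after chmod 755 it is reported as OK.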
III. View the Telegraf logs on Graylog and configure alerts
The PrometheusAlert alert template used:

    ## [Graylog告警信息](.check_result.Event.Source)
    ### <font color=#FF0000>告警描述:{{.event_definition_description}}</font>
    {{ range $k,$v:=.backlog }}
    ##### <font color="#FF0000">告警时间</font>:{{GetCSTtime $v.timestamp}} </br>
    ##### <font color="#FF0000">告警服务器名称</font>:{{$v.source}} </br>
    ##### <font color="#FF0000">告警服务器IP地址</font>:{{$v.fields.gl2_remote_ip}} </br>
    ##### <font color="#FF0000">服务目前状态</font>:{{$v.fields.value}} </br>
    {{end}}

Screenshots of the Graylog alert configuration process