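1. Introduction
Why write metrics in Golang? One of our customers runs OVS for the k8s network with bonding: two bonds, bond0 and bond1, each with two NICs. After the system went into production, I was asked to check every day that the NICs were healthy, because a NIC had gone DOWN before. I'm rather lazy and didn't want to check by hand, so the idea was to have Prometheus collect the status and display it in Grafana, where I can simply look for NICs in an abnormal state. I've also been learning Go recently and wanted some practice. I asked a developer colleague, who said it would be simple, told me to give it a try, and offered to help if I got stuck. So I decided to try.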
2. Environment
Component | Version | Notes |
k8s | v1.14 | |
ovs | v2.9.5 | |
go | 1.14.1 | |
3. Goal
The goal is for Prometheus to scrape the NIC status of my OVS bonds. That means writing a Go program that collects the OVS bond information on the host and exposes it as metrics for Prometheus to scrape and Grafana to display. For example:
    # First, get the current bond information
    [root@test~]$ ovs-appctl bond/show |grep '^slave' |grep -v grep |awk '{print $2""$3}'
    a1-b1:enabled
    a2-b2:enabled
    a3-b3:enabled
    a4-b4:disabled

    # Data finally exposed by the component. 5 means the command that fetches
    # the bond information failed; 0-4 is the number of NICs in the disabled state.
    curl http://$IP:$PORT/metrics
    ovs_bond_status{component="ovs"} 5
    ovs_bond_status{component="ovs",a1b1="enabled",a2b2="disabled",a3b3="enabled",a4b4="disabled"} 2
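4. Design
- Since Prometheus does the scraping, the bond information has to be exposed in the metrics format. See the official Prometheus site for the format.
- There are two bonds with two NICs each, and a NIC is either enabled or disabled. The numbers 0-4 therefore tell the user how many NICs are disabled, and 5 means the command failed or no bond exists, so someone has to step in. The bond information is available from a command, so the program gets it by running that command.
- The command output has to be processed and put into the metrics. Note: metric label names cannot contain "-".
- The valid bond information returned by the shell command is stored in a map, with the NIC name as key and the NIC state as value.
- client_golang/prometheus can be used as a reference.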
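The value encoding described in the list above can be sketched as a small pure function. This is only an illustration under assumptions: the names `bondMetricValue` and `commandFailed` are hypothetical and do not come from the program in this article.

```go
package main

import (
	"errors"
	"fmt"
)

// commandFailed is the sentinel value used when the bond command fails
// or returns no bond at all (hypothetical name).
const commandFailed = 5

// bondMetricValue is a hypothetical helper. It returns the number of
// disabled NICs, or commandFailed when err is non-nil or no NIC was found.
func bondMetricValue(status map[string]string, err error) float64 {
	if err != nil || len(status) == 0 {
		return commandFailed
	}
	var disabled float64
	for _, state := range status {
		if state == "disabled" {
			disabled++
		}
	}
	return disabled
}

func main() {
	status := map[string]string{"a1b1": "enabled", "a2b2": "disabled"}
	fmt.Println(bondMetricValue(status, nil))
	fmt.Println(bondMetricValue(nil, errors.New("command failed")))
}
```

Note that this encoding only stays unambiguous while there are at most four NICs; with five or more, a count of 5 could no longer be told apart from the failure value.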
5. Practice
First, run the shell command to get the bond information:
    # First, get the current bond information
    [root@test~]$ ovs-appctl bond/show |grep '^slave' |grep -v grep |awk '{print $2""$3}'
    a1-b1:enabled
    a2-b2:enabled
    a3-b3:enabled
    a4-b4:disabled
Then process the shell output:
    // getBondStatus runs the shell command, processes its output and logs problems.
    // It returns a map of NIC name to NIC state. There are two failure cases:
    // the command fails, or it succeeds but returns nothing. Both are reported
    // through the special "msg" key.
    func getBondStatus() (m map[string]string) {
        result, err := exec.Command("bash", "-c", "ovs-appctl bond/show | grep '^slave' | grep -v grep | awk '{print $2\"\"$3}'").Output()
        if err != nil {
            log.Error("result: ", string(result))
            log.Error("command failed: ", err.Error())
            return map[string]string{"msg": "failure"}
        } else if len(result) == 0 {
            log.Error("command exec failed, result is null")
            return map[string]string{"msg": "return null"}
        }
        // Trim surrounding whitespace, then split on newlines,
        // e.g. tt := []string{"a1-b1:enabled", "a2-b2:disabled"}
        ret := strings.TrimSpace(string(result))
        tt := strings.Split(ret, "\n")
        nMap := make(map[string]string)
        for _, line := range tt {
            kv := strings.SplitN(line, ":", 2)
            if len(kv) != 2 {
                // Skip lines without ":" instead of panicking on kv[1]
                log.Error("unexpected line: ", line)
                continue
            }
            // Label names cannot contain "-", so remove it from the key
            nMap[strings.ReplaceAll(kv[0], "-", "")] = kv[1]
        }
        return nMap
    }
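The parsing step is easier to check when it is separated from running the command. The following is a minimal sketch of that idea, not the code used in this article: `parseBondOutput` is a hypothetical helper that takes the command output as a string, so it can be exercised without OVS installed.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseBondOutput is a hypothetical helper that turns lines such as
// "a1-b1:enabled" into {"a1b1": "enabled"}. It returns an error for
// empty input or for a line without a ":" separator.
func parseBondOutput(out string) (map[string]string, error) {
	out = strings.TrimSpace(out)
	if out == "" {
		return nil, errors.New("empty bond output")
	}
	status := make(map[string]string)
	for _, line := range strings.Split(out, "\n") {
		parts := strings.SplitN(strings.TrimSpace(line), ":", 2)
		if len(parts) != 2 || parts[0] == "" {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		// Label names must not contain "-", so strip it from the NIC name.
		name := strings.ReplaceAll(parts[0], "-", "")
		status[name] = strings.TrimSpace(parts[1])
	}
	return status, nil
}

func main() {
	status, err := parseBondOutput("a1-b1:enabled\na2-b2:disabled\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(status)
}
```

Returning an error instead of a special "msg" key is a design choice of this sketch; it keeps NIC names and failure information from sharing the same map.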
Define the metric:
    // ovsCollector holds the metric descriptors; more than one can be defined.
    type ovsCollector struct {
        ovsMetric *prometheus.Desc
    }

    func (collector *ovsCollector) Describe(ch chan<- *prometheus.Desc) {
        ch <- collector.ovsMetric
    }

    // NIC names, used as label names
    var vLable = []string{}

    // NIC states, used as label values
    var vValue = []string{}

    // Constant label marking the metric as coming from ovs
    var constLabel = prometheus.Labels{"component": "ovs"}

    // newOvsCollector defines the metric.
    func newOvsCollector() *ovsCollector {
        rm := getBondStatus()
        if _, ok := rm["msg"]; ok {
            log.Error("command execute failed:", rm["msg"])
        } else {
            // Only the NIC names are needed here
            for k := range rm {
                vLable = append(vLable, k)
            }
        }
        return &ovsCollector{
            ovsMetric: prometheus.NewDesc("ovs_bond_status", "Show ovs bond status", vLable, constLabel),
        }
    }
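Set the metric value:
    // Collect puts the NICs, their states and the number of abnormal NICs into the metric.
    func (collector *ovsCollector) Collect(ch chan<- prometheus.Metric) {
        var metricValue float64
        rm := getBondStatus()
        if _, failed := rm["msg"]; failed {
            log.Error("command exec failed")
            metricValue = 5
        } else {
            // Count the disabled NICs
            for _, v := range rm {
                if v == "disabled" {
                    metricValue++
                }
            }
        }
        // One value per label name, in the order of vLable. Ranging over the
        // map here would give a random order, and passing a different number
        // of values than label names makes MustNewConstMetric panic.
        vValue = vValue[0:0]
        for _, name := range vLable {
            state, ok := rm[name]
            if !ok {
                state = "unknown"
            }
            vValue = append(vValue, state)
        }
        // Note: the value can go down as well as up, so prometheus.GaugeValue
        // would be the more conventional choice than CounterValue.
        ch <- prometheus.MustNewConstMetric(collector.ovsMetric, prometheus.CounterValue, metricValue, vValue...)
    }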
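Label values have to be passed in the same order as the label names given to the Desc, and Go does not guarantee any order when ranging over a map. A minimal sketch of one way to keep names and values aligned follows. The helpers `sortedNames` and `valuesFor` and the "unknown" placeholder are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedNames returns the map keys in sorted order, so the label names
// are the same on every run.
func sortedNames(status map[string]string) []string {
	names := make([]string, 0, len(status))
	for name := range status {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// valuesFor returns one value per label name, in the same order as names.
// A NIC missing from status is reported as "unknown" (assumed placeholder).
func valuesFor(names []string, status map[string]string) []string {
	values := make([]string, 0, len(names))
	for _, name := range names {
		state, ok := status[name]
		if !ok {
			state = "unknown"
		}
		values = append(values, state)
	}
	return values
}

func main() {
	status := map[string]string{"a2b2": "disabled", "a1b1": "enabled"}
	names := sortedNames(status)
	fmt.Println(names, valuesFor(names, status))
}
```

Because `valuesFor` always returns exactly `len(names)` values, the result can be passed to a constructor that requires one value per label name, even when a NIC disappears between two scrapes.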
Program entry point:
    func main() {
        ovs := newOvsCollector()
        prometheus.MustRegister(ovs)

        http.Handle("/metrics", promhttp.Handler())

        log.Info("begin to serve on port 8080")
        // listen on port 8080
        log.Fatal(http.ListenAndServe(":8080", nil))
    }
Complete code:
    package main

    import (
        "net/http"
        "os/exec"
        "strings"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
        log "github.com/sirupsen/logrus"
    )

    // ovsCollector holds the metric descriptor (prometheus.Desc).
    type ovsCollector struct {
        ovsMetric *prometheus.Desc
    }

    func (collector *ovsCollector) Describe(ch chan<- *prometheus.Desc) {
        ch <- collector.ovsMetric
    }

    var vLable = []string{} // NIC names, used as label names
    var vValue = []string{} // NIC states, used as label values
    var constLabel = prometheus.Labels{"component": "ovs"}

    // Collect runs the command and reports the number of disabled NICs,
    // or 5 if the command failed.
    func (collector *ovsCollector) Collect(ch chan<- prometheus.Metric) {
        var metricValue float64
        rm := getBondStatus()
        if _, failed := rm["msg"]; failed {
            log.Error("command exec failed")
            metricValue = 5
        } else {
            for _, v := range rm {
                if v == "disabled" {
                    metricValue++
                }
            }
        }
        // One value per label name, in the order of vLable. Ranging over the
        // map here would give a random order, and passing a different number
        // of values than label names makes MustNewConstMetric panic.
        vValue = vValue[0:0]
        for _, name := range vLable {
            state, ok := rm[name]
            if !ok {
                state = "unknown"
            }
            vValue = append(vValue, state)
        }
        // Note: the value can go down as well as up, so prometheus.GaugeValue
        // would be the more conventional choice than CounterValue.
        ch <- prometheus.MustNewConstMetric(collector.ovsMetric, prometheus.CounterValue, metricValue, vValue...)
    }

    // newOvsCollector defines the metric's name, help text and labels.
    func newOvsCollector() *ovsCollector {
        rm := getBondStatus()
        if _, ok := rm["msg"]; ok {
            log.Error("command execute failed:", rm["msg"])
        } else {
            for k := range rm {
                vLable = append(vLable, k)
            }
        }
        return &ovsCollector{
            ovsMetric: prometheus.NewDesc("ovs_bond_status", "Show ovs bond status", vLable, constLabel),
        }
    }

    // getBondStatus returns a map of NIC name to NIC state, or a map with
    // the single key "msg" if the command failed or returned nothing.
    func getBondStatus() (m map[string]string) {
        result, err := exec.Command("bash", "-c", "ovs-appctl bond/show | grep '^slave' | grep -v grep | awk '{print $2\"\"$3}'").Output()
        if err != nil {
            log.Error("result: ", string(result))
            log.Error("command failed: ", err.Error())
            return map[string]string{"msg": "failure"}
        } else if len(result) == 0 {
            log.Error("command exec failed, result is null")
            return map[string]string{"msg": "return null"}
        }
        ret := strings.TrimSpace(string(result))
        tt := strings.Split(ret, "\n")
        nMap := make(map[string]string)
        for _, line := range tt {
            kv := strings.SplitN(line, ":", 2)
            if len(kv) != 2 {
                log.Error("unexpected line: ", line)
                continue
            }
            // Label names cannot contain "-", so remove it from the key
            nMap[strings.ReplaceAll(kv[0], "-", "")] = kv[1]
        }
        return nMap
    }

    func main() {
        ovs := newOvsCollector()
        prometheus.MustRegister(ovs)

        http.Handle("/metrics", promhttp.Handler())

        log.Info("begin to serve on port 8080")
        // listen on port 8080
        log.Fatal(http.ListenAndServe(":8080", nil))
    }
6. Deployment
Since the program ultimately runs in a k8s environment, build an image first, using the following Dockerfile as a reference:
    FROM golang:1.14.1 AS builder
    WORKDIR /go/src
    COPY ./ .
    RUN go build -o ovs_check main.go

    # runtime
    FROM centos:7.7
    COPY --from=builder /go/src/ovs_check /xiyangxixia/ovs_check
    ENTRYPOINT ["/xiyangxixia/ovs_check"]
The yaml I used for the deployment is shown below:
    ---
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ovs-agent
      namespace: kube-system
    spec:
      minReadySeconds: 5
      selector:
        matchLabels:
          name: ovs-agent
      template:
        metadata:
          annotations:
            # All three annotations are required; they tell Prometheus where to scrape
            prometheus.io/scrape: "true"
            prometheus.io/port: "8080"
            prometheus.io/path: "/metrics"
          labels:
            name: ovs-agent
        spec:
          containers:
          - name: ovs-agent
            image: ovs_bond:v1
            imagePullPolicy: IfNotPresent
            resources:
              limits:
                cpu: 100m
                memory: 200Mi
              requests:
                cpu: 100m
                memory: 200Mi
            securityContext:
              privileged: true
              procMount: Default
            volumeMounts:
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - mountPath: /var/run/openvswitch
              name: ovs-run
            - mountPath: /usr/bin/ovs-appctl
              name: ovs-bin
              subPath: ovs-appctl
          serviceAccountName: xiyangxixia
          hostPID: true
          hostIPC: true
          volumes:
          - hostPath:
              path: /lib/modules
              type: ""
            name: lib-modules
          - hostPath:
              path: /var/run/openvswitch
              type: ""
            name: ovs-run
          - hostPath:
              path: /usr/bin/
              type: ""
            name: ovs-bin
      updateStrategy:
        type: RollingUpdate
7. Test
    [root@test ~]$ kubectl get po -n kube-system -o wide |grep ovs
    ovs-agent-h8zc6   1/1   Running   0   2d14h   10.211.55.41   master-1   <none>   <none>
    [root@test ~]$ curl 10.211.55.41:8080/metrics |grep ovs_bond
    # HELP ovs_bond_status Show ovs bond status
    # TYPE ovs_bond_status counter
    ovs_bond_status{component="ovs",a1b1="enabled",a2b2="enabled",a3b3="enabled",a4b4="enabled"} 0
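The same check can be scripted instead of read by eye. The sketch below extracts the value of the `ovs_bond_status` sample from a scraped response body. `bondStatusValue` is a hypothetical helper, and it assumes the sample line carries no timestamp after the value and that no other metric name starts with `ovs_bond_status`.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// bondStatusValue is a hypothetical helper that scans a /metrics response
// body and returns the value of the first ovs_bond_status sample.
// Comment lines (# HELP, # TYPE) are skipped because they start with "#".
func bondStatusValue(body string) (float64, error) {
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, "ovs_bond_status") {
			continue
		}
		// Assumption: the value is the last space-separated field.
		idx := strings.LastIndex(line, " ")
		if idx < 0 {
			return 0, fmt.Errorf("malformed sample %q", line)
		}
		return strconv.ParseFloat(line[idx+1:], 64)
	}
	if err := scanner.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("ovs_bond_status not found")
}

func main() {
	body := "# HELP ovs_bond_status Show ovs bond status\n" +
		"# TYPE ovs_bond_status counter\n" +
		"ovs_bond_status{component=\"ovs\",a1b1=\"enabled\",a2b2=\"enabled\"} 0\n"
	value, err := bondStatusValue(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("disabled NICs:", value)
}
```

In practice the body would come from an HTTP GET against the pod's `/metrics` endpoint. Any value other than 0 means a NIC is disabled or the command failed.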
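8. Summary
That is all for this article. Forgive me for only giving a rough introduction; I still have a lot to learn. Thanks to everyone who has been following the official account!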