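Prerequisites

An ACK dedicated cluster
ack-prometheus-operator installed from the ACK application marketplace: https://help.aliyun.com/document_detail/94622.html

The environment in this article is an Alibaba Cloud ACK dedicated cluster, version 1.22, with the ack-prometheus-operator chart 12.0.0 from the ACK application marketplace.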

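Concepts

Prometheus-Operator vs Prometheus vs Kube-Prometheus

These are three separate open source projects that are easy to confuse. Briefly, they relate as follows:

Prometheus: the monitoring system itself. Here it monitors the k8s cluster, collects metrics from the container cluster, and can serve as a data source for third-party visualization platforms such as Grafana.

Prometheus Operator: manages Prometheus. The operator does no monitoring itself. Think of it as an automated operations tool for Prometheus that acts like a translator, turning convenient high-level configuration into a standard Prometheus configuration file.

Kube-Prometheus: currently the mainstream project for k8s cluster monitoring. It uses Prometheus to monitor the cluster and Prometheus Operator to operate and manage that monitoring, so it is the combination of the two above.

Newer versions of ACK-Prometheus-Operator in the ACK application marketplace integrate kube-prometheus. This article is based on ACK-Prometheus-Operator chart 12.0.0, so the analysis below is effectively an analysis of kube-prometheus.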

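ACK-Prometheus-Operator Architecture

In one sentence: ACK-Prometheus-Operator (Kube-Prometheus) consists mainly of the Prometheus-Operator controller, the Prometheus server monitoring system, various exporters that collect metrics, and the Prometheus alerting system. This series does not cover the alerting components.

Key section of this article:

The last section, "Prometheus-Operator Configuration Update Flow", covers:

Reading the standard Prometheus configuration

Matching between Operator CRD configurations

How Prometheus reloads new configuration

How a configuration update takes effect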

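Prometheus-Operator Architecture

This section looks at Prometheus and Prometheus-Operator in turn.

The architecture diagram from the official Prometheus site is as follows:

Prometheus is a monitoring system, used here to monitor k8s, that collects, stores, and queries monitoring data. Prometheus server works by periodically scraping metrics from monitored targets over HTTP/HTTPS. Any component that exposes such an HTTP/HTTPS endpoint in the data format defined by Prometheus can be monitored. Monitoring systems follow either a push or a pull model; the approach in which the Prometheus server calls the monitored objects (targets) to fetch data is called pull.

ACK-prometheus uses the pull model (the contrast here is with Pushgateway: ACK-prometheus-operator does not deploy a separate Pushgateway to receive pushed metrics). Prometheus server also has to store the data it collects. It is itself a time series database and stores the scraped data as time series on local disk.

The architecture diagram from the official Prometheus-Operator site:

It shows clearly that Prometheus-Operator is responsible for deploying and managing Prometheus server.

An Operator is a Kubernetes controller for custom resources. Just as the built-in resource controllers in kube-controller-manager (KCM), such as the deployment controller, manage Deployments, Prometheus-Operator manages the Prometheus-related resources. See the CNCF Operator White Paper for background.

Based on the relevant CRDs (ServiceMonitor and others), Prometheus-Operator promptly updates the targets of Prometheus server so that it runs as the declared configuration intends. You can think of Prometheus-Operator as a senior operations engineer, codified and automated, who simplifies deploying and maintaining Prometheus.

Because Prometheus-Operator is a k8s controller, it follows the general k8s controller architecture. A typical k8s controller can be divided into three parts: informer, work queue, and control loop. This article does not go into detail; it is enough to know that prometheus-operator is built on this architecture and continuously watches the apiserver for changes to the relevant CRDs in order to drive Prometheus configuration updates.

Image source: https://kingjcy.github.io/post/cloud/paas/base/kubernetes/k8s-controller/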

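ACK-Prometheus-Operator Deployment

ACK-Prometheus-Operator integrates kube-prometheus. For details on each component in the architecture, refer to the component's official site; they are not repeated here. After deployment from the Helm chart in the ACK application marketplace, the main components are: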

The Prometheus Operator
Highly available Prometheus
Highly available Alertmanager
Prometheus node-exporter
Prometheus Adapter for Kubernetes Metrics APIs (not enabled by default; it provides the resource metrics, custom metrics, and external metrics APIs, of which resource metrics can replace metrics-server (apis/metrics.k8s.io); not covered in this article)
kube-state-metrics
Grafana
Admission webhook, which ensures that PrometheusRule and AlertmanagerConfig resources are valid.

In the ACK-prometheus-operator architecture, the components map to cluster resources as follows:

Component	Resource in ACK	Kind	How it is deployed to the ACK cluster
Prometheus Operator	ack-prometheus-operator-operator	Deployment	Deployed by Helm
Prometheus server	prometheus-ack-prometheus-operator-prometheus	StatefulSet	prometheus-operator creates the StatefulSet from the Prometheus custom resource deployed by Helm
Alertmanager	alertmanager-ack-prometheus-operator-alertmanager	StatefulSet	prometheus-operator creates the StatefulSet from the Alertmanager custom resource deployed by Helm
Prometheus node-exporter	ack-prometheus-operator-prometheus-node-exporter	DaemonSet	Deployed by Helm
Grafana	ack-prometheus-operator-grafana	Deployment	Deployed by Helm
kube-state-metrics	ack-prometheus-operator-kube-state-metrics	Deployment	Deployed by Helm

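Prometheus-Operator CRD Resources

Prometheus-Operator, the focus of this article, runs as a Deployment. It contains the schema definitions and models of all the CRDs and the handling logic for each. It deploys and manages the related CRD resources, continuously watches the apiserver, and promptly applies CRD and configuration updates to the running Prometheus system.

These CRDs are designed to be easy to configure: Prometheus scrape jobs are defined in a way k8s users already know (labels and selectors), and several label comparison operators are supported: In, NotIn, Exists, DoesNotExist.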
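To make the four selector operators concrete, here is a minimal sketch of a ServiceMonitor whose selector uses matchExpressions. The resource name, label keys, label values, and port name are all hypothetical and chosen only for illustration; the monitoring namespace follows the one used later in this article.

```yaml
# Hypothetical ServiceMonitor; every name and label below is illustrative.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
spec:
  selector:
    matchExpressions:
      # label "app" must be one of the listed values
      - key: app
        operator: In
        values: [example-app, example-api]
      # label "tier" must not be "canary"
      - key: tier
        operator: NotIn
        values: [canary]
      # label "metrics-enabled" must be present, any value
      - key: metrics-enabled
        operator: Exists
      # label "legacy" must be absent
      - key: legacy
        operator: DoesNotExist
  endpoints:
    - port: http-metrics   # port name on the Service, assumed
```

All expressions in the list are combined with AND, so a Service is selected only when it satisfies every one of them.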

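After a k8s user updates a CRD, the operator translates it into the standard Prometheus configuration file, prometheus.yml, which is updated and takes effect automatically without restarting the Prometheus instance.

The standard format of each CRD is not covered here; concrete examples appear in the configuration part. The CRDs below can be inspected with kubectl get xx.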

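CRD name	Purpose
Prometheus	The core CRD. It controls the state of the Prometheus server StatefulSet. It is used to deploy and manage Prometheus StatefulSet instances and to configure how an instance is associated with ServiceMonitors (through the serviceMonitorSelector and serviceMonitorNamespaceSelector fields), Alertmanager (through the alerting.alertmanagers field), and PrometheusRules (through the ruleSelector field). After a Prometheus resource is created, prometheus-operator automatically creates a Prometheus StatefulSet for it.
ServiceMonitor	Pure configuration. Through it the operator tells Prometheus server that the targets to monitor are discovered dynamically from k8s Services. The operator generates the standard Prometheus configuration file prometheus.yml from the ServiceMonitor configuration. Note that the endpoints in a ServiceMonitor are converted into kubernetes_sd_configs entries in prometheus.yml, so service discovery is still performed by Prometheus natively. Neither ServiceMonitor nor prometheus-operator performs service discovery; they only convert and apply configuration.
PodMonitor	Pure configuration, similar to ServiceMonitor, except that targets are discovered dynamically from k8s pod labels. It defines scrape jobs at the pod level.
Alertmanager	Deploys and manages the Alertmanager instances of Prometheus. One Alertmanager resource corresponds to one StatefulSet. prometheus-operator deploys the Alertmanager pods according to the replicas, image, RBAC, and other settings in the Alertmanager resource. The Prometheus instance is associated with this Alertmanager, and together they complete the monitoring-to-alerting chain.
PrometheusRule	Pure configuration. Generates the alerting rule files of Prometheus. prometheus-operator converts this resource into Prometheus rule files and mounts them into the file system of the Prometheus instance.
alertmanagerconfigs	Alertmanager configuration. No instances by default.
probes	No instances by default.
thanosrulers	Controls the Thanos deployment. No instances by default.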
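As a rough illustration of how the Prometheus resource ties the other CRDs together, the sketch below shows the selector fields named in the table. The resource name is inferred from the StatefulSet name shown earlier; the label release: ack-prometheus-operator, the service account, and the Alertmanager Service name and port are assumptions, so check the actual resource in your cluster with kubectl get prometheus -n monitoring -o yaml.

```yaml
# Sketch of a Prometheus custom resource; label values and the
# Alertmanager reference are assumed, not taken from a real cluster.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: ack-prometheus-operator-prometheus
  namespace: monitoring
spec:
  replicas: 1
  serviceAccountName: ack-prometheus-operator-prometheus   # assumed
  # Which ServiceMonitors are picked up, by their own labels
  serviceMonitorSelector:
    matchLabels:
      release: ack-prometheus-operator
  # Which namespaces are searched for ServiceMonitors; {} means all
  serviceMonitorNamespaceSelector: {}
  # Same idea for PodMonitors
  podMonitorSelector:
    matchLabels:
      release: ack-prometheus-operator
  # Which PrometheusRule resources become rule files
  ruleSelector:
    matchLabels:
      release: ack-prometheus-operator
  # Where alerts are sent
  alerting:
    alertmanagers:
      - namespace: monitoring
        name: ack-prometheus-operator-alertmanager   # Service name, assumed
        port: web                                    # port name, assumed
```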

Relationships among the CRDs and the operator:

Image source: https://v1-0.choerodon.io/zh/blog/prometheus-operator-introduce/

Prometheus-Operator Configuration Update Flow

Prometheus server can manage its targets through static configuration (static_configs), or dynamically through Service/Pod discovery (sd_config), and scrapes data from those targets.
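The two ways of defining targets look roughly like this in a hand-written prometheus.yml. This is a minimal sketch for comparison only; the job names, the target address, the namespace, and the label value are hypothetical.

```yaml
# prometheus.yml fragment, illustrative only
scrape_configs:
  # Static targets: addresses are listed explicitly
  - job_name: static-example
    static_configs:
      - targets: ['10.0.0.10:9100']   # hypothetical address

  # Dynamic targets: Prometheus queries the Kubernetes API itself
  - job_name: k8s-endpoints-example
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [default]
    relabel_configs:
      # keep only endpoints whose Service carries label app=example-app
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: example-app
        action: keep
```

With the operator, the second form is what gets generated from ServiceMonitor and PodMonitor resources, so it normally does not need to be written by hand.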

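Prometheus-operator continuously watches the apiserver, picks up creations and updates of CRD resources (such as ServiceMonitor), converts them into a standard Prometheus configuration file, and promptly applies the update to the running Prometheus pod for Prometheus server to use.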

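Reading the Standard Prometheus Configuration

Existing write-ups of the standard configuration file are sufficient, for example: https://developer.aliyun.com/article/830280

Official configuration reference, with examples: https://prometheus.io/docs/prometheus/latest/configuration/configuration/

However, the point of using ack-prometheus-operator is to avoid hand-writing this kind of configuration, which is hard to read and error-prone, so configuration is done through the operator CRDs instead.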

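Matching Between Operator CRD Configurations

When Prometheus is configured through CRDs, matching is an important detail. The detailed matching relationships are shown in the figure. A failed match at any point means the generated standard Prometheus configuration will not discover the targets.

The matching rules can be summarized as follows.

ServiceMonitor notes:

The labels of the ServiceMonitor must match the serviceMonitorSelector defined in the Prometheus resource.
The port in the ServiceMonitor endpoints refers to the port name in the k8s Service resource, not the port number.
The selector.matchLabels of the ServiceMonitor must match the labels of the k8s Service.
The ServiceMonitor is created in the namespace of Prometheus and uses namespaceSelector to match the namespace of the k8s Services to monitor.
If a ServiceMonitor matches multiple Services, duplicate data is collected.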
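The ServiceMonitor rules above can be sketched with a Service and a ServiceMonitor that match each other. All names, namespaces, labels, and ports here are hypothetical, and the label release: ack-prometheus-operator is only an assumed value for what serviceMonitorSelector expects; verify it against the Prometheus resource in your cluster.

```yaml
# Hypothetical Service in the application namespace
apiVersion: v1
kind: Service
metadata:
  name: example-app
  namespace: default
  labels:
    app: example-app              # matched by spec.selector.matchLabels below
spec:
  selector:
    app: example-app
  ports:
    - name: http-metrics          # port NAME, referenced by the ServiceMonitor
      port: 8080
      targetPort: 8080
---
# Hypothetical ServiceMonitor in the Prometheus namespace
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    release: ack-prometheus-operator   # must satisfy serviceMonitorSelector (assumed value)
spec:
  namespaceSelector:
    matchNames: [default]         # namespace of the Service to monitor
  selector:
    matchLabels:
      app: example-app            # labels of the Service, not of the pods
  endpoints:
    - port: http-metrics          # Service port name, not 8080
      path: /metrics
      interval: 30s
```

If targets do not show up, comparing these four places (ServiceMonitor labels, namespaceSelector, selector.matchLabels, and port name) against the actual resources is usually the quickest check.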

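PodMonitor notes:

The labels of the PodMonitor must match the podMonitorSelector defined in the Prometheus resource.
spec.podMetricsEndpoints.port of the PodMonitor must be the port name defined in the pod, not the port number.
The selector.matchLabels of the PodMonitor must match the labels of the k8s pod.
The PodMonitor is created in the namespace of Prometheus and uses namespaceSelector to match the namespace of the k8s pods to monitor.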
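The PodMonitor rules follow the same pattern. Below is a minimal sketch with a pod and a matching PodMonitor; the pod name, image, labels, and port are hypothetical, and release: ack-prometheus-operator is again an assumed value for podMonitorSelector.

```yaml
# Hypothetical pod exposing metrics on a named container port
apiVersion: v1
kind: Pod
metadata:
  name: example-worker
  namespace: default
  labels:
    app: example-worker           # matched by spec.selector.matchLabels below
spec:
  containers:
    - name: worker
      image: registry.example.com/example-worker:latest   # hypothetical image
      ports:
        - name: metrics           # port NAME, referenced by the PodMonitor
          containerPort: 9102
---
# Hypothetical PodMonitor in the Prometheus namespace
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-worker
  namespace: monitoring
  labels:
    release: ack-prometheus-operator   # must satisfy podMonitorSelector (assumed value)
spec:
  namespaceSelector:
    matchNames: [default]         # namespace of the pods to monitor
  selector:
    matchLabels:
      app: example-worker
  podMetricsEndpoints:
    - port: metrics               # container port name, not 9102
      path: /metrics
```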

How Prometheus Reloads New Configuration

In short: send SIGHUP to the Prometheus process, or call the HTTP /-/reload endpoint:

A configuration reload is triggered by sending a SIGHUP to the Prometheus process or sending a HTTP POST request to the /-/reload endpoint (when the --web.enable-lifecycle flag is enabled). This will also reload any configured rule files.

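The Prometheus StatefulSet pod contains two containers, config-reloader and prometheus. config-reloader calls the reload endpoint to refresh the configuration. A few intervals determine how long a newly deployed configuration takes to become effective; see the upstream source excerpt below. The default watch interval is 3 minutes: the reloader re-reads the configuration file and watched directories at that interval and triggers a reload once it detects a change.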

const (
 defaultWatchInterval = 3 * time.Minute // 3 minutes was the value previously hardcoded in github.com/thanos-io/thanos/pkg/reloader.
 defaultDelayInterval = 1 * time.Second // 1 second seems a reasonable amount of time for the kubelet to update the secrets/configmaps.
 defaultRetryInterval = 5 * time.Second // 5 seconds was the value previously hardcoded in github.com/thanos-io/thanos/pkg/reloader.
)

 // Excerpt from the flag definitions that follow in the same file.
 watchInterval := app.Flag("watch-interval", "how often the reloader re-reads the configuration file and directories; when set to 0, the program runs only once and exits").Default(defaultWatchInterval.String()).Duration()
 delayInterval := app.Flag("delay-interval", "how long the reloader waits before reloading after it has detected a change").Default(defaultDelayInterval.String()).Duration()
 retryInterval := app.Flag("retry-interval", "how long the reloader waits before retrying in case the endpoint returned an error").Default(defaultRetryInterval.String()).Duration()
 watchedDir := app.Flag("watched-dir", "directory to watch non-recursively").Strings()

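What if you do not want to wait 3 minutes while testing? Besides restarting, can a reload be triggered manually? (It is usually unnecessary.)

Conclusion first: to trigger it manually, run curl -X POST http://localhost:9090/-/reload (localhost can be replaced with the Prometheus pod IP).

Further notes:

With multiple containers in one pod, why does netstat show the listening sockets of every container's processes while ps does not show those processes? Think about it; it is not covered here.

config-reloader container: it listens on 8080. The listener on 9090 does not belong to this container.

prometheus container: it listens on 9090.

The YAML definition makes it clearer how the two containers cooperate:

The startup arguments of the config-reloader container show that it issues the reload call. Because port 9090 is listened on by the process in the Prometheus server container, it is Prometheus server that answers the call. The config-reloader process is only responsible for calling the reload endpoint on port 9090.

By default the Prometheus server command line enables --web.enable-lifecycle, so the reload and quit endpoints are available (without this flag neither endpoint can be used). When the prometheus container receives the reload call, it reloads the file given by --config.file in its startup arguments.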
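The cooperation between the two containers can be sketched with the relevant startup arguments. This is a trimmed, illustrative fragment rather than the full pod spec: the file paths come from the pod YAML shown later in this article, while the exact --listen-address and --reload-url values are assumptions based on the ports described above and may differ between operator versions.

```yaml
# Trimmed sketch of the two containers in the Prometheus pod
containers:
  - name: prometheus
    args:
      # enables the /-/reload and /-/quit HTTP endpoints
      - '--web.enable-lifecycle'
      # file that is re-read on every reload
      - '--config.file=/etc/prometheus/config_out/prometheus.env.yaml'
    ports:
      - containerPort: 9090
  - name: config-reloader
    args:
      # the reloader's own listener (assumed value)
      - '--listen-address=:8080'
      # where the reload call is sent: the prometheus container's port,
      # reachable because both containers share the pod network (assumed value)
      - '--reload-url=http://localhost:9090/-/reload'
      - '--config-file=/etc/prometheus/config/prometheus.yaml.gz'
      - '--config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml'
```

To compare with a real cluster, kubectl get pod prometheus-ack-prometheus-operator-prometheus-0 -n monitoring -o yaml prints the actual arguments, assuming the first replica of the StatefulSet named earlier.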

But the Prometheus server container always reloads the same fixed file, the one given by --config.file. How does that file receive the latest configuration content? See below.

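How a Configuration Update Takes Effect

1. Helm or kubectl updates the configuration in a ServiceMonitor, PodMonitor, or Prometheus resource (including additionalScrapeConfigs).
2. Prometheus-operator, watching the apiserver, picks up the update to these resources.
3. Prometheus-operator translates all captured updates into the standard Prometheus configuration format, prometheus.yaml.
4. Prometheus-operator packs prometheus.yaml into the secret "prometheus-ack-prometheus-operator-prometheus", stored under the secret.data key prometheus.yaml.gz.
5. The data of this secret is mounted into the config-reloader container of the Prometheus pod.
6. config-reloader decompresses the gz file, reads the standard prometheus.yaml, and shares it with the prometheus container through an emptyDir volume.
7. The Prometheus process reads the file path given by --config.file, which is the file shared through the emptyDir volume, that is, the config YAML written by config-reloader.
8. Prometheus server discovers the targets to monitor, dynamically or statically, according to relabel and other settings in the configuration, and sends HTTP/HTTPS requests to the targets to scrape metrics.
9. The scraped metrics are stored and then served to the UI for queries or to Grafana and similar tools for display.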

A closer look at what config-reloader does in step 6:

View the standard Prometheus configuration stored in the secret:

kubectl get secret prometheus-ack-prometheus-operator-prometheus   -n monitoring -ojson|jq -r '.data["prometheus.yaml.gz"]' |base64 -d| gunzip
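For orientation, the decompressed configuration contains one scrape job per monitored endpoint, roughly of the following shape. This is a hand-written sketch, not output captured from a cluster: the job name, namespace, label, and port name are hypothetical, and the exact job naming and relabel rules vary between operator versions.

```yaml
# Illustrative shape of a generated scrape job; not real output
global:
  scrape_interval: 30s
scrape_configs:
  - job_name: monitoring/example-app/0    # naming scheme differs by operator version
    scrape_interval: 30s
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: endpoints                   # discovery is done by Prometheus itself
        namespaces:
          names:
            - default                     # from the ServiceMonitor namespaceSelector
    relabel_configs:
      # from spec.selector.matchLabels of the ServiceMonitor
      - action: keep
        source_labels: [__meta_kubernetes_service_label_app]
        regex: example-app
      # from endpoints[].port of the ServiceMonitor
      - action: keep
        source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: http-metrics
```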

The secret mount definition in the Prometheus pod YAML:

containers:
  # prometheus reads the file shared through the emptyDir volume,
  # that is, the config YAML written by config-reloader.
  - name: prometheus
    args:
      - '--config.file=/etc/prometheus/config_out/prometheus.env.yaml'
    volumeMounts:
      - mountPath: /etc/prometheus/config_out
        name: config-out
  # config-reloader reads the configuration from the secret, gunzips it into
  # the standard Prometheus config YAML, and writes it to config_out, which
  # is shared through the emptyDir volume.
  - name: config-reloader
    args:
      - '--config-file=/etc/prometheus/config/prometheus.yaml.gz'
      - '--config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml'
    volumeMounts:
      - mountPath: /etc/prometheus/config
        name: config
      - mountPath: /etc/prometheus/config_out
        name: config-out
volumes:
  - name: config
    secret:
      defaultMode: 420
      secretName: prometheus-ack-prometheus-operator-prometheus
  - name: config-out
    emptyDir: {}

References:

https://yuque.antfin.com/xingyu.cxy/gz5g1e/uzs2k8

https://zhuanlan.zhihu.com/p/342823695

https://yunlzheng.gitbook.io/prometheus-book/part-iii-prometheus-shi-zhan/operator/what-is-prometheus-operator
