01 引言

本文主要讲解在k8s（kubernetes)下安装kube-prometheus。

kube-prometheus的github地址：https://github.com/prometheus-operator/kube-prometheus

kube-promethues本质就是以下内容的集合：

Prometheus Operator
Prometheus
Alertmanager
node-exporter
Prometheus Adapter for Kubernetes Metrics APIs
kube-state-metrics
Grafana

注意kube-promethues与kubernetes的版本对应关系如下：

微信截图_20221012182910.png

由于目前系统使用的k8s版本是v1.16.9，根据上面的列表，是找不到的，因此查阅了其它的资料，发现是支持的：

因此，从版本对应关系表，可以看出，采用release-0.4版本，下载地址：https://github.com/prometheus-operator/kube-prometheus/releases/tag/v0.4.0

02 kube-prometheus安装

2.1 step1：分类yaml

首选使用ssh工具把kube-prometheus压缩包上传至服务器。

解压：

# tar -zxvf kube-prometheus-0.4.0.tar.gz

按yaml分类：

cd kube-prometheus-0.4.0/manifests
# 创建文件夹
mkdir -p node-exporter alertmanager grafana kube-state-metrics prometheus serviceMonitor adapter
# 移动 yaml 文件，进行分类到各个文件夹下
mv *-serviceMonitor* serviceMonitor/
mv grafana-* grafana/
mv kube-state-metrics-* kube-state-metrics/
mv alertmanager-* alertmanager/
mv node-exporter-* node-exporter/
mv prometheus-adapter* adapter/
mv prometheus-* prometheus/

最终结构如下：

➜  manifests tree .
.
├── adapter
│   ├── prometheus-adapter-apiService.yaml
│   ├── prometheus-adapter-clusterRole.yaml
│   ├── prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
│   ├── prometheus-adapter-clusterRoleBinding.yaml
│   ├── prometheus-adapter-clusterRoleBindingDelegator.yaml
│   ├── prometheus-adapter-clusterRoleServerResources.yaml
│   ├── prometheus-adapter-configMap.yaml
│   ├── prometheus-adapter-deployment.yaml
│   ├── prometheus-adapter-roleBindingAuthReader.yaml
│   ├── prometheus-adapter-service.yaml
│   └── prometheus-adapter-serviceAccount.yaml
├── alertmanager
│   ├── alertmanager-alertmanager.yaml
│   ├── alertmanager-secret.yaml
│   ├── alertmanager-service.yaml
│   └── alertmanager-serviceAccount.yaml
├── grafana
│   ├── grafana-dashboardDatasources.yaml
│   ├── grafana-dashboardDefinitions.yaml
│   ├── grafana-dashboardSources.yaml
│   ├── grafana-deployment.yaml
│   ├── grafana-service.yaml
│   └── grafana-serviceAccount.yaml
├── kube-state-metrics
│   ├── kube-state-metrics-clusterRole.yaml
│   ├── kube-state-metrics-clusterRoleBinding.yaml
│   ├── kube-state-metrics-deployment.yaml
│   ├── kube-state-metrics-service.yaml
│   └── kube-state-metrics-serviceAccount.yaml
├── node-exporter
│   ├── node-exporter-clusterRole.yaml
│   ├── node-exporter-clusterRoleBinding.yaml
│   ├── node-exporter-daemonset.yaml
│   ├── node-exporter-service.yaml
│   └── node-exporter-serviceAccount.yaml
├── operator
│   ├── 0namespace-namespace.yaml
│   ├── prometheus-operator-0alertmanagerConfigCustomResourceDefinition.yaml
│   ├── prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
│   ├── prometheus-operator-0podmonitorCustomResourceDefinition.yaml
│   ├── prometheus-operator-0probeCustomResourceDefinition.yaml
│   ├── prometheus-operator-0prometheusCustomResourceDefinition.yaml
│   ├── prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
│   ├── prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
│   ├── prometheus-operator-0thanosrulerCustomResourceDefinition.yaml
│   ├── prometheus-operator-clusterRole.yaml
│   ├── prometheus-operator-clusterRoleBinding.yaml
│   ├── prometheus-operator-deployment.yaml
│   ├── prometheus-operator-service.yaml
│   └── prometheus-operator-serviceAccount.yaml
├── other
├── prometheus
│   ├── prometheus-clusterRole.yaml
│   ├── prometheus-clusterRoleBinding.yaml
│   ├── prometheus-prometheus.yaml
│   ├── prometheus-roleBindingConfig.yaml
│   ├── prometheus-roleBindingSpecificNamespaces.yaml
│   ├── prometheus-roleConfig.yaml
│   ├── prometheus-roleSpecificNamespaces.yaml
│   ├── prometheus-rules.yaml
│   ├── prometheus-service.yaml
│   └── prometheus-serviceAccount.yaml
└── serviceMonitor
    ├── alertmanager-serviceMonitor.yaml
    ├── grafana-serviceMonitor.yaml
    ├── kube-state-metrics-serviceMonitor.yaml
    ├── node-exporter-serviceMonitor.yaml
    ├── prometheus-adapter-serviceMonitor.yaml
    ├── prometheus-operator-serviceMonitor.yaml
    ├── prometheus-serviceMonitor.yaml
    ├── prometheus-serviceMonitorApiserver.yaml
    ├── prometheus-serviceMonitorCoreDNS.yaml
    ├── prometheus-serviceMonitorKubeControllerManager.yaml
    ├── prometheus-serviceMonitorKubeScheduler.yaml
    └── prometheus-serviceMonitorKubelet.yaml
9 directories, 67 files

2.2 step2：修改数据持久化存储

prometheus 实际上是通过 emptyDir 进行挂载的，我们知道 emptyDir 挂载的数据的生命周期和 Pod 生命周期一致的，如果 Pod 挂掉了，那么数据也就丢失了，这也就是为什么我们重建 Pod 后之前的数据就没有了的原因，所以这里修改它的持久化配置。

本文默认已经安装好了openebs，不再进行讲述，请自行百度。

使用命令查询当前StorageClass的名称（需要安装openebs）：

## 查询当前的storeclass名称
kubectl get sc

可以看到StorageClass的名称为openebs-hostpath，接下来可以修改配置文件了

2.2.1 修改Prometheus 持久化

prometheus是一种 StatefulSet 有状态集的部署模式，所以直接将 StorageClass 配置到里面，在下面的 yaml 中最下面添加持久化配置：

目录：manifests/prometheus/prometheus-prometheus.yaml

在文件末尾新增：

...
  serviceMonitorSelector: {}
  version: v2.11.0
  retention: 3d
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: openebs-hostpath
        resources:
          requests:
            storage: 5Gi

2.2.2 修改grafana持久化配置

由于 Grafana 是部署模式为 Deployment，所以我们提前为其创建一个 grafana-pvc.yaml 文件，加入下面 PVC 配置。

目录：manifests/grafana/grafana-pvc.yaml

完整内容如下：

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: grafana
  namespace: monitoring  #---指定namespace为monitoring
spec:
  storageClassName: openebs-hostpath #---指定StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

接着修改 grafana-deployment.yaml 文件设置持久化配置，应用上面的 PVC（目录：manifests/grafana/grafana-deployment.yaml）

修改内容如下：

      serviceAccountName: grafana
      volumes:
      - name: grafana-storage       # 新增持久化配置
        persistentVolumeClaim:
          claimName: grafana        # 设置为创建的PVC名称
#      - emptyDir: {}               # 注释旧的注释
#        name: grafana-storage
      - name: grafana-datasources
        secret:
          secretName: grafana-datasources

2.3 step3：修改 Service 端口设置

2.3.1 修改 Prometheus Service

修改prometheus Service端口类型为 NodePort，设置 NodePort 端口为 32101：

目录：manifests/prometheus/prometheus-service.yaml

修改 prometheus-service.yaml 文件：

apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 32101
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP

2.3.2 修改 Grafana Service

修改 garafana service 端口类型为 NodePort，设置 NodePort 端口为 32102

目录：manifests/grafana/grafana-service.yaml

修改 grafana-service.yaml 文件：

apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 32102
  selector:
    app: grafana

2.4 step4：安装promethues-operator

注意：以下操作均在manifest目录下操作！

cd kube-prometheus-0.4.0/manifests

开始安装 Operator：

kubectl apply -f setup/

查看 Pod，等 pod 创建起来在进行下一步：

kubectl get pods -n monitoring

接下来安装其它组件：

kubectl apply -f adapter/
kubectl apply -f alertmanager/
kubectl apply -f node-exporter/
kubectl apply -f kube-state-metrics/
kubectl apply -f grafana/
kubectl apply -f prometheus/
kubectl apply -f serviceMonitor/

查看 Pod 状态，等待所有状态均为Running：

kubectl get pods -n monitoring

2.5 step5：验证

打开地址：http://10.194.188.101:32101/targets，看看各个服务状态有没有问题：

可以看到已经监控上了很多指标数据了，上面我们可以看到 Prometheus 是两个副本，我们这里通过 Service 去访问，按正常来说请求是会去轮询访问后端的两个 Prometheus 实例的，但实际上我们这里访问的时候始终是路由到后端的一个实例上去，因为这里的 Service 在创建的时候添加了 SessionAffinity:ClientIP 这样的属性，会根据 ClientIP 来做 Session 亲和性，所以我们不用担心请求会到不同的副本上去。

有可能grafna会没有启动成功，提示如下：

执行如下语句去删除pod，然后会自动创建pod：

 kubectl delete pod/grafana-689d64c6db-wcvq2 -nmonitoring

打开地址：http://10.194.188.101:32102，默认账号密码为admin：

可以看到所有组件都安装成功了！

03 文末

至此，我们在k8s上安装kube-promethues已经成功了！

参考资料：https://bbs.huaweicloud.com/blogs/303137

k8s安装kube-promethues（超详细）