Monitoring Kubernetes with Prometheus


Monitoring the Kubernetes nodes

Install node_exporter. I deploy it as a DaemonSet so that every node is guaranteed a corresponding pod.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      hostPID: true     # these three settings share the node's PID/IPC/network namespaces with the pod, so no Service is needed to expose the port
      hostIPC: true
      hostNetwork: true
      containers:
      - image: bitnami/node-exporter:latest
        args: 
        - --web.listen-address=$(HOSTIP):9100
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
        - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
        name: node-exporter
        env:
        - name: HOSTIP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        resources: 
          requests: 
            cpu: 150m
            memory: 180Mi
          limits:
            cpu: 150m
            memory: 180Mi
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
        volumeMounts:
        - name: proc    # all the volumes here are hostPath mounts, giving the pod direct access to host paths so it can read node metrics properly
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: root
          mountPath: /host/root
          mountPropagation: HostToContainer
          readOnly: true
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http
      tolerations:  # toleration added so the pods can also run on control-plane nodes
      - key: node-role.kubernetes.io/control-plane  
        operator: Exists
        effect: NoSchedule
      volumes: 
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: root
        hostPath:
          path: /
root@master1:~/k8s-prometheus# kubectl apply -f node-export.yaml 
root@master1:~/k8s-prometheus# kubectl get pod -n monitor 
NAME                            READY   STATUS    RESTARTS      AGE
grafana-core-5c68549dc7-t92fv   1/1     Running   0             2d17h
node-exporter-6fqvt             1/1     Running   0             2d19h
node-exporter-bgrjn             1/1     Running   0             2d19h
node-exporter-bk7m4             1/1     Running   0             2d19h
node-exporter-m7wgx             1/1     Running   0             2d19h
node-exporter-mbgtg             1/1     Running   0             2d19h
node-exporter-rdtcs             1/1     Running   0             2d19h
prometheus-7d659686d7-x62vt     1/1     Running   0             2d16h
root@master1:~/k8s-prometheus# ss -lntp |grep node_exporter
LISTEN    0         4096          10.10.21.170:9100             0.0.0.0:*        users:(("node_exporter",pid=2597275,fd=3))   
# as shown, it binds directly to port 9100 on the host
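
To double-check that the exporter is serving data, you can curl the host address from the ss output above (expect an HTTP 200):

root@master1:~# curl -sI http://10.10.21.170:9100/metrics | head -n 1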

Monitoring the CoreDNS service

CoreDNS exposes a /metrics endpoint out of the box, so we simply point Prometheus at its port 9153 to scrape the data.

root@master1:~/k8s-prometheus# kubectl -n kube-system get svc 
NAME       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.100.0.10   <none>        53/UDP,53/TCP,9153/TCP   22d
root@master1:~/k8s-prometheus# kubectl -n kube-system get cm
NAME                                 DATA   AGE
coredns                              1      22d
extension-apiserver-authentication   6      22d
kube-proxy                           2      22d
kube-root-ca.crt                     1      22d
kubeadm-config                       1      22d
kubelet-config                       1      22d
root@master1:~/k8s-prometheus# kubectl -n kube-system get cm coredns -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2022-10-11T10:23:37Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "228"
  uid: 632b41d8-6dda-40b7-81c1-b6cffefaf2e8
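
With the prometheus plugin confirmed in the Corefile (the `prometheus :9153` line above), a quick curl against the kube-dns ClusterIP shown earlier verifies the endpoint answers (expect an HTTP 200):

root@master1:~# curl -sI http://10.100.0.10:9153/metrics | head -n 1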

Monitoring ingress-nginx

ingress-nginx also exposes a /metrics endpoint by default, but we first have to expose port 10254 for it; once that is done, we point Prometheus at port 10254 to scrape the data.

There are several ways to expose the port. One is to switch the corresponding Deployment to hostNetwork mode, handing the node's 10254 directly to the ingress controller.

Alternatively, add the port to the Deployment first, then proxy port 10254 through the ingress controller's Service (a sketch follows).
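
For completeness, the Service side of that second method would look roughly like this — a sketch only, since I did not take this route:

# excerpt: port entry to add under spec.ports of the ingress-nginx-controller Service (sketch)
- name: prometheus
  port: 10254
  targetPort: 10254
  protocol: TCP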

Here I used the first method.
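
If you would rather not hand-edit the Deployment with kubectl edit (used below), the same two changes can be applied as a JSON patch — a sketch, assuming the controller is the first container in the pod spec:

kubectl -n ingress-nginx patch deployment ingress-nginx-controller --type=json -p='[
  {"op":"add","path":"/spec/template/spec/hostNetwork","value":true},
  {"op":"add","path":"/spec/template/spec/containers/0/ports/-",
   "value":{"name":"prometheus","containerPort":10254,"hostPort":10254,"protocol":"TCP"}}
]'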

root@master1:~# kubectl -n ingress-nginx edit deployments.apps ingress-nginx-controller 

Two places need to be changed:

root@master1:~# kubectl -n ingress-nginx get deployments.apps ingress-nginx-controller -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "8"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"controller","app.kubernetes.io/instance":"ingress-nginx","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"ingress-nginx","app.kubernetes.io/version":"1.1.1","helm.sh/chart":"ingress-nginx-4.0.15"},"name":"ingress-nginx-controller","namespace":"ingress-nginx"},"spec":{"minReadySeconds":0,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app.kubernetes.io/component":"controller","app.kubernetes.io/instance":"ingress-nginx","app.kubernetes.io/name":"ingress-nginx"}},"strategy":{"rollingUpdate":{"maxUnavailable":1},"type":"RollingUpdate"},"template":{"metadata":{"labels":{"app.kubernetes.io/component":"controller","app.kubernetes.io/instance":"ingress-nginx","app.kubernetes.io/name":"ingress-nginx"}},"spec":{"containers":[{"args":["/nginx-ingress-controller","--election-id=ingress-controller-leader","--controller-class=k8s.io/ingress-nginx","--configmap=$(POD_NAMESPACE)/ingress-nginx-controller","--validating-webhook=:8443","--validating-webhook-certificate=/usr/local/certificates/cert","--validating-webhook-key=/usr/local/certificates/key","--watch-ingress-without-class=true","--publish-status-address=localhost"],"env":[{"name":"POD_NAME","valueFrom":{"fieldRef":{"fieldPath":"metadata.name"}}},{"name":"POD_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},{"name":"LD_PRELOAD","value":"/usr/local/lib/libmimalloc.so"}],"image":"registry.cn-qingdao.aliyuncs.com/kubernetes_xingej/nginx-ingress-controller:v1.1.1","imagePullPolicy":"IfNotPresent","lifecycle":{"preStop":{"exec":{"command":["/wait-shutdown"]}}},"livenessProbe":{"failureThreshold":5,"httpGet":{"path":"/healthz","port":10254,"scheme":"HTTP"},"initialDelaySeconds":10,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":1},"name":"controller","ports":[{"containerPort":80,"hostPort":80,"name":"http","protocol":"TCP"},{"containerPort":443,"hostPort":443,"name":"https","protocol":"TCP"},{"containerPort":8443,"name":"webhook","protocol":"TCP"}],"readinessProbe":{"failureThreshold":3,"httpGet":{"path":"/healthz","port":10254,"scheme":"HTTP"},"initialDelaySeconds":10,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":1},"resources":{"requests":{"cpu":"100m","memory":"90Mi"}},"securityContext":{"allowPrivilegeEscalation":true,"capabilities":{"add":["NET_BIND_SERVICE"],"drop":["ALL"]},"runAsUser":101},"volumeMounts":[{"mountPath":"/usr/local/certificates/","name":"webhook-cert","readOnly":true}]}],"dnsPolicy":"ClusterFirst","nodeSelector":{"ingress-ready":"true","kubernetes.io/os":"linux"},"serviceAccountName":"ingress-nginx","terminationGracePeriodSeconds":0,"tolerations":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master","operator":"Equal"}],"volumes":[{"name":"webhook-cert","secret":{"secretName":"ingress-nginx-admission"}}]}}}}
  creationTimestamp: "2022-10-31T08:30:33Z"
  generation: 8
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    helm.sh/chart: ingress-nginx-4.0.15
  name: ingress-nginx-controller
  namespace: ingress-nginx
  resourceVersion: "3780917"
  uid: e4b28ba0-eb0b-45e3-a0b8-c03797c8f416
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/name: ingress-nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/name: ingress-nginx
    spec:
      containers:
      - args:
        - /nginx-ingress-controller
        - --election-id=ingress-controller-leader
        - --controller-class=k8s.io/ingress-nginx
        - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
        - --validating-webhook=:8443
        - --validating-webhook-certificate=/usr/local/certificates/cert
        - --validating-webhook-key=/usr/local/certificates/key
        - --watch-ingress-without-class=true
        - --publish-status-address=localhost
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: LD_PRELOAD
          value: /usr/local/lib/libmimalloc.so
        image: bitnami/nginx-ingress-controller:1.1.1
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /wait-shutdown
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: controller
        ports:
        - containerPort: 10254   # these two lines expose 10254 on the host
          hostPort: 10254
          name: prometheus
          protocol: TCP
        - containerPort: 80
          hostPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          hostPort: 443
          name: https
          protocol: TCP
        - containerPort: 8443
          hostPort: 8443
          name: webhook
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 100m
            memory: 90Mi
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          runAsUser: 101
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/certificates/
          name: webhook-cert
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true   # declares that the pod uses the host's network namespace, giving the host's ports directly to the ingress controller
      nodeSelector:
        ingress-ready: "true"
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ingress-nginx
      serviceAccountName: ingress-nginx
      terminationGracePeriodSeconds: 0
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Equal
      volumes:
      - name: webhook-cert
        secret:
          defaultMode: 420
          secretName: ingress-nginx-admission
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2022-10-31T08:30:33Z"
    lastUpdateTime: "2022-10-31T08:30:33Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2022-10-31T08:46:21Z"
    lastUpdateTime: "2022-11-03T10:25:05Z"
    message: ReplicaSet "ingress-nginx-controller-87bc8b9c7" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 8
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

At this point, check whether the ingress controller is occupying the port on the host:

root@node1:~# ss -lntp |grep nginx-ingress
LISTEN    0         4096             127.0.0.1:10245            0.0.0.0:*        users:(("nginx-ingress-c",pid=1398881,fd=8))
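The line the grep caught above is a localhost listener; to confirm the metrics port itself answers, curl 10254 on the node (fill in one of your node IPs; expect an HTTP 200):

root@node1:~# curl -sI http://xxx.xxx.xxx.xxx:10254/metrics | head -n 1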

Monitoring kube-state-metrics

root@master1:~# git clone https://github.com/kubernetes/kube-state-metrics.git
root@master1:~# cd kube-state-metrics/examples/standard/
root@master1:~/kube-state-metrics/examples/standard# ls
cluster-role-binding.yaml  cluster-role.yaml  deployment.yaml  service-account.yaml  service.yaml
# the clone ships these ready-made YAMLs; with a small tweak they can be used directly

Here I tweak service.yaml:

root@master1:~/kube-state-metrics/examples/standard# cat service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.6.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  type: ClusterIP   # the upstream manifest defines a headless Service; I changed it to a regular ClusterIP
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics

Then apply the manifests and check:

root@master1:~/kube-state-metrics/examples/standard# kubectl apply -f .   # don't drop the dot: it applies every YAML in the current directory
root@master1:~/kube-state-metrics/examples/standard# kubectl get svc,pod -n kube-system 
NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns             ClusterIP   10.100.0.10      <none>        53/UDP,53/TCP,9153/TCP   23d
service/kube-state-metrics   ClusterIP   10.100.162.224   <none>        8080/TCP,8081/TCP        91m  # the Service just created
NAME                                         READY   STATUS    RESTARTS       AGE
pod/coredns-c676cc86f-7bd45                  1/1     Running   5 (2d2h ago)   22d
pod/coredns-c676cc86f-7l2c8                  1/1     Running   0              22d
pod/coredns-c676cc86f-gl6qc                  1/1     Running   0              22d
pod/coredns-c676cc86f-rxcjn                  1/1     Running   0              22d
pod/kube-apiserver-master1.org               1/1     Running   6 (2d2h ago)   17d
pod/kube-apiserver-master2.org               1/1     Running   0              17d
pod/kube-apiserver-master3.org               1/1     Running   0              17d
pod/kube-controller-manager-master1.org      1/1     Running   7 (2d2h ago)   23d
pod/kube-controller-manager-master2.org      1/1     Running   0              17d
pod/kube-controller-manager-master3.org      1/1     Running   0              17d
pod/kube-proxy-5pnh5                         1/1     Running   0              21d
pod/kube-proxy-vws54                         1/1     Running   0              21d
pod/kube-proxy-wlknz                         1/1     Running   0              21d
pod/kube-proxy-wtnnf                         1/1     Running   5 (2d2h ago)   21d
pod/kube-proxy-xkkhp                         1/1     Running   0              21d
pod/kube-proxy-zf4vg                         1/1     Running   0              21d
pod/kube-scheduler-master1.org               1/1     Running   7 (2d2h ago)   23d
pod/kube-scheduler-master2.org               1/1     Running   0              22d
pod/kube-scheduler-master3.org               1/1     Running   0              22d
pod/kube-state-metrics-757ff7b448-lwwlg      1/1     Running   0              90m   # the pod just created

Verify that the metrics endpoint returns data:

root@node1:~# curl  -I  10.100.162.224:8080/metrics
HTTP/1.1 200 OK
Content-Type: text/plain; version=0.0.4
Date: Thu, 03 Nov 2022 16:25:21 GMT
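
Beyond the status code, you can spot-check a real series; kube_pod_status_phase is one of the standard kube-state-metrics metrics:

root@node1:~# curl -s 10.100.162.224:8080/metrics | grep -m 3 kube_pod_status_phase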

Update the Prometheus ConfigMap and reload Prometheus

root@master1:~/k8s-prometheus# cat prometheus-configmap.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'Linux Server'
      static_configs:
      - targets:      
        - 'xxx.xxx.xxx.xxx:9100'
        - 'xxx.xxx.xxx.xxx:9100'
        - 'xxx.xxx.xxx.xxx:9100'
        - 'xxx.xxx.xxx.xxx:9100'
        - 'xxx.xxx.xxx.xxx:9100'
        - 'xxx.xxx.xxx.xxx:9100'
    - job_name: 'coredns'
      static_configs: # I write almost all targets as IPs; service DNS names also work, the default form being kube-dns.kube-system.svc.cluster.local
      - targets: ['xxx.xxx.xxx.xxx:9153']
    - job_name: 'ingress-nginx'
      static_configs:
        - targets: ['xxx.xxx.xxx.xxx:10254']
    - job_name: 'k8s-state-metrics'
      static_configs:
      - targets:
        - '10.100.162.224:8080'
        - '10.100.162.224:8081'
root@master1:~/k8s-prometheus# kubectl apply -f prometheus-configmap.yaml
root@master1:~/k8s-prometheus# curl -XPOST 10.100.122.13:9090/-/reload
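
One caveat: the /-/reload endpoint only responds if Prometheus was started with the lifecycle API enabled. If the POST fails, check that the Prometheus container's args include the flag — a sketch, your remaining flags will differ:

args:
- --config.file=/etc/prometheus/prometheus.yml
- --web.enable-lifecycle      # required for POST /-/reload to work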

Check that it took effect

[screenshot: Prometheus Targets page showing the new jobs]
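
A side note on targets: as the comment in the ConfigMap says, DNS names work as well as raw IPs, and Prometheus can even discover the node-exporter endpoints automatically through kubernetes_sd_configs instead of a static list. A minimal sketch, assuming Prometheus runs in-cluster with RBAC that allows listing nodes:

- job_name: 'Linux Server'
  kubernetes_sd_configs:
  - role: node                     # one target per node, addressed at the kubelet port
  relabel_configs:
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'       # rewrite the kubelet port to node_exporter's 9100
    target_label: __address__
    action: replace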

Displaying the monitored data in Grafana

Almost every dashboard you might want to import can be found on the Grafana website; once you find one, you can copy its ID or download the JSON.

The steps below show importing by ID (to import JSON instead, just paste the JSON file's contents into the corresponding box).

[screenshots: Grafana dashboard import dialog]

Basic node metrics

This dashboard is imported with ID 9276.

[screenshot: node metrics dashboard]

CoreDNS monitoring

Here I imported ID 5926; the result is shown below.

[screenshot: CoreDNS dashboard]

Ingress monitoring

Here I imported ID 9614; the result is shown below.

[screenshot: ingress-nginx dashboard]

Kubernetes cluster state monitoring

This dashboard is imported with ID 13105.

[screenshot: kube-state-metrics dashboard]
