Deploying the Full Prometheus Stack with Helm: A Complete Walkthrough

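    Hello folks! Today's article shows how to deploy the Prometheus Stack quickly with Helm. We'll discuss Prometheus and Grafana, and how to use Helm charts to set up monitoring for any Kubernetes cluster. We'll also learn how Prometheus and Grafana are connected, and how to set up a basic dashboard in Grafana to monitor resources on a Kubernetes cluster.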

    Before getting to the main topic, let's briefly introduce the concepts in the Prometheus Stack ecosystem.


Prometheus Stack Overview


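    "Prometheus Stack" usually refers collectively to Prometheus, Grafana, and their related integration components. In real-world scenarios, Prometheus and Grafana typically work together: Prometheus acts as the data source, collecting metrics and serving them to Grafana, while Grafana visualizes that data in attractive dashboards.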

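    About Prometheus:

    1. Prometheus is an open-source systems monitoring and alerting toolkit;

    2. Prometheus collects metrics and stores them as time series data, and it provides out-of-the-box monitoring for container orchestration platforms such as Kubernetes.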

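    About Grafana:

    1. Grafana is a multi-platform, open-source analytics and interactive visualization web application;

    2. When connected to supported data sources, Grafana provides charts, graphs, and alerts on the web;

    3. Grafana lets us query, visualize, alert on, and understand our metrics no matter where they are stored. Besides Prometheus, supported data sources include AWS CloudWatch, Azure Monitor, PostgreSQL, Elasticsearch, and more;

    4. In addition, we can create our own dashboards to fit real-world needs, or customize the existing dashboards that Grafana provides.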

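    Prometheus graduated from the CNCF in August 2018. As a next-generation open-source monitoring solution, its design closely mirrors the ideas behind Google's SRE practices. Let's look at its architecture diagram: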

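    Based on the architecture diagram above, here is a brief breakdown of the core components:

    1. Prometheus Server: the main server, which scrapes and stores time series data.

    2. TSDB (time series database): metrics are key to understanding a system's health and operating state. Any system design needs to collect, store, and report metrics to provide the system's pulse. This data is recorded over a series of time intervals and needs an efficient database to store and retrieve it. Prometheus ships with its own built-in local TSDB for this purpose (OpenTSDB, by contrast, is a separate time series database project).

    3. PromQL: Prometheus defines a rich query language, PromQL, for querying data from its time series database.

    4. Pushgateway: lets short-lived jobs push their metrics, which Prometheus then scrapes from the gateway.

    5. Exporters: expose metrics from third-party systems in a format the Prometheus server can scrape.

    6. Alertmanager: handles the alerts fired by Prometheus and sends notifications to channels such as Slack and email.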
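    To make PromQL concrete, here is a minimal shell sketch that sends an instant query to the Prometheus HTTP API (/api/v1/query). It assumes Prometheus is reachable at PROM_URL, for example http://localhost:9090 after port-forwarding the prometheus-k8s Service deployed later in this article; the promql function name and the CURL override are illustrative, not part of Prometheus.

```shell
#!/bin/sh
# Send an instant PromQL query to the Prometheus HTTP API (/api/v1/query).
# PROM_URL is an assumption, e.g. http://localhost:9090 after running:
#   kubectl -n monitoring port-forward svc/prometheus-k8s 9090
# CURL can be overridden (for example with a stub when testing).
PROM_URL="${PROM_URL:-http://localhost:9090}"
CURL="${CURL:-curl}"

promql() {
  # -G sends the --data-urlencode value as a URL query string on a GET request
  "$CURL" -sG --data-urlencode "query=$1" "$PROM_URL/api/v1/query"
}

# Example: per-Pod CPU usage (in cores) in the monitoring namespace,
# averaged over the last 5 minutes, from the kubelet's cAdvisor metrics:
#   promql 'sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="monitoring"}[5m]))'
```

    The response is JSON whose data.result array holds one sample per matching series.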

    The interaction architecture of the ecosystem components in a Kubernetes cluster is shown below:



Kube-Prometheus-Stack Explained


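    The kube-prometheus stack, packaged upstream as the kube-prometheus repository and for Helm as the kube-prometheus-stack chart, collects Kubernetes manifests, Grafana dashboards, and Prometheus rules, combined with documentation and scripts, to provide easy-to-operate end-to-end Kubernetes cluster monitoring based on the Prometheus Operator.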

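    The upstream kube-prometheus project is written in jsonnet and can be described both as a package and as a library. It mainly contains the following components: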

  • The Prometheus Operator
  • Highly available Prometheus
  • Highly available Alertmanager
  • Prometheus node-exporter
  • Prometheus Adapter for Kubernetes Metrics APIs
  • kube-state-metrics
  • Grafana

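    kube-prometheus-stack is aimed at cluster monitoring, so it comes preconfigured to collect metrics from all Kubernetes components. It also provides a default set of dashboards and alerting rules, and it exposes composable jsonnet as a library that users can customize to their needs.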

    Before deploying, we need to check that the current Kubernetes cluster version is compatible with the kube-prometheus release; refer to the following matrix:

kube-prometheus stack   Kubernetes 1.20   Kubernetes 1.21   Kubernetes 1.22   Kubernetes 1.23   Kubernetes 1.24
release-0.8             ✔                 ✔                 ✗                 ✗                 ✗
release-0.9             ✗                 ✔                 ✔                 ✗                 ✗
release-0.10            ✗                 ✗                 ✔                 ✔                 ✗
release-0.11            ✗                 ✗                 ✗                 ✔                 ✔
main                    ✗                 ✗                 ✗                 ✗                 ✔

Deployment and Installation


    We deploy the kube-prometheus-stack components mainly with Helm, for a fast and efficient setup. Before that, we need a Kubernetes cluster; here we use Minikube to run a single-node environment, as shown below:


[leonli@Leon k8s ] % kubectl get po -A -o wide
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
kube-system   coredns-64897985d-v9jcf               1/1     Running   0          38s   172.17.0.2     k8s-cluster   <none>           <none>
kube-system   etcd-k8s-cluster                      1/1     Running   0          51s   192.168.49.2   k8s-cluster   <none>           <none>
kube-system   kube-apiserver-k8s-cluster            1/1     Running   0          51s   192.168.49.2   k8s-cluster   <none>           <none>
kube-system   kube-controller-manager-k8s-cluster   1/1     Running   0          51s   192.168.49.2   k8s-cluster   <none>           <none>
kube-system   kube-proxy-rpvg8                      1/1     Running   0          38s   192.168.49.2   k8s-cluster   <none>           <none>
kube-system   kube-scheduler-k8s-cluster            1/1     Running   0          51s   192.168.49.2   k8s-cluster   <none>           <none>
kube-system   storage-provisioner                   1/1     Running   0          50s   192.168.49.2   k8s-cluster   <none>           <none>
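    Later in this article the single-node cluster hits an "Insufficient memory" scheduling error, so it can help to give Minikube more resources up front. A sketch, assuming Minikube is installed; the profile name, CPU and memory sizes, and the MINIKUBE override are illustrative assumptions, while --kubernetes-version=v1.23.3 matches the server version in the environment check below.

```shell
#!/bin/sh
# Start a single-node Minikube cluster with extra CPU and memory so that the
# whole monitoring stack can be scheduled. The profile name and resource sizes
# are illustrative assumptions; MINIKUBE can be overridden for testing.
MINIKUBE="${MINIKUBE:-minikube}"

start_cluster() {
  # --memory is in MB; --kubernetes-version pins the version used in this article
  "$MINIKUBE" start --profile "${1:-k8s-cluster}" \
    --kubernetes-version=v1.23.3 --cpus=4 --memory=8192
}

# Usage:
#   start_cluster k8s-cluster
```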

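    Next, install Helm. To make sure the stack deploys correctly, the Helm version must be compatible with the current Kubernetes version. The following table maps Helm versions to the Kubernetes versions they support: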

Helm Version    Supported Kubernetes Versions
3.9.x           1.24.x - 1.21.x
3.8.x           1.23.x - 1.20.x
3.7.x           1.22.x - 1.19.x
3.6.x           1.21.x - 1.18.x
3.5.x           1.20.x - 1.17.x
3.4.x           1.19.x - 1.16.x
...             ...

    Let's check the prepared environment:


[leonli@Leon kube-prometheus ] % helm version
version.BuildInfo{Version:"v3.8.1", GitCommit:"5cb9af4b1b271d11d7a97a71df3ac337dd94ad37", GitTreeState:"clean", GoVersion:"go1.17.8"}
[leonli@Leon kube-prometheus ] % kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:30:48Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:19:12Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/arm64"}
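    The rows of the Helm table above follow a simple pattern: Helm 3.x supports Kubernetes 1.(x+12) through 1.(x+15). The sketch below encodes only that pattern; the helm_supports_k8s function is hypothetical, not an official Helm tool, so confirm against Helm's version support policy for releases outside the table.

```shell
#!/bin/sh
# Check whether a Helm 3 minor version supports a Kubernetes 1.x minor version,
# using the pattern from the Helm table: Helm 3.x supports 1.(x+12)-1.(x+15).
# This is only a heuristic covering the listed rows (Helm 3.4-3.9).
helm_supports_k8s() {
  local helm_minor="$1" k8s_minor="$2"
  [ "$k8s_minor" -ge $((helm_minor + 12)) ] && [ "$k8s_minor" -le $((helm_minor + 15)) ]
}

# Versions from the transcript: Helm v3.8.1 and server v1.23.3
if helm_supports_k8s 8 23; then
  echo "Helm 3.8 supports Kubernetes 1.23"
fi
```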

    Next, we move on to installing kube-prometheus-stack:


[leonli@Leon minikube ] % kubectl create ns monitoring
namespace/monitoring created
[leonli@Leon minikube ] % helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories
[leonli@Leon minikube ] % helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "traefik" chart repository
...Successfully got an update from the "komodorio" chart repository
...Successfully got an update from the "traefik-hub" chart repository
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈Happy Helming!⎈
[leonli@Leon minikube ] % helm install prometheus-community/kube-prometheus-stack --namespace monitoring --generate-name 
Error: INSTALLATION FAILED: failed to download "prometheus-community/kube-prometheus-stack"
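    Because the error above is a chart download failure, a simple retry loop is often enough. A minimal sketch, assuming the prometheus-community repository has already been added; the function name, attempt count, and delay are arbitrary choices, and HELM can point at another command for testing.

```shell
#!/bin/sh
# Retry "helm install" a few times, since the chart download can fail on a
# flaky network. Assumes the prometheus-community repo was added already.
# HELM can be overridden (for example with a stub when testing).
HELM="${HELM:-helm}"

helm_install_with_retry() {
  local attempts="${1:-3}" delay="${2:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    if "$HELM" install prometheus-community/kube-prometheus-stack \
        --namespace monitoring --generate-name; then
      return 0
    fi
    echo "helm install failed (attempt $i/$attempts)" >&2
    # Only wait if another attempt follows
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: up to 5 attempts, 15 seconds apart
#   helm_install_with_retry 5 15
```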

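    "helm install" can fail because of network issues, as it did here. We can retry a few times, or fall back to cloning the upstream kube-prometheus repository from GitHub and applying its manifests directly, as shown below: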


[leonli@Leon minikube ] % git clone https://github.com/prometheus-operator/kube-prometheus.git -b release-0.10
Cloning into 'kube-prometheus'...
remote: Enumerating objects: 17291, done.
remote: Counting objects: 100% (197/197), done.
remote: Compressing objects: 100% (99/99), done.
remote: Total 17291 (delta 126), reused 146 (delta 91), pack-reused 17094
Receiving objects: 100% (17291/17291), 9.18 MiB | 6.19 MiB/s, done.
Resolving deltas: 100% (11319/11319), done.

    Now, in the kube-prometheus directory, apply all the YAML files under manifests/setup:


[leonli@Leon kube-prometheus ] % kubectl apply --server-side -f manifests/setup --force-conflicts                                 
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com serverside-applied
namespace/monitoring serverside-applied

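    kube-prometheus installs into the monitoring namespace by default. Some images are hosted on overseas registries, so the installation can be very slow, and it sometimes fails when images cannot be pulled over the network.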


[leonli@Leon kube-prometheus ] % cd manifests/setup
[leonli@Leon setup ] % ls -l
total 3040
-rw-r--r--  1 leonli  admin  169131 Dec  2 14:53 0alertmanagerConfigCustomResourceDefinition.yaml
-rw-r--r--  1 leonli  admin  377495 Dec  2 14:53 0alertmanagerCustomResourceDefinition.yaml
-rw-r--r--  1 leonli  admin   30361 Dec  2 14:53 0podmonitorCustomResourceDefinition.yaml
-rw-r--r--  1 leonli  admin   31477 Dec  2 14:53 0probeCustomResourceDefinition.yaml
-rw-r--r--  1 leonli  admin  502646 Dec  2 14:53 0prometheusCustomResourceDefinition.yaml
-rw-r--r--  1 leonli  admin    4101 Dec  2 14:53 0prometheusruleCustomResourceDefinition.yaml
-rw-r--r--  1 leonli  admin   31881 Dec  2 14:53 0servicemonitorCustomResourceDefinition.yaml
-rw-r--r--  1 leonli  admin  385790 Dec  2 14:53 0thanosrulerCustomResourceDefinition.yaml
-rw-r--r--  1 leonli  admin      60 Dec  2 14:53 namespace.yaml
[leonli@Leon setup ] % until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
[leonli@Leon setup ] % cd ../..
[leonli@Leon kube-prometheus ] % kubectl apply -f manifests/
alertmanager.monitoring.coreos.com/main created
poddisruptionbudget.policy/alertmanager-main created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager-main created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
servicemonitor.monitoring.coreos.com/blackbox-exporter created
secret/grafana-config created
secret/grafana-datasources created
configmap/grafana-dashboard-alertmanager-overview created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
prometheusrule.monitoring.coreos.com/node-exporter-rules created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
poddisruptionbudget.policy/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
poddisruptionbudget.policy/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
service/prometheus-operator created
serviceaccount/prometheus-operator created
servicemonitor.monitoring.coreos.com/prometheus-operator created
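    Rather than re-running kubectl get by hand, kubectl wait can block until the Pods report Ready. A sketch with an illustrative function name and timeout; note that the alertmanager-main-* and prometheus-k8s-* Pods are created later by the Operator, so they may not exist yet when the command first runs, and a timeout usually points at Pods worth inspecting with kubectl describe.

```shell
#!/bin/sh
# Block until every Pod currently in the monitoring namespace is Ready,
# instead of re-running "kubectl get" by hand. The timeout is an arbitrary
# choice; KUBECTL can be overridden (for example with a stub when testing).
KUBECTL="${KUBECTL:-kubectl}"

wait_for_monitoring_pods() {
  "$KUBECTL" wait --for=condition=Ready pods --all \
    --namespace monitoring --timeout="${1:-10m}"
}

# Usage: re-run after the Operator has created the StatefulSet Pods
#   wait_for_monitoring_pods 15m || kubectl get pods -n monitoring
```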

    Now check the progress of the created resources:


[leonli@Leon kube-prometheus ] % kubectl get all -n monitoring
NAME                                       READY   STATUS              RESTARTS   AGE
pod/blackbox-exporter-6b79c4588b-xxjkm     0/3     ContainerCreating   0          14s
pod/grafana-7fd69887fb-rn65j               0/1     ContainerCreating   0          14s
pod/kube-state-metrics-55f67795cd-xlxw6    0/3     ContainerCreating   0          13s
pod/node-exporter-kxdrw                    0/2     ContainerCreating   0          13s
pod/prometheus-adapter-5565cc8d76-6tnfr    0/1     ContainerCreating   0          13s
pod/prometheus-adapter-5565cc8d76-jkcwp    0/1     ContainerCreating   0          13s
pod/prometheus-operator-6dc9f66cb7-lkfdl   0/2     ContainerCreating   0          13s
NAME                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)              AGE
service/alertmanager-main     ClusterIP   10.101.159.117   <none>        9093/TCP,8080/TCP    14s
service/blackbox-exporter     ClusterIP   10.96.196.202    <none>        9115/TCP,19115/TCP   14s
service/grafana               ClusterIP   10.102.241.149   <none>        3000/TCP             14s
service/kube-state-metrics    ClusterIP   None             <none>        8443/TCP,9443/TCP    14s
service/node-exporter         ClusterIP   None             <none>        9100/TCP             13s
service/prometheus-adapter    ClusterIP   10.103.77.245    <none>        443/TCP              13s
service/prometheus-k8s        ClusterIP   10.107.208.95    <none>        9090/TCP,8080/TCP    13s
service/prometheus-operator   ClusterIP   None             <none>        8443/TCP             13s
NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/node-exporter   1         1         0       1            0           kubernetes.io/os=linux   13s
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/blackbox-exporter     0/1     1            0           14s
deployment.apps/grafana               0/1     1            0           14s
deployment.apps/kube-state-metrics    0/1     1            0           14s
deployment.apps/prometheus-adapter    0/2     2            0           13s
deployment.apps/prometheus-operator   0/1     1            0           13s
NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/blackbox-exporter-6b79c4588b     1         1         0       14s
replicaset.apps/grafana-7fd69887fb               1         1         0       14s
replicaset.apps/kube-state-metrics-55f67795cd    1         1         0       14s
replicaset.apps/prometheus-adapter-5565cc8d76    2         2         0       13s
replicaset.apps/prometheus-operator-6dc9f66cb7   1         1         0       13s
[leonli@Leon manifests ] % kubectl get pods -n monitoring -o wide
NAME                                   READY   STATUS             RESTARTS   AGE     IP             NODE             NOMINATED NODE   READINESS GATES
alertmanager-main-0                    2/2     Running            0          13m     172.17.0.11    devops-cluster   <none>           <none>
alertmanager-main-1                    2/2     Running            0          13m     172.17.0.9     devops-cluster   <none>           <none>
alertmanager-main-2                    2/2     Running            0          13m     172.17.0.10    devops-cluster   <none>           <none>
blackbox-exporter-6b79c4588b-xxjkm     3/3     Running            0          14m     172.17.0.2     devops-cluster   <none>           <none>
grafana-7fd69887fb-rn65j               1/1     Running            0          14m     172.17.0.3     devops-cluster   <none>           <none>
kube-state-metrics-55f67795cd-xlxw6    2/3     ImagePullBackOff   0          14m     172.17.0.5     devops-cluster   <none>           <none>
kube-state-metrics-7ff75cff8b-qg57k    2/3     ImagePullBackOff   0          2m24s   172.17.0.7     devops-cluster   <none>           <none>
node-exporter-kxdrw                    2/2     Running            0          14m     192.168.49.2   devops-cluster   <none>           <none>
prometheus-adapter-5698bb779b-2p5mm    1/1     Running            0          5m17s   172.17.0.6     devops-cluster   <none>           <none>
prometheus-adapter-5698bb779b-2ptn4    1/1     Running            0          5m17s   172.17.0.12    devops-cluster   <none>           <none>
prometheus-k8s-0                       0/2     Pending            0          13m     <none>         <none>           <none>           <none>
prometheus-k8s-1                       0/2     Pending            0          13m     <none>         <none>           <none>           <none>
prometheus-operator-6dc9f66cb7-lkfdl   2/2     Running            0          14m     172.17.0.8     devops-cluster   <none>           <none>

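    During installation, Pods may fail to become healthy for various reasons. We can run kubectl describe pod [pod_name] -n [namespace] to inspect the details and pinpoint why a Pod is not running properly. Take prometheus-k8s-0, which is stuck in the "Pending" state, as an example: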


[leonli@Leon manifests ] % kubectl describe pod prometheus-k8s-0 -n monitoring
Name:           prometheus-k8s-0
Namespace:      monitoring
Priority:       0
Node:           <none>
Labels:         app.kubernetes.io/component=prometheus
                app.kubernetes.io/instance=k8s
                app.kubernetes.io/managed-by=prometheus-operator
                app.kubernetes.io/name=prometheus
                app.kubernetes.io/part-of=kube-prometheus
                app.kubernetes.io/version=2.32.1
                controller-revision-hash=prometheus-k8s-5f9554b8cd
                operator.prometheus.io/name=k8s
                operator.prometheus.io/shard=0
                prometheus=k8s
                statefulset.kubernetes.io/pod-name=prometheus-k8s-0
Annotations:    kubectl.kubernetes.io/default-container: prometheus
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  StatefulSet/prometheus-k8s
Init Containers:
  init-config-reloader:
    Image:      quay.io/prometheus-operator/prometheus-config-reloader:v0.53.1
    Port:       8080/TCP
    Host Port:  0/TCP
    Command:
      /bin/prometheus-config-reloader
    Args:
      --watch-interval=0
      --listen-address=:8080
      --config-file=/etc/prometheus/config/prometheus.yaml.gz
      --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
      --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:  prometheus-k8s-0 (v1:metadata.name)
      SHARD:     0
    Mounts:
      /etc/prometheus/config from config (rw)
      /etc/prometheus/config_out from config-out (rw)
      /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nv7qs (ro)
Containers:
  prometheus:
    Image:      quay.io/prometheus/prometheus:v2.32.1
    Port:       9090/TCP
    Host Port:  0/TCP
    Args:
      --web.console.templates=/etc/prometheus/consoles
      --web.console.libraries=/etc/prometheus/console_libraries
      --config.file=/etc/prometheus/config_out/prometheus.env.yaml
      --storage.tsdb.path=/prometheus
      --storage.tsdb.retention.time=24h
      --web.enable-lifecycle
      --web.route-prefix=/
      --web.config.file=/etc/prometheus/web_config/web-config.yaml
    Requests:
      memory:     400Mi
    Readiness:    http-get http://:web/-/ready delay=0s timeout=3s period=5s #success=1 #failure=3
    Startup:      http-get http://:web/-/ready delay=0s timeout=3s period=15s #success=1 #failure=60
    Environment:  <none>
    Mounts:
      /etc/prometheus/certs from tls-assets (ro)
      /etc/prometheus/config_out from config-out (ro)
      /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
      /etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
      /prometheus from prometheus-k8s-db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nv7qs (ro)
  config-reloader:
    Image:      quay.io/prometheus-operator/prometheus-config-reloader:v0.53.1
    Port:       8080/TCP
    Host Port:  0/TCP
    Command:
      /bin/prometheus-config-reloader
    Args:
      --listen-address=:8080
      --reload-url=http://localhost:9090/-/reload
      --config-file=/etc/prometheus/config/prometheus.yaml.gz
      --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
      --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:  prometheus-k8s-0 (v1:metadata.name)
      SHARD:     0
    Mounts:
      /etc/prometheus/config from config (rw)
      /etc/prometheus/config_out from config-out (rw)
      /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nv7qs (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s
    Optional:    false
  tls-assets:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          prometheus-k8s-tls-assets-0
    SecretOptionalName:  <nil>
  config-out:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  prometheus-k8s-rulefiles-0:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-k8s-rulefiles-0
    Optional:  false
  web-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-web-config
    Optional:    false
  prometheus-k8s-db:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-nv7qs:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  7s (x13 over 13m)  default-scheduler  0/1 nodes are available: 1 Insufficient memory.

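    As we can see, prometheus-k8s-0 cannot be scheduled because the node does not have enough memory (the prometheus container alone requests 400Mi). To reduce resource consumption, we can set the replica count of every component to 1.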
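    When several Pods are unhealthy, describing them one by one gets tedious. A sketch that describes every Pod not in the Running phase (Pending covers unschedulable Pods such as prometheus-k8s-0 and, usually, Pods still waiting on image pulls) and then lists the namespace's Warning events; the function name and the KUBECTL/NS overrides are illustrative.

```shell
#!/bin/sh
# Describe every Pod in the namespace that is not in the Running phase, then
# list the namespace's Warning events, oldest first.
# KUBECTL and NS can be overridden (for example with a stub when testing).
KUBECTL="${KUBECTL:-kubectl}"
NS="${NS:-monitoring}"

debug_unhealthy_pods() {
  local pod
  # "-o name" prints entries such as pod/prometheus-k8s-0
  for pod in $("$KUBECTL" get pods -n "$NS" \
      --field-selector=status.phase!=Running -o name); do
    echo "=== $pod"
    "$KUBECTL" describe -n "$NS" "$pod"
  done
  "$KUBECTL" get events -n "$NS" --field-selector=type=Warning \
    --sort-by=.lastTimestamp
}
```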

    We can check the current replica settings from the command line:


[leonli@Leon manifests ] % grep -rn 'replicas:' *
alertmanager-alertmanager.yaml:23:  replicas: 3
blackboxExporter-deployment.yaml:12:  replicas: 1
grafana-deployment.yaml:12:  replicas: 1
kubeStateMetrics-deployment.yaml:12:  replicas: 1
prometheus-prometheus.yaml:35:  replicas: 2
prometheusAdapter-deployment.yaml:12:  replicas: 2
prometheusOperator-deployment.yaml:12:  replicas: 1
setup/0alertmanagerCustomResourceDefinition.yaml:3547:              replicas:
setup/0alertmanagerCustomResourceDefinition.yaml:6010:              replicas:
setup/0thanosrulerCustomResourceDefinition.yaml:3670:              replicas:
setup/0thanosrulerCustomResourceDefinition.yaml:6172:              replicas:
setup/0prometheusCustomResourceDefinition.yaml:5119:              replicas:
setup/0prometheusCustomResourceDefinition.yaml:8271:              replicas:

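    Then set replicas to 1 in the manifests that declare more than one replica (alertmanager-alertmanager.yaml, prometheus-prometheus.yaml, and prometheusAdapter-deployment.yaml) and re-apply the YAML files: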


[leonli@Leon kube-prometheus ] % kubectl apply -f manifests/
[leonli@Leon manifests ] % kubectl get pods -n monitoring -o wide
NAME                                   READY   STATUS    RESTARTS   AGE     IP             NODE             NOMINATED NODE   READINESS GATES
alertmanager-main-0                    2/2     Running   0          7m23s   172.17.0.9     devops-cluster   <none>           <none>
blackbox-exporter-6b79c4588b-xxjkm     3/3     Running   0          26m     172.17.0.2     devops-cluster   <none>           <none>
grafana-7fd69887fb-rn65j               1/1     Running   0          26m     172.17.0.3     devops-cluster   <none>           <none>
kube-state-metrics-9d449c7f4-8kh8x     3/3     Running   0          16s     172.17.0.5     devops-cluster   <none>           <none>
node-exporter-kxdrw                    2/2     Running   0          26m     192.168.49.2   devops-cluster   <none>           <none>
prometheus-adapter-5698bb779b-2p5mm    1/1     Running   0          17m     172.17.0.6     devops-cluster   <none>           <none>
prometheus-k8s-0                       2/2     Running   0          24m     172.17.0.10    devops-cluster   <none>           <none>
prometheus-operator-6dc9f66cb7-lkfdl   2/2     Running   0          26m     172.17.0.8     devops-cluster   <none>           <none>
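    The replica edit can also be scripted. A minimal sketch that rewrites only the top-level "  replicas: N" lines in the three manifests the grep reported with more than one replica (the matches inside the setup/ CRDs are schema fields and are left alone); the function name is hypothetical, and sed -i.bak is used because it behaves the same with GNU sed and the BSD sed on macOS.

```shell
#!/bin/sh
# Set "replicas:" to 1 in the manifests that the grep reported with more
# than one replica. Only top-level "  replicas: N" lines (two-space indent)
# are rewritten; "sed -i.bak" works with GNU sed and macOS BSD sed.
set_single_replica() {
  local dir="${1:-manifests}" f
  for f in alertmanager-alertmanager.yaml prometheus-prometheus.yaml \
           prometheusAdapter-deployment.yaml; do
    sed -i.bak -E 's/^(  replicas:) [0-9]+$/\1 1/' "$dir/$f" || return 1
    rm -f "$dir/$f.bak"   # drop the backup sed created
  done
}

# Usage, from the kube-prometheus checkout:
#   set_single_replica manifests && kubectl apply -f manifests/
```

    After re-applying, the Operator scales the Prometheus and Alertmanager StatefulSets to match the updated custom resources.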

    Now let's check the deployed components in the Minikube Dashboard to verify that they are running properly:

    Then check the Services again:


[leonli@Leon kube-prometheus ] % kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
alertmanager-main       ClusterIP   10.105.71.195   <none>        9093:30076/TCP,8080:31251/TCP   27h
alertmanager-operated   ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP      27h
blackbox-exporter       ClusterIP   10.110.173.86   <none>        9115/TCP,19115/TCP              27h
grafana                 ClusterIP   10.102.213.39   <none>        3000:32540/TCP                  27h
kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP               27h
node-exporter           ClusterIP   None            <none>        9100/TCP                        27h
prometheus-adapter      ClusterIP   10.100.47.174   <none>        443/TCP                         27h
prometheus-k8s          ClusterIP   10.104.130.93   <none>        9090/TCP,8080/TCP               27h
prometheus-operated     ClusterIP   None            <none>        9090/TCP                        27h
prometheus-operator     ClusterIP   None            <none>        8443/TCP                        27h

    At this point, all components of the Prometheus Stack have been deployed successfully.
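    kube-prometheus also pre-wires Grafana to Prometheus: the grafana-datasources Secret created during the apply step holds Grafana's data source provisioning. A sketch for inspecting it, assuming the Secret stores its content under the key datasources.yaml (if the output is empty, check the actual key with kubectl -n monitoring describe secret grafana-datasources); the function name is illustrative.

```shell
#!/bin/sh
# Print the Grafana data source provisioning that kube-prometheus stores in the
# grafana-datasources Secret. The key name datasources.yaml is an assumption;
# KUBECTL can be overridden (for example with a stub when testing).
KUBECTL="${KUBECTL:-kubectl}"

show_grafana_datasources() {
  # Dots in a key must be escaped in a jsonpath expression; Secret data is base64
  "$KUBECTL" -n monitoring get secret grafana-datasources \
    -o jsonpath='{.data.datasources\.yaml}' | base64 --decode
}
```

    The decoded data source URL should point at the prometheus-k8s Service in the monitoring namespace.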


Web Access


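    From the Service list we can see that, by default, every Service is of type ClusterIP, so the resources are reachable only from inside the cluster. To access them from outside via the web, we need to change the Service type to NodePort so that they can be reached through the node's IP.

    There are two ways to do this:

    1. Through the Kubernetes Dashboard UI

    2. By editing the YAML files

    Either way, the change boils down to setting type: NodePort so the Service can be accessed through the node. After the change, the Services look like this: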


[leonli@Leon kube-prometheus ] % kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
alertmanager-main       NodePort    10.105.71.195   <none>        9093:30076/TCP,8080:31251/TCP   27h
alertmanager-operated   ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP      27h
blackbox-exporter       ClusterIP   10.110.173.86   <none>        9115/TCP,19115/TCP              27h
grafana                 NodePort    10.102.213.39   <none>        3000:32540/TCP                  27h
kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP               27h
node-exporter           ClusterIP   None            <none>        9100/TCP                        27h
prometheus-adapter      ClusterIP   10.100.47.174   <none>        443/TCP                         27h
prometheus-k8s          ClusterIP   10.104.130.93   <none>        9090/TCP,8080/TCP               27h
prometheus-operated     ClusterIP   None            <none>        9090/TCP                        27h
prometheus-operator     ClusterIP   None            <none>        8443/TCP                        27h
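    The NodePort change can also be made with kubectl patch instead of editing YAML. A sketch with an illustrative function name; on Minikube, minikube service prints a reachable URL for a NodePort Service, and kubectl port-forward is an alternative that serves Grafana on http://localhost:3000 without changing the Service type.

```shell
#!/bin/sh
# Switch one or more Services in the monitoring namespace to type NodePort.
# KUBECTL can be overridden (for example with a stub when testing).
KUBECTL="${KUBECTL:-kubectl}"

expose_nodeport() {
  local svc
  for svc in "$@"; do
    "$KUBECTL" -n monitoring patch svc "$svc" \
      -p '{"spec":{"type":"NodePort"}}' || return 1
  done
}

# Usage:
#   expose_nodeport grafana alertmanager-main
#   minikube service grafana -n monitoring --url
# Alternative without changing the Service type (serves http://localhost:3000):
#   kubectl -n monitoring port-forward svc/grafana 3000:3000
```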

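    Now we can open Grafana in a browser, for example at http://localhost:3000, and log in with the account admin/admin:

    After logging in, the home page shows the following:

    That wraps up this hands-on walkthrough of deploying the Prometheus Stack with Helm. I hope it helps. If you'd like to know more, feel free to reach out and discuss!

    Adiós!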
