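Environment setup
OS: CentOS 7.9
Kubernetes cluster:
Version: 1.21.5
Nodes:
192.168.10.201 master
192.168.10.202 work
Kubernetes cluster monitoring approach:
Deploy Prometheus on Kubernetes
Files to prepare:
prometheus-rbac.yaml: creates the service account and cluster role, and configures their permissions: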
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: kube-system
prometheus-configmap.yaml
Prometheus configuration for dynamic discovery of monitored services.
The scrape jobs are:
kubernetes-apiservers
kubernetes-nodes-kubelet
kubernetes-nodes-cadvisor
kubernetes-service-endpoints
kubernetes-services
kubernetes-pods
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  prometheus.yml: |
    scrape_configs:
    # Prometheus scraping itself
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090

    # API server, found through the default/kubernetes endpoints
    - job_name: kubernetes-apiservers
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    # kubelet metrics on every node
    - job_name: kubernetes-nodes-kubelet
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    # cAdvisor (container) metrics served by the kubelet
    - job_name: kubernetes-nodes-cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __metrics_path__
        replacement: /metrics/cadvisor
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    # Endpoints of services annotated with prometheus.io/scrape: "true"
    - job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name

    # Services annotated with prometheus.io/probe: "true", probed through blackbox
    - job_name: kubernetes-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
        - __address__
        target_label: __param_target
      - replacement: blackbox
        target_label: __address__
      - source_labels:
        - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name

    # Pods annotated with prometheus.io/scrape: "true"
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
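The rule that rewrites __address__ in the kubernetes-service-endpoints and kubernetes-pods jobs is the hardest one to read. As a rough illustration only (plain Python, not Prometheus code), the sketch below mimics what that single rule does: the source label values are joined with ";", the regex must match the whole joined string, and "$1:$2" is written to the target label. The helper name rewrite_address and the sample addresses are made up for this example.

```python
import re

# Same pattern as the relabel rule. Prometheus anchors relabel regexes at
# both ends, which re.fullmatch reproduces here.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Mimic the 'replace' rule that targets __address__ (hypothetical helper)."""
    # Source label values are concatenated with ';' (the default separator).
    joined = f"{address};{port_annotation}"
    match = ADDRESS_RULE.fullmatch(joined)
    if match is None:
        # No match (for example, no prometheus.io/port annotation):
        # a replace action leaves the target label unchanged.
        return address
    # Equivalent of the replacement "$1:$2".
    return f"{match.group(1)}:{match.group(2)}"


if __name__ == "__main__":
    print(rewrite_address("10.244.1.7:8080", "9102"))  # port swapped for the annotated one
    print(rewrite_address("10.244.1.7", "9102"))       # annotated port appended
    print(rewrite_address("10.244.1.7:8080", ""))      # annotation missing, address kept
```

In other words, a prometheus.io/port annotation overrides whatever port service discovery found, and targets without the annotation keep their discovered address.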
prometheus-statefulset.yaml
Creates the stateful Prometheus workload (a StatefulSet).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: latest
spec:
  serviceName: "prometheus"
  replicas: 1
  podManagementPolicy: "Parallel"
  updateStrategy:
    type: "RollingUpdate"
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: prometheus
      initContainers:
        # Hand /data to uid/gid 65534, the user Prometheus runs as
        - name: "init-chown-data"
          image: "busybox:latest"
          imagePullPolicy: "IfNotPresent"
          command: ["chown", "-R", "65534:65534", "/data"]
          volumeMounts:
            - name: prometheus-data
              mountPath: /data
              subPath: ""
      containers:
        # Sidecar: calls the reload endpoint when the ConfigMap changes
        - name: prometheus-server-configmap-reload
          image: "jimmidyson/configmap-reload:latest"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://localhost:9090/-/reload
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
          resources:
            limits:
              cpu: 10m
              memory: 10Mi
            requests:
              cpu: 10m
              memory: 10Mi

        - name: prometheus-server
          image: "prom/prometheus:latest"
          imagePullPolicy: "IfNotPresent"
          args:
            - --config.file=/etc/config/prometheus.yml
            - --storage.tsdb.path=/data
            - --web.console.libraries=/etc/prometheus/console_libraries
            - --web.console.templates=/etc/prometheus/consoles
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          resources:
            limits:
              cpu: 200m
              memory: 1000Mi
            requests:
              cpu: 200m
              memory: 1000Mi
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: prometheus-data
              mountPath: /data
              subPath: ""
      terminationGracePeriodSeconds: 300
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
  volumeClaimTemplates:
    - metadata:
        name: prometheus-data
      spec:
        storageClassName: nfs-csi
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "16Gi"
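Note: Prometheus data is stored on a PVC backed by dynamically provisioned storage.
You need to set up the PV and PVC storage yourself. The storage class used here is nfs-csi.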
prometheus-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/name: "Prometheus"
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  type: NodePort
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: 9090
      nodePort: 30090
  selector:
    k8s-app: prometheus
Note: the Service type is set to NodePort.
Create Prometheus:
kubectl apply -f .
Open the Prometheus console
Dynamic service discovery:
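Besides the console, the discovered targets can also be read from the Prometheus HTTP API. The following is a minimal sketch, assuming the NodePort service above is reachable from where the script runs. The function names fetch_targets and summarize_targets are invented for this example, and only the /api/v1/targets endpoint and its activeTargets field come from Prometheus itself.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_targets(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch the current target list from the Prometheus HTTP API."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def summarize_targets(payload: dict) -> dict:
    """Count active targets per (job, health) pair."""
    counts = Counter()
    for target in payload["data"]["activeTargets"]:
        # Every active target carries its final labels and a health string
        # ("up", "down" or "unknown").
        job = target["labels"].get("job", "unknown")
        counts[(job, target["health"])] += 1
    return dict(counts)
```

With the node address and nodePort used in this setup, the call would be summarize_targets(fetch_targets("http://192.168.10.201:30090")). A job that is missing from the result has discovered no targets, which usually points to missing annotations or RBAC permissions.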