Setup process
- Get the kube-prometheus project
git clone https://github.com/prometheus-operator/kube-prometheus.git
- Modify the Prometheus resource file
Open kube-prometheus/manifests/prometheus-prometheus.yaml and change it as follows:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.35.0
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: alertmanager-main
      namespace: monitoring
      port: web
  enableFeatures: []
  externalLabels: {}
  image: quay.io/prometheus/prometheus:v2.35.0
  nodeSelector:
    kubernetes.io/os: linux
  podMetadata:
    labels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 2.35.0
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 2 # number of replicas, for high availability
  resources:
    requests:
      memory: 400Mi
  ruleNamespaceSelector: {}
  ruleSelector: {}
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: 2.35.0
  # ---- newly added configuration (under spec) ----
  retention: '15d' # how long metrics are kept, e.g. 24h (24 hours), 10d (10 days), 12w (12 weeks), 2y (2 years)
  storage: # persistent storage configuration
    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: prometheus-data
        labels:
          app: prometheus
      spec:
        storageClassName: csi-nas # run `kubectl get sc` to find the StorageClass name
        accessModes:
        - ReadWriteMany # access mode: ReadWriteMany or ReadWriteOnce
        resources:
          requests:
            storage: 300Gi # size this from your daily metric volume and the retention period
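  # Mount the host's time configuration to fix time zone mismatches inside the pod.
  # Node clocks must still be kept in sync: one node once came back from a reboot running
  # 30 minutes fast, and the two Prometheus instances would have differed by 30 minutes
  # even without this time zone setting.
  volumes:
  - name: date-config
    hostPath:
      path: /etc/localtime
  volumeMounts:
  - mountPath: /etc/localtime
    name: date-config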
If you do not need persistence, leave out the storage block, but note that the data is lost whenever a Prometheus pod is evicted.
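The storage request is meant to be sized from the ingestion volume and the retention period. A rough estimate can follow the rule of thumb from the Prometheus storage documentation: retention time in seconds, times ingested samples per second, times bytes per sample (roughly 1 to 2 bytes). The sketch below is only that arithmetic; the function names, the 100,000 samples/s rate, and the 2 bytes-per-sample figure are illustrative assumptions, not measurements from this cluster.

```python
def estimate_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough TSDB size: retention seconds * samples per second * bytes per sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / (1024 ** 3)


if __name__ == "__main__":
    # Assumed workload: 15 days of retention at 100,000 samples per second.
    needed = estimate_disk_bytes(retention_days=15, samples_per_second=100_000)
    print(f"estimated disk need: {to_gib(needed):.1f} GiB")
```

On a running Prometheus, the real ingestion rate can be read with the query rate(prometheus_tsdb_head_samples_appended_total[5m]); leave headroom on top of the estimate.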
- Run the install commands
Change into the kube-prometheus directory and run the following commands, in this order:
kubectl create -f manifests/setup/
kubectl create -f manifests/
- Check the result
Wait about five minutes (depending on how fast the images download), then check the result with:
kubectl get pods -n monitoring
NAME                                             READY   STATUS    RESTARTS         AGE
alertmanager-main-0                              2/2     Running   0                37m
alertmanager-main-1                              2/2     Running   0                37m
alertmanager-main-2                              2/2     Running   0                37m
alertmanager-relay-deployment-795859b658-pzzcq   1/1     Running   0                14m
blackbox-exporter-7d89b9b799-2pr2v               3/3     Running   0                8d
grafana-6bd85b474-74hsk                          1/1     Running   9 (7d18h ago)    27d
kube-state-metrics-d5754d6dc-64wnj               3/3     Running   0                8d
node-exporter-46qw2                              2/2     Running   4 (67d ago)      118d
node-exporter-48zlb                              2/2     Running   47 (2d6h ago)    106d
node-exporter-6pkck                              2/2     Running   4 (8d ago)       106d
node-exporter-8n2ms                              2/2     Running   51 (83m ago)     2d19h
node-exporter-94k6t                              2/2     Running   0                55d
node-exporter-9qrcb                              2/2     Running   19 (29d ago)     118d
node-exporter-c5nkc                              2/2     Running   0                4d2h
node-exporter-fbbpx                              2/2     Running   0                26d
node-exporter-flk4l                              2/2     Running   6 (27d ago)      106d
node-exporter-kpnzq                              2/2     Running   10               118d
node-exporter-tmjtq                              2/2     Running   10 (7d18h ago)   76d
node-exporter-ztvjc                              2/2     Running   2 (84d ago)      86d
prometheus-adapter-6998fcc6b5-7cvsv              1/1     Running   0                19d
prometheus-adapter-6998fcc6b5-zsnvw              1/1     Running   0                8d
prometheus-k8s-0                                 2/2     Running   9 (2m52s ago)    66m
prometheus-k8s-1                                 2/2     Running   0                71m
prometheus-operator-59647c66cf-pl92j             2/2     Running   0                38m
If every pod is in the Running state, the deployment succeeded. The output shows that node_exporter, prometheus, grafana, alertmanager, blackbox-exporter and other components were deployed.
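With many pods it is easy to miss one that is not healthy. As a minimal sketch, the check "everything is Running" can be automated by parsing the text that kubectl get pods prints. The function name and the two example-pod rows in the sample are hypothetical, invented only to show what a failure looks like; the other rows come from the output above.

```python
def unhealthy_pods(kubectl_output):
    """Return (name, ready, status) for pods that are not Running or not fully ready."""
    problems = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, total_count = ready.split("/")
        if status != "Running" or ready_count != total_count:
            problems.append((name, ready, status))
    return problems


# The last two rows are invented examples of failing pods.
SAMPLE = """\
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 37m
grafana-6bd85b474-74hsk 1/1 Running 9 (7d18h ago) 27d
example-pod-a 1/2 Running 3 (1m ago) 10m
example-pod-b 0/2 ImagePullBackOff 0 5m
"""

if __name__ == "__main__":
    for name, ready, status in unhealthy_pods(SAMPLE):
        print(f"{name}: ready={ready} status={status}")
```

In practice the input would be the captured output of kubectl get pods -n monitoring rather than a hard-coded string.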
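- Caveats
Check carefully for error messages after every step; otherwise all kinds of problems show up later. If there are errors, run the following commands to remove everything completely (the main manifests first, then the CRDs and namespace under manifests/setup/):
kubectl delete -f manifests/
kubectl delete -f manifests/setup/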
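- Usage
The deployed Grafana already has Prometheus configured as a data source, so no manual configuration is needed. How to display dashboards in Grafana is outside the scope of this document; see the Grafana website and the dashboards site: Grafana Dashboards - discover and share dashboards for Grafana. | Grafana Labs
How kube-prometheus works
Logic diagram
Example
The following example sets up monitoring for redis-exporter to show how the kube-prometheus custom resources configure what Prometheus scrapes.
Apply each of the three files below with kubectl apply -f <filename>, in the order Deployment -> Service -> ServiceMonitor.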
- Deploy the redis_exporter Deployment
Deployment manifest for redis-exporter:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-exporter
  namespace: monitoring
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: redis-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.37.0
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: redis-exporter
      app.kubernetes.io/part-of: kube-prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: redis-exporter
        app.kubernetes.io/part-of: kube-prometheus
        app.kubernetes.io/version: 1.37.0
    spec:
      containers:
      - name: redis-exporter
        image: oliver006/redis_exporter:v1.37.0
        # replace the placeholders with your Redis IP or hostname and its password
        args: ["-redis.addr","redis://<redis-ip-or-hostname>:6379","-redis.password","<password>"]
        resources:
          requests:
            cpu: 20m
            memory: 20Mi
          limits:
            cpu: 100m
            memory: 30Mi
        ports:
        - containerPort: 9121
          name: http
        volumeMounts:
        - name: localtime
          mountPath: /etc/localtime
      volumes:
      - name: localtime
        hostPath:
          path: /usr/share/zoneinfo/Asia/Shanghai
      restartPolicy: Always
- Create the Service object
Service manifest for redis-exporter:
apiVersion: v1
kind: Service
metadata:
  name: redis-exporter
  namespace: monitoring
  # these labels matter: the ServiceMonitor below selects the Service by them
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: redis-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.37.0
spec:
  selector:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: redis-exporter
    app.kubernetes.io/part-of: kube-prometheus
  type: ClusterIP
  ports:
  - name: http
    port: 9121
    targetPort: http
- Create the ServiceMonitor object
ServiceMonitor is a custom resource (CRD) provided by the Prometheus Operator that kube-prometheus installs.
ServiceMonitor manifest for redis-exporter:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: redis-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.37.0
  name: redis-exporter
  namespace: monitoring
spec:
  endpoints: # scrape configuration
  - interval: 15s # scrape metrics every 15 seconds
    port: http # name of the port defined in the Service
    relabelings: # relabeling rules
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: instance
    scheme: http
  jobLabel: app.kubernetes.io/name
  selector: # must match the labels of the target Service
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: redis-exporter
      app.kubernetes.io/part-of: kube-prometheus
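The relabelings entry above copies the node name of the scraped pod into the instance label. To make that concrete, here is a simplified Python model of a single replace rule: join the source label values, match the regex against the whole string, and write the expanded replacement into the target label. It is a sketch of the behaviour, not Prometheus code; the function name and the sample label values are hypothetical, and details such as Prometheus' own regex dialect are ignored.

```python
import re


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Apply one simplified `replace` relabeling rule to a copy of `labels`."""
    # Join the source label values; missing labels count as empty strings.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # The regex has to match the whole joined value.
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)  # no match: labels stay unchanged
    # Convert $1-style references into Python's \g<1> syntax, then expand.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


if __name__ == "__main__":
    discovered = {
        "__meta_kubernetes_pod_node_name": "node-1",  # hypothetical node name
        "instance": "10.0.0.5:9121",                  # hypothetical pod address
    }
    relabeled = relabel_replace(
        discovered, ["__meta_kubernetes_pod_node_name"], "instance")
    print(relabeled["instance"])
```

With this rule, series from redis-exporter are identified by the node the exporter pod runs on rather than by the pod's IP and port.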
Execution logic
When a ServiceMonitor object is created, prometheus-operator detects it and translates it into Prometheus configuration. The result looks like this:
- job_name: serviceMonitor/monitoring/redis-exporter/0
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  enable_http2: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  # ...... remaining relabeling rules omitted
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    enable_http2: true
    namespaces:
      own_namespace: false
      names:
      - monitoring
Prometheus scrapes by endpoints according to this configuration. The endpoints are chosen as follows. First, take the selector defined in the ServiceMonitor:
selector: # must match the labels of the target Service
  matchLabels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: redis-exporter
    app.kubernetes.io/part-of: kube-prometheus
Using these labels, the matching Service objects (note: Service objects, not Pods) are selected in the monitoring namespace.
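A matchLabels selector is a subset match: every key/value pair in the selector must be present on the object, and extra labels on the object are ignored. That is why the Service, which also carries app.kubernetes.io/version, is still selected. A minimal sketch of the rule, with a hypothetical function name and one hypothetical non-matching Service:

```python
def selector_matches(match_labels, object_labels):
    """True when every key/value pair in match_labels is present in object_labels."""
    return all(object_labels.get(key) == value
               for key, value in match_labels.items())


# Selector from the ServiceMonitor.
MATCH_LABELS = {
    "app.kubernetes.io/component": "exporter",
    "app.kubernetes.io/name": "redis-exporter",
    "app.kubernetes.io/part-of": "kube-prometheus",
}

# Labels of the redis-exporter Service defined earlier.
REDIS_EXPORTER_SERVICE = {
    "app.kubernetes.io/component": "exporter",
    "app.kubernetes.io/name": "redis-exporter",
    "app.kubernetes.io/part-of": "kube-prometheus",
    "app.kubernetes.io/version": "1.37.0",
}

# Hypothetical Service with a different name label, for contrast.
OTHER_SERVICE = {
    "app.kubernetes.io/component": "exporter",
    "app.kubernetes.io/name": "some-other-exporter",
    "app.kubernetes.io/part-of": "kube-prometheus",
}

if __name__ == "__main__":
    print(selector_matches(MATCH_LABELS, REDIS_EXPORTER_SERVICE))
    print(selector_matches(MATCH_LABELS, OTHER_SERVICE))
```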
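Next, the Endpoints associated with each selected Service (that is, the pods) are looked up. Because of this, a ServiceMonitor created in another namespace can yield a valid Prometheus configuration with no errors and still collect no data: the prometheus-k8s service account has no permission to read the resources in that namespace. The fix is to create a Role and a RoleBinding there. Assume the new namespace is logs:
Define the Role in the namespace of the service being monitored: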
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s # name of the Role
  namespace: logs # namespace the Role lives in
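rules: # permissions granted by the Role
- apiGroups: [""] # "" denotes the core API group, which contains Services, Endpoints and Pods
  resources: # resources the rule applies to
  - services # Service objects
  - endpoints # Endpoints objects
  - pods # Pod objects
  verbs: # allowed actions: holders of this Role may get, list and watch
  - get # Services, Endpoints and Pods in the logs namespace
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch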
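Bind the Role. This step is essential.
It binds the Role above to the prometheus-k8s service account in the monitoring namespace.
Both the monitoring namespace and the prometheus-k8s account are created automatically when kube-prometheus is installed.
The binding gives the prometheus-k8s account in monitoring the prometheus-k8s Role in the logs namespace,
so that account can read the Services, Endpoints and Pods in logs.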
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: logs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
Together, the two manifests above create a Role named prometheus-k8s in the logs namespace that can read Services, Endpoints, Pods and related resources there, and bind it to the prometheus-k8s service account in the monitoring namespace, so that this account can read those resources in logs.
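The effect of the Role and RoleBinding can be pictured with a deliberately simplified model of a namespaced RBAC check: a request is allowed when a RoleBinding in the target namespace names the subject and points at a Role in that namespace with a matching rule. This is a sketch for intuition only, not how the API server is implemented; it ignores ClusterRoles, wildcards and resourceNames, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    api_groups: tuple
    resources: tuple
    verbs: tuple


@dataclass(frozen=True)
class Role:
    namespace: str
    name: str
    rules: tuple


@dataclass(frozen=True)
class RoleBinding:
    namespace: str
    role_name: str
    subjects: tuple  # each subject is (kind, namespace, name)


def is_allowed(subject, verb, api_group, resource, namespace, roles, bindings):
    """Simplified check: is `subject` allowed `verb` on `resource` in `namespace`?"""
    for binding in bindings:
        if binding.namespace != namespace or subject not in binding.subjects:
            continue
        for role in roles:
            if role.namespace != namespace or role.name != binding.role_name:
                continue
            for rule in role.rules:
                if (api_group in rule.api_groups
                        and resource in rule.resources
                        and verb in rule.verbs):
                    return True
    return False


# The Role and RoleBinding from this section, expressed in the model.
ROLES = [Role("logs", "prometheus-k8s", (
    Rule(("",), ("services", "endpoints", "pods"), ("get", "list", "watch")),
    Rule(("extensions",), ("ingresses",), ("get", "list", "watch")),
))]
BINDINGS = [RoleBinding("logs", "prometheus-k8s",
                        (("ServiceAccount", "monitoring", "prometheus-k8s"),))]
PROMETHEUS_SA = ("ServiceAccount", "monitoring", "prometheus-k8s")

if __name__ == "__main__":
    print(is_allowed(PROMETHEUS_SA, "list", "", "endpoints", "logs", ROLES, BINDINGS))
    print(is_allowed(PROMETHEUS_SA, "list", "", "endpoints", "default", ROLES, BINDINGS))
```

On a real cluster the same question can be asked directly with kubectl auth can-i list endpoints -n logs --as=system:serviceaccount:monitoring:prometheus-k8s.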