K8s + prometheus + vm(VictoriaMetrics)
Prometheus 是目前常用的监控软件,今天主要介绍下pronethes在k8s中的部署以及使用vm作为存储的使用
安装 prometheus
- 创建 ns
kubectl create ns prometheus
- 创建 node-exporter
#cat exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
labels:
app: node-exporter
spec:
selector:
matchLabels:
app: node-exporter
template:
metadata:
labels:
app: node-exporter
spec:
hostPID: true
hostIPC: true
hostNetwork: true
nodeSelector:
kubernetes.io/os: linux
containers:
- name: node-exporter
image: prom/node-exporter:v1.1.1
args:
- --web.listen-address=$(HOSTIP):9100
- --path.procfs=/host/proc
- --path.sysfs=/host/sys
- --path.rootfs=/host/root
- --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
- --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
ports:
- containerPort: 9100
env:
- name: HOSTIP
valueFrom:
fieldRef:
fieldPath: status.hostIP
resources:
requests:
cpu: 150m
memory: 180Mi
limits:
cpu: 150m
memory: 180Mi
securityContext:
runAsNonRoot: true
runAsUser: 65534
volumeMounts:
- name: proc
mountPath: /host/proc
- name: sys
mountPath: /host/sys
- name: root
mountPath: /host/root
mountPropagation: HostToContainer
readOnly: true
tolerations:
- operator: "Exists"
volumes:
- name: proc
hostPath:
path: /proc
- name: dev
hostPath:
path: /dev
- name: sys
hostPath:
path: /sys
- name: root
hostPath:
path: /
# kubectl -n prometheus apply -f exporter.yaml
prromethes的exproter配置我们放到comfig中管理
- 创建 promethesu ConfigMap
# cat prometheus-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
data:
prometheus.yaml: |
global:
scrape_interval: 15s
scrape_timeout: 15s
scrape_configs:
- job_name: "nodes"
static_configs:
- targets: ['XXX:9100', 'XXX:9100', 'XXX:9100', 'XXX:9100'] #替换成自己的node ip
relabel_configs: # 通过 relabeling 从 __address__ 中提取 IP 信息,为了后面验证 VM 是否兼容 relabeling
- source_labels: [__address__]
regex: "(.*):(.*)"
replacement: "${1}"
target_label: 'ip'
action: replace
# kubectl -n prometheus apply -f prometheus-cm.yaml
- 创建 pvc
# cat prometheus-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: prometheus-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
storageClassName: managed-nfs-storage
# kubectl -n prometheus apply -f prometheus-pvc.yaml
- 创建 service account
# cat sa.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
rules:
- apiGroups:
- ""
resources:
- nodes
- services
- endpoints
- pods
- nodes/proxy
verbs:
- get
- list
- watch
- apiGroups:
- "extensions"
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- configmaps
- nodes/metrics
verbs:
- get
- nonResourceURLs:
- /metrics
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: prometheus # 注意换成自己的ns
# kubectl -n prometheus apply -f sa.yaml
- 部署 prometheus
# cat prometheus.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus
spec:
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
serviceAccountName: prometheus
volumes:
- name: data
persistentVolumeClaim:
claimName: prometheus-data
- name: config-volume
configMap:
name: prometheus-config
containers:
- image: prom/prometheus:v2.35.0
name: prometheus
args:
- "--config.file=/etc/prometheus/prometheus.yaml"
- "--storage.tsdb.path=/prometheus" # 指定tsdb数据路径
- "--storage.tsdb.retention.time=24h"
- "--web.enable-admin-api" # 控制对admin HTTP API的访问,其中包括删除时间序列等功能
- "--web.enable-lifecycle" # 支持热更新,直接执行localhost:9090/-/reload立即生效
ports:
- containerPort: 9090
name: http
securityContext:
runAsUser: 0
volumeMounts:
- mountPath: "/etc/prometheus"
name: config-volume
- mountPath: "/prometheus"
name: data
---
apiVersion: v1
kind: Service
metadata:
name: prometheus
spec:
selector:
app: prometheus
type: ClusterIP
ports:
- name: web
port: 9090
targetPort: http
# kubectl -n prometheus apply -f prometheus.yaml
- 创建 ingress (如果没有可以忽略)
# cat ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: prometheus
annotations:
kubernetes.io/ingress.class: "nginx"
kubernetes.io/ingress.rule-mix: "true"
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
- prometheus.xxx.cn # 指定需要证书的域名
secretName: prometheus.xxx.cn
rules:
- host: prometheus.xxx.cn
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: prometheus
port:
number: 9090
# kubectl -n prometheus apply -f ingress.yaml
- 部署 grafana
# vi grafana.yaml
kind: Deployment
metadata:
name: grafana
spec:
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
volumes:
- name: storage
persistentVolumeClaim:
claimName: grafana-data
containers:
- name: grafana
image: grafana/grafana:main
imagePullPolicy: IfNotPresent
ports:
- containerPort: 3000
name: grafana
securityContext:
runAsUser: 0
env:
- name: GF_SECURITY_ADMIN_USER
value: admin
- name: GF_SECURITY_ADMIN_PASSWORD
value: admin321
volumeMounts:
- mountPath: /var/lib/grafana
name: storage
---
apiVersion: v1
kind: Service
metadata:
name: grafana
spec:
type: ClusterIP
ports:
- port: 3000
selector:
app: grafana
"grafana.yaml" [New] 57L, 1146C written
[root@k8s-master1 grafana]# cat grafana.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
spec:
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
volumes:
- name: storage
persistentVolumeClaim:
claimName: grafana-data
containers:
- name: grafana
image: grafana/grafana:main
imagePullPolicy: IfNotPresent
ports:
- containerPort: 3000
name: grafana
securityContext:
runAsUser: 0
env:
- name: GF_SECURITY_ADMIN_USER
value: admin
- name: GF_SECURITY_ADMIN_PASSWORD
value: admin321
volumeMounts:
- mountPath: /var/lib/grafana
name: storage
---
apiVersion: v1
kind: Service
metadata:
name: grafana
spec:
type: ClusterIP
ports:
- port: 3000
selector:
app: grafana
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: managed-nfs-storage
#
kubectl -n prometheus apply -f grafana.yaml
- 创建 grafana ingress
# cat ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: grafana
annotations:
kubernetes.io/ingress.class: "nginx"
spec:
tls:
- hosts:
- grafana.wy1212.cn # 指定需要证书的域名
secretName: grafana.wy1212.cn
rules:
- host: grafana.wy1212.cn
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: grafana
port:
number: 3000
# kubectl -n prometheus apply -f ingress.yaml
安装 VM 并配置
- 创建 VM
# cat vm.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: victoria-metrics
spec:
selector:
matchLabels:
app: victoria-metrics
template:
metadata:
labels:
app: victoria-metrics
spec:
volumes:
- name: storage
persistentVolumeClaim:
claimName: victoria-metrics-data
containers:
- name: vm
image: victoriametrics/victoria-metrics:v1.76.1
imagePullPolicy: IfNotPresent
args:
- -storageDataPath=/var/lib/victoria-metrics-data
- -retentionPeriod=1d #数据保存时间
ports:
- containerPort: 8428
name: http
volumeMounts:
- mountPath: /var/lib/victoria-metrics-data
name: storage
---
apiVersion: v1
kind: Service
metadata:
name: victoria-metrics
spec:
type: ClusterIP
ports:
- port: 8428
selector:
app: victoria-metrics
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: victoria-metrics-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
storageClassName: managed-nfs-storage #换成自己的nfs
# kubectl -n prometheus apply -f vm.yaml
等vm 运行起来后,修改 prometheus 的 configmap 修改数据存储位置
- 修改 prometheus cm
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
data:
prometheus.yaml: |
global:
scrape_interval: 15s
scrape_timeout: 15s
remote_write: # 新增 远程写入到远程 VM 存储
- url: http://victoria-metrics:8428/api/v1/write # 新增
scrape_configs:
因为我们部署prometheus的时候配置了 --web.enable-lifecycle 直接 curl -X POST localhost:9090/-/reload 即可生效
- 在grafana 配置vm作为数据源测试
- 创建完数据源,倒入一个dashboard (16098)测试
整个部署就完成了,关于其他如pod资源监控以及自动发现等配置,在下一篇文章中介绍。