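Architecture overview:
Prometheus is the de facto monitoring standard in the cloud-native ecosystem, so it is natural to deploy the Prometheus service directly inside the Kubernetes cluster.
There are many ways to deploy prometheus-server, for example through the KubeSphere integration, a Helm chart, plain YAML manifests, or an all-in-one package. This example uses plain YAML manifests.
One question to settle before deploying is where the prometheus-server time-series database stores its data. This example uses a local directory mount, that is, a hostPath volume on the node, with /data as the mounted directory.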
The Kubernetes cluster is version 1.23.16 with 3 masters and 1 worker node, deployed with KubeKey (k in the transcripts below is presumably a shell alias for kubectl):

[root@node4 yaml]# k get no -owide
NAME    STATUS   ROLES                  AGE   VERSION    INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION           CONTAINER-RUNTIME
node1   Ready    control-plane,master   10d   v1.23.16   192.168.123.11   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64   docker://20.10.8
node2   Ready    control-plane,master   10d   v1.23.16   192.168.123.12   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64   docker://20.10.8
node3   Ready    control-plane,master   10d   v1.23.16   192.168.123.13   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64   docker://20.10.8
node4   Ready    worker                 10d   v1.23.16   192.168.123.14   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64   docker://20.10.8
prometheus-server is version v2.2.1:

[root@node4 yaml]# k get deployments.apps -n monitor-sa -owide
NAME                READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                   SELECTOR
prometheus-server   2/2     2            2           9d    prometheus   prom/prometheus:v2.2.1   app=prometheus,component=server
Grafana is version 9.4.3, installed from an rpm package:

[root@node4 yaml]# rpm -qa |grep grafana
grafana-enterprise-9.4.3-1.x86_64
node-exporter is version v0.16.0, managed by a DaemonSet controller:

[root@node4 yaml]# k get ds -n monitor-sa -owide
NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS      IMAGES                       SELECTOR
node-exporter   4         4         4       4            4           <none>          10d   node-exporter   prom/node-exporter:v0.16.0   name=node-exporter
Pod status after a successful deployment:

[root@node4 yaml]# k get po -n monitor-sa
NAME                                READY   STATUS    RESTARTS      AGE
node-exporter-6ttbl                 1/1     Running   1 (77m ago)   10d
node-exporter-7ls5t                 1/1     Running   1 (76m ago)   10d
node-exporter-r287q                 1/1     Running   3 (77m ago)   10d
node-exporter-z85dm                 1/1     Running   1 (77m ago)   10d
prometheus-server-fb59774d6-bgmn7   1/1     Running   0             62m
prometheus-server-fb59774d6-wrq27   1/1     Running   0             62m

The rest of this article walks through deploying Prometheus inside Kubernetes.
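I.
Deploying node-exporter
A note before starting: node-exporter does the data collection, so it is worth thinking about how data is collected, which data is needed, and which should be dropped. The exporter only collects. It does not push data to Prometheus; Prometheus scrapes it instead, so the exporter needs no storage of its own. Still, if node-exporter collects everything indiscriminately, that undoubtedly puts extra load on Prometheus.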
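The relevant setting in this example is the following, which tells the filesystem collector not to collect data for the listed mount points:
- --collector.filesystem.ignored-mount-points
- '^/(sys|proc|dev|host|etc)($|/)'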
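One detail worth checking here is the quoting of the regex. If a YAML single-quoted scalar also contains a pair of double quotes, written as '"^/(sys|...)($|/)"', the double quotes are part of the value and therefore part of the pattern. The sketch below illustrates the effect. It uses Python's re module as a stand-in for the Go regexp engine that node-exporter actually uses (an assumption that should hold for a pattern this simple), and should_ignore is a hypothetical helper, not part of node-exporter.

```python
import re

# Pattern as intended: a plain regex with no extra quote characters.
INTENDED = r'^/(sys|proc|dev|host|etc)($|/)'
# Pattern as it arrives when YAML single quotes wrap literal double quotes.
DOUBLE_QUOTED = '"' + INTENDED + '"'


def should_ignore(pattern: str, mount_point: str) -> bool:
    """Return True if the mount point matches the ignore pattern (unanchored search)."""
    return re.search(pattern, mount_point) is not None


# Print: mount point, result with the intended pattern, result with the quoted one.
for mp in ["/sys/fs/cgroup", "/proc", "/etc/hosts", "/", "/data"]:
    print(mp, should_ignore(INTENDED, mp), should_ignore(DOUBLE_QUOTED, mp))
```

With the extra quotes the pattern asks for a literal " before the start-of-string anchor, which can never match, so no mount point ends up being ignored.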
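For further tuning of node-exporter, the collector flags below can be used. The list appears to follow a newer node_exporter release than the v0.16.0 image deployed here, so check node_exporter --help for the version actually running.
--collector.arp Enable the arp collector (default: enabled).
--collector.bcache Enable the bcache collector (default: enabled).
--collector.bonding Enable the bonding collector (default: enabled).
--collector.btrfs Enable the btrfs collector (default: enabled).
--collector.buddyinfo Enable the buddyinfo collector (default: disabled).
--collector.conntrack Enable the conntrack collector (default: enabled).
--collector.cpu Enable the cpu collector (default: enabled).
--collector.cpufreq Enable the cpufreq collector (default: enabled).
--collector.diskstats Enable the diskstats collector (default: enabled).
--collector.drbd Enable the drbd collector (default: disabled).
--collector.edac Enable the edac collector (default: enabled).
--collector.entropy Enable the entropy collector (default: enabled).
--collector.ethtool Enable the ethtool collector (default: disabled).
--collector.fibrechannel Enable the fibrechannel collector (default: enabled).
--collector.filefd Enable the filefd collector (default: enabled).
--collector.filesystem Enable the filesystem collector (default: enabled).
--collector.hwmon Enable the hwmon collector (default: enabled).
--collector.infiniband Enable the infiniband collector (default: enabled).
--collector.interrupts Enable the interrupts collector (default: disabled).
--collector.ipvs Enable the ipvs collector (default: enabled).
--collector.ksmd Enable the ksmd collector (default: disabled).
--collector.loadavg Enable the loadavg collector (default: enabled).
--collector.logind Enable the logind collector (default: disabled).
--collector.mdadm Enable the mdadm collector (default: enabled).
--collector.meminfo Enable the meminfo collector (default: enabled).
--collector.meminfo_numa Enable the meminfo_numa collector (default: disabled).
--collector.mountstats Enable the mountstats collector (default: disabled).
--collector.netclass Enable the netclass collector (default: enabled).
--collector.netdev Enable the netdev collector (default: enabled).
--collector.netstat Enable the netstat collector (default: enabled).
--collector.network_route Enable the network_route collector (default: disabled).
--collector.nfs Enable the nfs collector (default: enabled).
--collector.nfsd Enable the nfsd collector (default: enabled).
--collector.ntp Enable the ntp collector (default: disabled).
--collector.nvme Enable the nvme collector (default: enabled).
--collector.perf Enable the perf collector (default: disabled).
--collector.powersupplyclass Enable the powersupplyclass collector (default: enabled).
--collector.pressure Enable the pressure collector (default: enabled).
--collector.processes Enable the processes collector (default: disabled).
--collector.qdisc Enable the qdisc collector (default: disabled).
--collector.rapl Enable the rapl collector (default: enabled).
--collector.runit Enable the runit collector (default: disabled).
--collector.schedstat Enable the schedstat collector (default: enabled).
--collector.sockstat Enable the sockstat collector (default: enabled).
--collector.softnet Enable the softnet collector (default: enabled).
--collector.stat Enable the stat collector (default: enabled).
--collector.supervisord Enable the supervisord collector (default: disabled).
--collector.systemd Enable the systemd collector (default: disabled).
--collector.tapestats Enable the tapestats collector (default: enabled).
--collector.tcpstat Enable the tcpstat collector (default: disabled).
--collector.textfile Enable the textfile collector (default: enabled).
--collector.thermal_zone Enable the thermal_zone collector (default: enabled).
--collector.time Enable the time collector (default: enabled).
--collector.timex Enable the timex collector (default: enabled).
--collector.udp_queues Enable the udp_queues collector (default: enabled).
--collector.uname Enable the uname collector (default: enabled).
--collector.vmstat Enable the vmstat collector (default: enabled).
--collector.wifi Enable the wifi collector (default: disabled).
--collector.xfs Enable the xfs collector (default: enabled).
--collector.zfs Enable the zfs collector (default: enabled).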
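As a rough illustration of how such a list turns into container args: node_exporter enables a collector with --collector.<name> and disables a default-enabled one with --no-collector.<name>. The helper below is a hypothetical sketch for generating those flags, and the collector names passed to it are only an example selection, not a recommendation from this article.

```python
from typing import Iterable, List


def collector_args(enable: Iterable[str], disable: Iterable[str]) -> List[str]:
    """Build node-exporter command-line flags for the given collector names."""
    enabled, disabled = sorted(set(enable)), sorted(set(disable))
    overlap = set(enabled) & set(disabled)
    if overlap:
        raise ValueError(f"collectors both enabled and disabled: {sorted(overlap)}")
    # Enabled collectors first, then the negated form for disabled ones.
    return [f"--collector.{name}" for name in enabled] + [
        f"--no-collector.{name}" for name in disabled
    ]


# Example selection (an assumption): turn on two default-off collectors
# and turn off three default-on collectors that a small cluster may not need.
args = collector_args(enable=["systemd", "processes"], disable=["infiniband", "nfs", "zfs"])
print("\n".join(f"- {a}" for a in args))
```

The printed lines can be pasted under args: in the DaemonSet manifest, after confirming that each flag exists in the node-exporter version in use.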
Example:
--collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
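To see what an exclude pattern like this actually filters, it can be tried against a list of mount points before rolling it out. The sketch below does that with Python's re module (standing in for Go's regexp, an assumption that should hold for this pattern); the mount points are made-up samples. Note that this example uses the newer flag name --collector.filesystem.mount-points-exclude, while the v0.16.0 manifest in this article uses --collector.filesystem.ignored-mount-points.

```python
import re

# The exclude pattern from the example above.
EXCLUDE = re.compile(r'^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)')

# Hypothetical mount points, as they might appear on a Kubernetes node.
mount_points = [
    "/",
    "/boot",
    "/data",
    "/dev/shm",
    "/proc/sys/fs/binfmt_misc",
    "/sys/kernel/debug",
    "/var/lib/docker",
    "/var/lib/docker/overlay2/abc/merged",
    "/var/lib/kubelet/pods/123/volumes/x",
]

# Keep only the mount points the filesystem collector would still report.
kept = [mp for mp in mount_points if not EXCLUDE.search(mp)]
print(kept)
```

/var/lib/docker itself is kept because the pattern only excludes paths below it, which is usually what is wanted: the disk holding the images is still monitored, but the per-container overlay mounts are not.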
List:
Collector | Scope | Include Flag | Exclude Flag |
arp | device | --collector.arp.device-include | --collector.arp.device-exclude |
cpu | bugs | --collector.cpu.info.bugs-include | N/A |
cpu | flags | --collector.cpu.info.flags-include | N/A |
diskstats | device | --collector.diskstats.device-include | --collector.diskstats.device-exclude |
ethtool | device | --collector.ethtool.device-include | --collector.ethtool.device-exclude |
ethtool | metrics | --collector.ethtool.metrics-include | N/A |
filesystem | fs-types | N/A | --collector.filesystem.fs-types-exclude |
filesystem | mount-points | N/A | --collector.filesystem.mount-points-exclude |
hwmon | chip | --collector.hwmon.chip-include | --collector.hwmon.chip-exclude |
netdev | device | --collector.netdev.device-include | --collector.netdev.device-exclude |
qdisk | device | --collector.qdisk.device-include | --collector.qdisk.device-exclude |
sysctl | all | --collector.sysctl.include | N/A |
systemd | unit | --collector.systemd.unit-include | --collector.systemd.unit-exclude |
Enabled by default
Name | Description | OS |
arp | Exposes ARP statistics from /proc/net/arp. | Linux |
bcache | Exposes bcache statistics from /sys/fs/bcache/. | Linux |
bonding | Exposes the number of configured and active slaves of Linux bonding interfaces. | Linux |
btrfs | Exposes btrfs statistics | Linux |
boottime | Exposes system boot time derived from the kern.boottime sysctl. | Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD, Solaris |
conntrack | Shows conntrack statistics (does nothing if no /proc/sys/net/netfilter/ present). | Linux |
cpu | Exposes CPU statistics | Darwin, Dragonfly, FreeBSD, Linux, Solaris, OpenBSD |
cpufreq | Exposes CPU frequency statistics | Linux, Solaris |
diskstats | Exposes disk I/O statistics. | Darwin, Linux, OpenBSD |
dmi | Expose Desktop Management Interface (DMI) info from /sys/class/dmi/id/ | Linux |
edac | Exposes error detection and correction statistics. | Linux |
entropy | Exposes available entropy. | Linux |
exec | Exposes execution statistics. | Dragonfly, FreeBSD |
fibrechannel | Exposes fibre channel information and statistics from /sys/class/fc_host/. | Linux |
filefd | Exposes file descriptor statistics from /proc/sys/fs/file-nr. | Linux |
filesystem | Exposes filesystem statistics, such as disk space used. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
hwmon | Expose hardware monitoring and sensor data from /sys/class/hwmon/. | Linux |
infiniband | Exposes network statistics specific to InfiniBand and Intel OmniPath configurations. | Linux |
ipvs | Exposes IPVS status from /proc/net/ip_vs and stats from /proc/net/ip_vs_stats. | Linux |
loadavg | Exposes load average. | Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris |
mdadm | Exposes statistics about devices in /proc/mdstat (does nothing if no /proc/mdstat present). | Linux |
meminfo | Exposes memory statistics. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
netclass | Exposes network interface info from /sys/class/net/ | Linux |
netdev | Exposes network interface statistics such as bytes transferred. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
netisr | Exposes netisr statistics | FreeBSD |
netstat | Exposes network statistics from /proc/net/netstat. This is the same information as netstat -s. | Linux |
nfs | Exposes NFS client statistics from /proc/net/rpc/nfs. This is the same information as nfsstat -c. | Linux |
nfsd | Exposes NFS kernel server statistics from /proc/net/rpc/nfsd. This is the same information as nfsstat -s. | Linux |
nvme | Exposes NVMe info from /sys/class/nvme/ | Linux |
os | Expose OS release info from /etc/os-release or /usr/lib/os-release | any |
powersupplyclass | Exposes Power Supply statistics from /sys/class/power_supply | Linux |
pressure | Exposes pressure stall statistics from /proc/pressure/. | Linux (kernel 4.20+ and/or CONFIG_PSI) |
rapl | Exposes various statistics from /sys/class/powercap. | Linux |
schedstat | Exposes task scheduler statistics from /proc/schedstat. | Linux |
selinux | Exposes SELinux statistics. | Linux |
sockstat | Exposes various statistics from /proc/net/sockstat. | Linux |
softnet | Exposes statistics from /proc/net/softnet_stat. | Linux |
stat | Exposes various statistics from /proc/stat. This includes boot time, forks and interrupts. | Linux |
tapestats | Exposes statistics from /sys/class/scsi_tape. | Linux |
textfile | Exposes statistics read from local disk. The --collector.textfile.directory flag must be set. | any |
thermal | Exposes thermal statistics like pmset -g therm. | Darwin |
thermal_zone | Exposes thermal zone & cooling device statistics from /sys/class/thermal. | Linux |
time | Exposes the current system time. | any |
timex | Exposes selected adjtimex(2) system call stats. | Linux |
udp_queues | Exposes UDP total lengths of the rx_queue and tx_queue from /proc/net/udp and /proc/net/udp6. | Linux |
uname | Exposes system information as provided by the uname system call. | Darwin, FreeBSD, Linux, OpenBSD |
vmstat | Exposes statistics from /proc/vmstat. | Linux |
xfs | Exposes XFS runtime statistics. | Linux (kernel 4.4+) |
zfs | Exposes ZFS performance statistics. | FreeBSD, Linux, Solaris |
The node-exporter deployment manifest:

cat >node-export.yaml <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor-sa
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.16.0
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
EOF
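II.
Deploying the kube-state-metrics collector
kube-state-metrics is the collector dedicated to the state of Kubernetes resources such as Pods, Deployments, DaemonSets and StatefulSets. Like node-exporter, it does not push anything; prometheus-server scrapes the data it exposes.
For example, the service's log shows that some resources could not be listed because its ServiceAccount lacks the permissions. This is nothing to worry about: as with node-exporter, some data simply does not need to be collected: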
E1202 13:10:33.591335 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "secrets" in API group "" at the cluster scope
E1202 13:10:33.592118 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1beta1.MutatingWebhookConfiguration: mutatingwebhookconfigurations.admissionregistration.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "mutatingwebhookconfigurations" in API group "admissionregistration.k8s.io" at the cluster scope
E1202 13:10:33.593079 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.Namespace: networkpolicies.networking.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "networkpolicies" in API group "networking.k8s.io" at the cluster scope
E1202 13:10:33.597030 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.ReplicaSet: replicasets.apps is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "replicasets" in API group "apps" at the cluster scope
E1202 13:10:33.599890 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1beta1.ValidatingWebhookConfiguration: validatingwebhookconfigurations.admissionregistration.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "validatingwebhookconfigurations" in API group "admissionregistration.k8s.io" at the cluster scope
E1202 13:10:34.580372 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
E1202 13:10:34.580373 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "configmaps" in API group "" at the cluster scope
E1202 13:10:34.586583 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E1202 13:10:34.586669 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "deployments" in API group "apps" at the cluster scope
E1202 13:10:34.587055 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1beta1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope
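When deciding which of these errors matter, it can help to condense the log into a list of (API group, resource) pairs. The following is a minimal sketch of that idea; missing_permissions is a hypothetical helper, and the sample text is a shortened excerpt of the log above (the "..." stands for the omitted client-go path).

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches the tail of a client-go "forbidden" message.
FORBIDDEN = re.compile(r'cannot list resource "([^"]+)" in API group "([^"]*)"')


def missing_permissions(log_text: str) -> Dict[str, List[str]]:
    """Group the resources that could not be listed by their API group."""
    missing = defaultdict(set)
    for resource, group in FORBIDDEN.findall(log_text):
        missing[group].add(resource)
    return {group: sorted(resources) for group, resources in missing.items()}


# Shortened excerpt of the kube-state-metrics log shown above.
sample = '''
E1202 13:10:33.591335 1 reflector.go:156] ... cannot list resource "secrets" in API group "" at the cluster scope
E1202 13:10:34.580372 1 reflector.go:156] ... cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
E1202 13:10:34.586669 1 reflector.go:156] ... cannot list resource "deployments" in API group "apps" at the cluster scope
'''

for group, resources in missing_permissions(sample).items():
    print(repr(group), resources)
```

Each group/resource pair maps directly onto an apiGroups/resources entry in a ClusterRole rule, so the output doubles as a checklist for the RBAC file.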
RBAC for kube-state-metrics:
The permission to list ConfigMaps that the log above reports as missing has already been added in this file.

cat >kube-state-metrics-rbac.yaml <<EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources: ["daemonsets", "deployments", "replicasets"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["statefulsets","daemonsets","replicasets","deployments"]
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs", "jobs"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources: ["configmaps","secrets"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
EOF
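To check which of the errors from the earlier log this ClusterRole resolves, the rules can be compared against the denied resources. The sketch below is a deliberately simplified model: it ignores verbs (every rule here grants list and watch), wildcards and resourceNames, and the two data structures are transcribed by hand from the RBAC file and the log above.

```python
# (apiGroups, resources) pairs transcribed from the ClusterRole above.
RULES = [
    ([""], ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers",
            "limitranges", "persistentvolumeclaims", "persistentvolumes",
            "namespaces", "endpoints"]),
    (["extensions"], ["daemonsets", "deployments", "replicasets"]),
    (["apps"], ["statefulsets", "daemonsets", "replicasets", "deployments"]),
    (["batch"], ["cronjobs", "jobs"]),
    (["autoscaling"], ["horizontalpodautoscalers"]),
    ([""], ["configmaps", "secrets"]),
]

# (API group, resource) pairs reported as forbidden in the log above.
DENIED = [
    ("", "secrets"),
    ("admissionregistration.k8s.io", "mutatingwebhookconfigurations"),
    ("networking.k8s.io", "networkpolicies"),
    ("apps", "replicasets"),
    ("admissionregistration.k8s.io", "validatingwebhookconfigurations"),
    ("storage.k8s.io", "storageclasses"),
    ("", "configmaps"),
    ("policy", "poddisruptionbudgets"),
    ("apps", "deployments"),
    ("storage.k8s.io", "volumeattachments"),
]


def allowed(group: str, resource: str) -> bool:
    """True if some rule lists both the API group and the resource."""
    return any(group in groups and resource in resources for groups, resources in RULES)


still_missing = [pair for pair in DENIED if not allowed(*pair)]
for group, resource in still_missing:
    print(f"{resource} ({group})")
```

Under this model the ClusterRole covers secrets, configmaps, replicasets and deployments, while the webhook configuration, network policy, storage and PodDisruptionBudget resources would still be reported as forbidden, which matches the article's point that not everything needs to be collected.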
The Service for kube-state-metrics:
Note the annotation prometheus.io/scrape: 'true', which marks the Service as something Prometheus is allowed to scrape.

cat >kube-state-metrics-svc.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  ports:
  - name: kube-state-metrics
    port: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics
EOF
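The annotation has no effect on its own. It only matters when a scrape job discovers Services or endpoints and keeps the targets whose annotation is 'true', conventionally through the meta label __meta_kubernetes_service_annotation_prometheus_io_scrape. The ConfigMap in part III of this article defines three jobs (node, cadvisor, apiserver), none of which reads this label, so a job along those lines would presumably have to be added before kube-state-metrics shows up as a target. The sketch below simulates the keep step in Python; the target dictionaries are hypothetical, and keep is an illustrative helper rather than Prometheus code.

```python
import re
from typing import Dict, List


def meta_label(annotation: str) -> str:
    """Name of the service-discovery meta label for a Service annotation."""
    return "__meta_kubernetes_service_annotation_" + re.sub(r"[^a-zA-Z0-9_]", "_", annotation)


def keep(targets: List[Dict[str, str]], source_label: str, regex: str) -> List[Dict[str, str]]:
    """Mimic a relabel rule with action `keep` (the regex is fully anchored)."""
    pattern = re.compile(regex)
    return [t for t in targets if pattern.fullmatch(t.get(source_label, ""))]


SCRAPE = meta_label("prometheus.io/scrape")

# Hypothetical discovered targets.
targets = [
    {"__meta_kubernetes_service_name": "kube-state-metrics", SCRAPE: "true"},
    {"__meta_kubernetes_service_name": "some-other-svc", SCRAPE: "false"},
    {"__meta_kubernetes_service_name": "kubernetes"},  # no annotation at all
]

kept = keep(targets, SCRAPE, "true")
print([t["__meta_kubernetes_service_name"] for t in kept])
```

Only the annotated Service survives the filter; Services without the annotation are dropped just like those that set it to 'false'.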
The Deployment for kube-state-metrics:

cat >kube-state-metrics-deploy.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        # image: gcr.io/google_containers/kube-state-metrics-amd64:v1.3.1
        image: quay.io/coreos/kube-state-metrics:v1.9.0
        ports:
        - containerPort: 8080
EOF
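III.
Deploying prometheus-server
The manifests in this part reference the monitor-sa namespace and a ServiceAccount named monitor (which needs enough RBAC permissions for the node and endpoints service discovery used below). Neither is created by the files shown here, so both must exist before the manifests are applied. Because the data volume is a hostPath of type Directory, the /data directory must also already exist on node4.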
1,
prometheus-cfg (the ConfigMap that holds prometheus.yml):
The heredoc delimiter is quoted ('EOF') so that the shell leaves the ${1} references in the relabel rules untouched; with an unquoted delimiter an interactive shell would normally expand them to an empty string.

cat >prometheus-cfg.yaml <<'EOF'
---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: monitor-sa
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
EOF
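The two replace rules in this configuration are easier to follow with concrete values. The sketch below mimics a Prometheus replace action in Python: the regex is fully anchored, and ${1} refers to the first capture group. relabel_replace is a hypothetical helper written for this illustration, not Prometheus code, and the node address is taken from the cluster listing earlier in the article.

```python
import re
from typing import Optional


def relabel_replace(value: str, regex: str, replacement: str) -> Optional[str]:
    """Mimic a Prometheus `replace` relabel action on one source label value.

    Returns the new target-label value, or None when the regex does not
    match (in which case Prometheus leaves the target label unchanged).
    """
    match = re.fullmatch(regex, value)
    if match is None:
        return None
    # Translate ${1}-style references into Python's \g<1> syntax.
    py_template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    return match.expand(py_template)


# kubernetes-node job: point the kubelet address at node-exporter's port.
print(relabel_replace("192.168.123.11:10250", "(.*):10250", "${1}:9100"))

# kubernetes-node-cadvisor job: build the API server proxy path per node.
print(relabel_replace("node1", "(.+)", "/api/v1/nodes/${1}/proxy/metrics/cadvisor"))

# What the first rule produces if the shell has already emptied ${1}.
print(relabel_replace("192.168.123.11:10250", "(.*):10250", ":9100"))
```

The last call shows why an unquoted heredoc is risky for this file: once the shell has replaced ${1} with an empty string, every node target ends up with the address ":9100".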
2,
prometheus-svc (a NodePort Service):

cat >prometheus-svc.yaml <<EOF
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
    protocol: TCP
  selector:
    app: prometheus
    component: server
EOF
3,
prometheus-deploy (the Deployment):

cat >prometheus-deploy.yaml <<EOF
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: node4
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.2.1
        imagePullPolicy: IfNotPresent
        command:
        - prometheus
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention=720h
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-config
          subPath: prometheus.yml
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
          items:
          - key: prometheus.yml
            path: prometheus.yml
            mode: 0644
      - name: prometheus-storage-volume
        hostPath:
          path: /data
          type: Directory
EOF
After all of the manifests above have been applied, check the prometheus-server Service:

[root@node4 yaml]# k get svc -n monitor-sa
NAME         TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
prometheus   NodePort   10.96.0.120   <none>        9090:32661/TCP   10d
Using the NodePort shown there (32661 in this case), open a browser at http://<node-IP>:32661 to reach the Prometheus web UI.
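Besides the browser, the same NodePort also serves the Prometheus HTTP API, which is handy for a quick check that targets are up. The sketch below only builds the request URL for the /api/v1/query endpoint; query_url is a hypothetical helper, and the IP is node4's address from the cluster listing, used here as an assumption about which node to contact.

```python
from urllib.parse import urlencode


def query_url(node_ip: str, node_port: int, promql: str) -> str:
    """Build the URL of an instant query against the Prometheus HTTP API."""
    return f"http://{node_ip}:{node_port}/api/v1/query?{urlencode({'query': promql})}"


# `up` is 1 for every target that Prometheus scraped successfully.
simple = query_url("192.168.123.14", 32661, "up")
# Label matchers contain characters that must be percent-encoded.
filtered = query_url("192.168.123.14", 32661, 'up{job="kubernetes-node"}')

print(simple)
print(filtered)
```

Fetching the first URL with curl or a browser should return a JSON document listing each scrape target, provided the node is reachable from where the request is made.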
At this point the prometheus-server service is fully installed inside the Kubernetes cluster.
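Grafana can simply be installed from the rpm package with its defaults, so there is little to say about it. The main step is to add a Prometheus data source that points at the service exposed above.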
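As a rough guide to what that data source contains, the sketch below builds the kind of JSON body that Grafana's data source HTTP API (POST /api/datasources) accepts. The exact settings used in the original setup are not shown in the text, so every value here is an assumption: the name is arbitrary, and the URL combines node4's IP with the NodePort from the Service listing.

```python
import json
from typing import Any, Dict


def prometheus_datasource(name: str, url: str) -> Dict[str, Any]:
    """Describe a Prometheus data source in the shape Grafana's API expects."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",  # Grafana's backend performs the queries
        "isDefault": True,
    }


# Assumed address: node4's IP plus the NodePort of the prometheus Service.
payload = prometheus_datasource("Prometheus", "http://192.168.123.14:32661")
print(json.dumps(payload, indent=2))
```

The same three essentials (type Prometheus, the URL, and server-side access) are what the data source form in the Grafana UI asks for.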