3.1.5 Creating the Prometheus Service
Creating the Service generates a CLUSTER-IP for access from inside the cluster; the CLUSTER-IP can also be specified manually (see the sketch after the manifest below). Use the following command to create the Service for Prometheus:
$ kubectl create -f prometheus-service.yaml
The contents of prometheus-service.yaml are as follows:
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
    component: core
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
    - port: 9090
      targetPort: 9090
      protocol: TCP
      name: webui
  selector:
    app: prometheus
    component: core
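If a fixed address is preferred over an auto-assigned one, the spec can pin it explicitly. A minimal sketch (the address is illustrative; it must fall within the cluster's service CIDR and be unallocated):

spec:
  clusterIP: 10.98.66.13   # illustrative; must lie inside the service CIDR and be free
  ports:
    - port: 9090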
View the newly created Service named prometheus:
$ kubectl get svc prometheus -n monitoring
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
prometheus   ClusterIP   10.98.66.13   <none>        9090/TCP   11s
3.1.6 Creating the Prometheus Instance with a Deployment
Choose the deployment node according to your environment; here node1 is used. Label the node, then create the Deployment:
$ kubectl label node node1 app=prometheus
$ kubectl label node node1 component=core
$ kubectl create -f prometheus-deploy.yaml
The contents of prometheus-deploy.yaml are as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
  labels:
    app: prometheus
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: core
  template:
    metadata:
      name: prometheus-main
      labels:
        app: prometheus
        component: core
    spec:
      serviceAccountName: prometheus-k8s
      # nodeSelector:
      #   kubernetes.io/hostname: 192.168.211.40
      containers:
        - name: prometheus
          image: zqdlove/prometheus:v2.0.0
          args:
            - '--storage.tsdb.retention=15d'
            - '--config.file=/etc/prometheus/prometheus.yaml'
            - '--storage.tsdb.path=/home/prometheus_data'
            - '--web.enable-lifecycle'
          ports:
            - name: webui
              containerPort: 9090
          resources:
            requests:
              cpu: 1000m
              memory: 1000M
            limits:
              cpu: 1000m
              memory: 1000M
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /home/prometheus_data
            - name: config-volume
              mountPath: /etc/prometheus
            - name: rules-volume
              mountPath: /etc/prometheus-rules
            - name: time
              mountPath: /etc/localtime
      volumes:
        - name: data
          hostPath:
            path: /home/cdnadmin/prometheus_data
        - name: config-volume
          configMap:
            name: prometheus-core
        - name: rules-volume
          configMap:
            name: prometheus-rules
        - name: time
          hostPath:
            path: /etc/localtime
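Because the container is started with --web.enable-lifecycle, Prometheus exposes an HTTP /-/reload endpoint, so configuration changes in the ConfigMap can be applied without restarting the Pod. A sketch, assuming the cluster IP assigned to the prometheus Service in 3.1.5:

$ curl -X POST http://10.98.66.13:9090/-/reload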
Use the following commands to check the status of the Deployment named prometheus-core:
$ kubectl get deployments.apps -n monitoring
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
prometheus-core   1/1     1            1           75s
$ kubectl get pods -n monitoring
NAME                               READY   STATUS    RESTARTS   AGE
prometheus-core-6544fbc888-m58hf   1/1     Running   0          78s
The output shows that the Deployment desires 1 Pod, 1 is ready, 1 is up to date, and 1 is available; the AGE column shows the resources were created just over a minute ago.
3.1.7 Creating the Prometheus Ingress for External Domain Access
Use the following command to create the Ingress:
$ kubectl create -f prometheus_Ingress.yaml
The contents of prometheus_Ingress.yaml are as follows:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: traefik-prometheus
  namespace: monitoring
spec:
  rules:
    - host: prometheus.test.com
      http:
        paths:
          - path: /
            backend:
              serviceName: prometheus
              servicePort: 9090
Resolve the prometheus.test.com domain to the Ingress server; the Prometheus monitoring UI can then be reached at prometheus.test.com.
Use the following command to view the status of the created Ingress:
$ kubectl get ing traefik-prometheus -n monitoring
NAME                 CLASS    HOSTS                 ADDRESS   PORTS   AGE
traefik-prometheus   <none>   prometheus.test.com             80      52s
3.1.8 Testing Access to Prometheus
Resolve prometheus.test.com to the Ingress server; the Prometheus web UI can then be accessed via prometheus.test.com.
On Linux, add the following to /etc/hosts:
# any node IP works here
192.168.211.41 prometheus.test.com
Then run the following. Port 30304 is the unified external port opened by nginx-ingress (see the companion article on installing nginx-ingress in a Kubernetes cluster):
$ curl prometheus.test.com:30304
<a href="/graph">Found</a>.
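Beyond the redirect shown above, the HTTP query API offers a quick functional check; the built-in up metric reports one sample per scraped target (value 1 means the target is reachable):

$ curl 'http://prometheus.test.com:30304/api/v1/query?query=up'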
On Windows, add the following to C:\Windows\System32\drivers\etc\hosts:
192.168.211.41 prometheus.test.com
3.2 Deploying kube-state-metrics on Kubernetes
kube-state-metrics uses the namespace named monitoring, which was created in the previous section and does not need to be created again.
3.2.1 Creating RBAC
The RBAC setup consists of three kinds of YAML objects: ServiceAccount, ClusterRole, and ClusterRoleBinding; the structure mirrors the RBAC content of the previous section. Use the following command to create the kube-state-metrics RBAC:
$ kubectl create -f kube-state-metrics-rbac.yaml
The contents of kube-state-metrics-rbac.yaml are as follows:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges"]
    verbs: ["list", "watch"]
  - apiGroups: ["extensions"]
    resources: ["daemonsets", "deployments", "replicasets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["statefulsets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["cronjobs"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
  # name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: kube-state-metrics
    namespace: monitoring
Use the following commands to confirm the RBAC objects were created; they fetch the ServiceAccount, ClusterRole, and ClusterRoleBinding respectively:
$ kubectl get sa kube-state-metrics -n monitoring
NAME                 SECRETS   AGE
kube-state-metrics   1         20s
$ kubectl get clusterrole kube-state-metrics
NAME                 CREATED AT
kube-state-metrics   2021-01-19T08:44:39Z
$ kubectl get clusterrolebinding kube-state-metrics
NAME                 ROLE                             AGE
kube-state-metrics   ClusterRole/kube-state-metrics   32s
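As an additional sanity check, kubectl auth can-i can impersonate the ServiceAccount to confirm that the binding grants the expected permissions:

$ kubectl auth can-i list pods --as=system:serviceaccount:monitoring:kube-state-metrics
yes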
3.2.2 Creating the kube-state-metrics Service
Use the following command to create the Service:
$ kubectl create -f kube-state-metrics-service.yaml
The contents of kube-state-metrics-service.yaml are as follows:
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: monitoring
  labels:
    app: kube-state-metrics
spec:
  ports:
    - name: kube-state-metrics
      port: 8080
      protocol: TCP
  selector:
    app: kube-state-metrics
Use the following command to view the Service named kube-state-metrics:
$ kubectl get svc kube-state-metrics -n monitoring
NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
kube-state-metrics   ClusterIP   10.101.0.15   <none>        8080/TCP   41s
3.2.3 Creating the kube-state-metrics Deployment
The Deployment runs the kube-state-metrics container. Choose the deployment node according to your environment; here node2 is used. Label the node, then create the Deployment:
$ kubectl label node node2 app=kube-state-metrics
$ kubectl create -f kube-state-metrics-deploy.yaml
The contents of kube-state-metrics-deploy.yaml are as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      nodeSelector:
        app: kube-state-metrics   # matches the label applied to node2 above
      containers:
        - name: kube-state-metrics
          image: zqdlove/kube-state-metrics:v1.0.1
          ports:
            - containerPort: 8080
Use the following command to view the status of the Deployment named kube-state-metrics in the monitoring namespace:
$ kubectl get deployment kube-state-metrics -n monitoring
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
kube-state-metrics   1/1     1            1           8m9s
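To confirm that metrics are actually being served, one option is to port-forward the Service and fetch /metrics (a quick check; any free local port works):

$ kubectl port-forward -n monitoring svc/kube-state-metrics 8080:8080 &
$ curl -s localhost:8080/metrics | head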
3.2.4 Configuring the kube-state-metrics Target in Prometheus
For custom labels and target classification, Prometheus's relabel_configs mechanism is used; see the companion article on relabel_configs for the full treatment.
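In short, Prometheus can discover the kube-state-metrics Service through Kubernetes service discovery and keep it based on the prometheus.io/scrape annotation. A minimal sketch of such a scrape job, assuming it is added to the prometheus.yaml held in the prometheus-core ConfigMap (job and target label names are illustrative):

scrape_configs:
  - job_name: 'kubernetes-service-endpoints'
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only endpoints whose Service is annotated prometheus.io/scrape: 'true'
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Copy the namespace and Service name into custom labels
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

After a reload (see the /-/reload sketch in 3.1.6), kube-state-metrics should appear on Prometheus's Targets page.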
3.3 Deploying node-exporter on Kubernetes
In Prometheus, any program that reports data is called an exporter, and different exporters cover different domains. They share a common naming convention, xx_exporter; for example, node_exporter is responsible for collecting host information. This section covers installing node_exporter, which is written in Go and targets *NIX system monitoring.
node-exporter uses the namespace named monitoring, created in the previous sections.
3.3.1 Deploying the node-exporter Service
Use the following command to create the Service:
$ kubectl create -f node_exporter-service.yaml
The contents of node_exporter-service.yaml are as follows:
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: prometheus-node-exporter
  namespace: monitoring
  labels:
    app: prometheus
    component: node-exporter
spec:
  clusterIP: None
  ports:
    - name: prometheus-node-exporter
      port: 9100
      protocol: TCP
  selector:
    # must match the Pod labels set by the DaemonSet in 3.3.2
    node-exporter: node-exporter
  type: ClusterIP
Use the following command to view the Service named prometheus-node-exporter in the monitoring namespace:
$ kubectl get svc prometheus-node-exporter -n monitoring
NAME                       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
prometheus-node-exporter   ClusterIP   None         <none>        9100/TCP   20s
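Note that this Service is headless (clusterIP: None), so its DNS name resolves directly to the individual Pod IPs. Once the DaemonSet from the next step is running, the endpoints can be listed with:

$ kubectl get endpoints prometheus-node-exporter -n monitoring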
3.3.2 Creating node-exporter Containers with a DaemonSet
Label every node, then create the DaemonSet:
$ kubectl label node --all node-exporter=node-exporter
$ kubectl create -f node_exporter-daemonset.yaml
Check the status of the DaemonSet named prometheus-node-exporter in the monitoring namespace:
$ kubectl get ds prometheus-node-exporter -n monitoring
NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
prometheus-node-exporter   2         2         2       2            2           node-exporter=node-exporter   ...
The full contents of node_exporter-daemonset.yaml are as follows:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  namespace: monitoring
  labels:
    # app: prometheus
    # component: node-exporter
    node-exporter: node-exporter
spec:
  selector:
    matchLabels:
      # app: prometheus
      # component: node-exporter
      node-exporter: node-exporter
  template:
    metadata:
      name: prometheus-node-exporter
      labels:
        # app: prometheus
        # component: node-exporter
        node-exporter: node-exporter
    spec:
      nodeSelector:
        node-exporter: node-exporter
      containers:
        - image: zqdlove/node-exporter:v0.16.0
          name: prometheus-node-exporter
          ports:
            - name: prom-node-exp
              # ^ must be an IANA_SVC_NAME (at most 15 characters)
              containerPort: 9100
              # hostPort: 9100
          resources:
            requests:
              # cpu: 20000m
              cpu: "0.6"
              memory: 100M
            limits:
              cpu: "0.6"
              # cpu: 20000m
              memory: 100M
          securityContext:
            privileged: true
          command:
            - /bin/node_exporter
            - --path.procfs
            - /host/proc
            - --path.sysfs
            - /host/sys
            - --collector.filesystem.ignored-mount-points
            - ^/(sys|proc|dev|host|etc)($|/)
          volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: root
              mountPath: /rootfs
      tolerations:
        - key: "node-role.kubernetes.io/master"
          operator: "Exists"
          effect: "NoSchedule"
      volumes:
        - name: dev
          hostPath:
            path: /dev
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
      # affinity:
      #   nodeAffinity:
      #     requiredDuringSchedulingIgnoredDuringExecution:
      #       nodeSelectorTerms:
      #         - matchExpressions:
      #             - key: kubernetes.io/hostname
      #               operator: NotIn
      #               values:
      #                 - $YOUR_IP
      # hostNetwork: true
      hostIPC: true
      hostPID: true
Notes on node_exporter-daemonset.yaml:

hostPID: true, hostIPC: true, and hostNetwork: true place the container in the host's PID namespace, IPC namespace, and network, so node_exporter observes the node itself rather than just its own container.
The tolerations block allows the Pod to be scheduled onto tainted nodes:

tolerations:
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
    effect: "NoSchedule"

Check which nodes carry taints:
$ kubectl describe nodes | grep Taint
Taints:             node-role.kubernetes.io/master:NoSchedule
Taints:             <none>
Taints:             <none>
The output shows that only the master node carries a taint. If the DaemonSet Pod should also run on the master, the Pod must tolerate that taint (this can also be thought of as filtering). For details, see the companion article "Kubernetes Scheduling and Eviction (1): Taints and Tolerations".
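For reference, taints are added and removed with kubectl taint; a trailing minus removes the taint again (a generic example with placeholder key/value):

$ kubectl taint nodes node1 key1=value1:NoSchedule     # add a taint
$ kubectl taint nodes node1 key1=value1:NoSchedule-    # remove it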
3.3.3 Configuring the node-exporter Target in Prometheus
The node-exporter targets are added to Prometheus in the same way; see the companion article on relabel_configs for custom labels and classification.
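A minimal sketch of a pod-level scrape job that keeps only the DaemonSet's Pods (via the node-exporter=node-exporter label) and records the node name as a custom label (job and label names are illustrative):

scrape_configs:
  - job_name: 'node-exporter'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only Pods carrying the DaemonSet's node-exporter label
      - source_labels: [__meta_kubernetes_pod_label_node_exporter]
        action: keep
        regex: node-exporter
      # Record which node each sample came from
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: node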
3.4 Deploying Grafana on Kubernetes
Grafana uses the namespace named monitoring, created in earlier sections; it does not need to be created again.
3.4.1 Creating the Grafana Service
Use the following command to create the Service:
$ kubectl create -f grafana-service.yaml
The contents of grafana-service.yaml are as follows:
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
    component: core
spec:
  ports:
    - port: 3000
  selector:
    app: grafana
    component: core
Use the following command to view the Service named grafana in the monitoring namespace:
$ kubectl get svc grafana -n monitoring
NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
grafana   ClusterIP   10.109.232.15   <none>        3000/TCP   18s
3.4.2 Deploying Grafana with a Deployment
Choose the deployment node according to your environment; here node2 is used. The Grafana configuration file must be prepared on that node in advance.

$ kubectl label node node2 grafana=grafana
Run the following on node2 to extract a default grafana.ini from the image and set its ownership:
$ docker run -tid --name grafana-tmp zqdlove/grafana:v5.0.0 bash
$ mkdir -p /etc/grafana
$ docker cp grafana-tmp:/etc/grafana/grafana.ini /etc/grafana/
$ docker kill grafana-tmp
$ docker rm grafana-tmp
$ useradd grafana
$ chown grafana:grafana /etc/grafana/grafana.ini
$ chmod 777 /etc/grafana/grafana.ini
Run the following on the master:
$ kubectl create -f grafana-deploy.yaml
The contents of grafana-deploy.yaml are as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-core
  namespace: monitoring
  labels:
    app: grafana
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
      component: core
  template:
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      nodeSelector:
        # kubernetes.io/hostname: 192.168.211.42
        grafana: grafana
      containers:
        - image: zqdlove/grafana:v5.0.0
          name: grafana-core
          imagePullPolicy: IfNotPresent
          # securityContext:
          #   privileged: true
          resources:
            # keep request = limit to keep this container in the Guaranteed QoS class
            limits:
              cpu: 500m
              memory: 1200Mi
            requests:
              cpu: 500m
              memory: 1200Mi
          env:
            # The following env variables set up basic auth with the default admin user and admin password.
            - name: GF_AUTH_BASIC_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "false"
            # - name: GF_AUTH_ANONYMOUS_ORG_ROLE
            #   value: Admin
            # Does not really work, because of template variables in exported dashboards:
            # - name: GF_DASHBOARDS_JSON_ENABLED
            #   value: "true"
          readinessProbe:
            httpGet:
              path: /login
              port: 3000
            # initialDelaySeconds: 30
            # timeoutSeconds: 1
          volumeMounts:
            - name: grafana-persistent-storage
              mountPath: /var
            - name: grafana
              mountPath: /etc/grafana
      imagePullSecrets:
        - name: bjregistry
      volumes:
        - name: grafana-persistent-storage
          emptyDir: {}
        - name: grafana
          hostPath:
            path: /etc/grafana
Check the status of the Deployment named grafana-core in the monitoring namespace:
$ kubectl get deployment grafana-core -n monitoring
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
grafana-core   1/1     1            1           8m32s
3.4.3 Creating the Grafana Ingress for External Domain Access
Use the following command to create the Ingress:
$ kubectl create -f grafana-ingress.yaml
The contents of grafana-ingress.yaml are as follows:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: traefik-grafana
  namespace: monitoring
spec:
  rules:
    - host: grafana.test.com
      http:
        paths:
          - path: /
            backend:
              serviceName: grafana
              servicePort: 3000
Use the following command to view the Ingress named traefik-grafana in the monitoring namespace:
$ kubectl get ingress traefik-grafana -n monitoring
NAME              CLASS    HOSTS              ADDRESS   PORTS   AGE
traefik-grafana   <none>   grafana.test.com             80      30s
3.4.4 Testing Access to Grafana
Resolve grafana.test.com to the Ingress server; the Grafana dashboard UI can then be accessed via grafana.test.com.
On Linux, add the following to /etc/hosts:
192.168.211.41 grafana.test.com
Then run the following; port 30304 is the unified external port opened by nginx-ingress:
$ curl grafana.test.com:30304
<a href="/login">Found</a>.
On Windows, add the following to C:\Windows\System32\drivers\etc\hosts:
192.168.211.41 grafana.test.com
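Once the UI is reachable, Prometheus can be registered as a Grafana data source, either interactively in the web UI or via Grafana's HTTP API. A sketch using the API (assuming Grafana 5.x's default admin/admin credentials and the in-cluster DNS name of the prometheus Service):

$ curl -s -u admin:admin -H 'Content-Type: application/json' \
    -X POST http://grafana.test.com:30304/api/datasources \
    -d '{"name": "prometheus", "type": "prometheus", "access": "proxy", "url": "http://prometheus.monitoring.svc:9090"}'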