Hello folks, today we will walk through how to upgrade and roll back a service component in a Kubernetes cluster. We will use Nginx as the example, running on an environment built with K3d.
Generally speaking, a rolling update in a Kubernetes deployment means updating only a small number of Pods at a time, moving on to the next batch once those succeed, until every replica has been replaced. In real business scenarios this matters a great deal: the biggest benefit is "zero downtime", since replicas keep serving traffic throughout the update, which reduces the risk of an outage and keeps the business running.
In this article we will first deploy Nginx v1.20.2, then perform a rolling update to v1.21.6, and finally roll back to v1.21.4. (Note: all of the Nginx versions chosen here have been officially released.) Let's start by building the Kubernetes cluster environment we need:
[leonli@192 ~ ] % k3d cluster create devops-cluster --port 8080:80@loadbalancer --port 8443:443@loadbalancer --api-port 6443 --servers 1 --agents 3
INFO[0000] portmapping '8080:80' targets the loadbalancer: defaulting to [servers:*:proxy agents:*:proxy]
INFO[0000] portmapping '8443:443' targets the loadbalancer: defaulting to [servers:*:proxy agents:*:proxy]
INFO[0000] Prep: Network
INFO[0000] Re-using existing network 'k3d-devops-cluster' (6c31290d788a6e62783a5588d9b5b11bf24441c2d5f18d565952d58774f76e91)
INFO[0000] Created image volume k3d-devops-cluster-images
INFO[0000] Starting new tools node...
INFO[0000] Starting Node 'k3d-devops-cluster-tools'
INFO[0007] Creating node 'k3d-devops-cluster-server-0'
INFO[0007] Creating node 'k3d-devops-cluster-agent-0'
INFO[0007] Creating node 'k3d-devops-cluster-agent-1'
INFO[0007] Creating node 'k3d-devops-cluster-agent-2'
INFO[0007] Creating LoadBalancer 'k3d-devops-cluster-serverlb'
INFO[0007] Using the k3d-tools node to gather environment information
INFO[0008] Starting cluster 'devops-cluster'
INFO[0008] Starting servers...
INFO[0008] Starting Node 'k3d-devops-cluster-server-0'
INFO[0012] Starting agents...
INFO[0012] Starting Node 'k3d-devops-cluster-agent-2'
INFO[0012] Starting Node 'k3d-devops-cluster-agent-1'
INFO[0012] Starting Node 'k3d-devops-cluster-agent-0'
INFO[0022] Starting helpers...
INFO[0022] Starting Node 'k3d-devops-cluster-serverlb'
INFO[0028] Injecting records for hostAliases (incl. host.k3d.internal) and for 5 network members into CoreDNS configmap...
INFO[0030] Cluster 'devops-cluster' created successfully!
INFO[0030] You can now use it like this:
kubectl cluster-info
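Before moving on, it may be worth confirming that the API server responds and that all four nodes are Ready. A quick optional check (the exact output will vary per machine):

# Confirm the control plane endpoint is reachable
kubectl cluster-info
# List the server and agent nodes and their status
kubectl get nodes -o wide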
[leonli@192 ~ ] % kubectl get po -A -o wide
NAMESPACE     NAME                                      READY   STATUS      RESTARTS      AGE   IP          NODE                          NOMINATED NODE   READINESS GATES
kube-system   helm-install-traefik-crd--1-zpqr7         0/1     Completed   0             19h   10.42.0.2   k3d-devops-cluster-agent-0    <none>           <none>
kube-system   helm-install-traefik--1-bh52c             0/1     Completed   2             19h   10.42.1.2   k3d-devops-cluster-server-0   <none>           <none>
kube-system   svclb-traefik-5wwk7                       2/2     Running     2 (69s ago)   19h   10.42.1.4   k3d-devops-cluster-server-0   <none>           <none>
kube-system   svclb-traefik-rq5cd                       2/2     Running     2 (59s ago)   19h   10.42.3.4   k3d-devops-cluster-agent-1    <none>           <none>
kube-system   local-path-provisioner-84bb864455-ss7nn   1/1     Running     1 (59s ago)   19h   10.42.3.5   k3d-devops-cluster-agent-1    <none>           <none>
kube-system   svclb-traefik-bsnf4                       2/2     Running     2 (58s ago)   19h   10.42.0.5   k3d-devops-cluster-agent-0    <none>           <none>
kube-system   svclb-traefik-9vm94                       2/2     Running     2 (58s ago)   19h   10.42.2.5   k3d-devops-cluster-agent-2    <none>           <none>
kube-system   coredns-96cc4f57d-vkv45                   1/1     Running     1 (58s ago)   19h   10.42.2.7   k3d-devops-cluster-agent-2    <none>           <none>
kube-system   traefik-55fdc6d984-wjlgr                  1/1     Running     1 (58s ago)   19h   10.42.0.6   k3d-devops-cluster-agent-0    <none>           <none>
kube-system   metrics-server-ff9dbcb6c-pzb2x            1/1     Running     1 (58s ago)   19h   10.42.2.6   k3d-devops-cluster-agent-2    <none>           <none>
Now let's create the Deployment manifest for our application:
[leonli@192 update ] % vi nginx-update-roll.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-dev
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-dev
  template:
    metadata:
      name: nginx-dev
      labels:
        app: nginx-dev
    spec:
      containers:
        - name: nginx
          image: nginx:1.20.2
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 80
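Before creating anything in the cluster, a client-side dry run can catch YAML mistakes early. An optional sanity check (assuming a reasonably recent kubectl that supports --dry-run=client):

# Validate the manifest without sending it to the cluster
kubectl apply -f nginx-update-roll.yml --dry-run=client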
Now let's deploy the Nginx application:
[leonli@192 update ] % kubectl create -f nginx-update-roll.yml
deployment.apps/nginx-dev created
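Optionally, instead of polling by hand, we can block until the Deployment finishes rolling out; for example:

# Wait until all replicas of nginx-dev are available
kubectl rollout status deployment nginx-dev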
Next, let's check the deployment result:
[leonli@192 update ] % kubectl get deployment -owide
NAME        READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES         SELECTOR
nginx-dev   3/3     3            3           91s   nginx        nginx:1.20.2   app=nginx-dev
[leonli@192 update ] % kubectl get po
NAME                        READY   STATUS    RESTARTS   AGE
nginx-dev-774658df4-kdkns   1/1     Running   0          2m16s
nginx-dev-774658df4-6vkcr   1/1     Running   0          2m16s
nginx-dev-774658df4-6fmpx   1/1     Running   0          2m16s
[leonli@192 update ] % kubectl get replicaset -o wide
NAME                  DESIRED   CURRENT   READY   AGE     CONTAINERS   IMAGES         SELECTOR
nginx-dev-774658df4   3         3         3       2m55s   nginx        nginx:1.20.2   app=nginx-dev,pod-template-hash=774658df4
Now let's bump the Nginx version in nginx-update-roll.yml to v1.21.6 and apply the updated Deployment:
[leonli@192 update ] % vi nginx-update-roll.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-dev
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-dev
  template:
    metadata:
      name: nginx-dev
      labels:
        app: nginx-dev
    spec:
      containers:
        - name: nginx
          image: nginx:1.21.6
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 80
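As a side note, the same image bump could also be done without editing the file at all, by patching the image imperatively; a sketch (the container name nginx matches the manifest above):

# Equivalent imperative update of the container image
kubectl set image deployment/nginx-dev nginx=nginx:1.21.6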
Then repeat the steps above in order:
[leonli@192 update ] % kubectl apply -f nginx-update-roll.yml
Warning: resource deployments/nginx-dev is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
deployment.apps/nginx-dev configured
[leonli@192 update ] % kubectl get deployment -o wide
NAME        READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES         SELECTOR
nginx-dev   3/3     2            3           13m   nginx        nginx:1.21.6   app=nginx-dev
[leonli@192 update ] % kubectl get deployment -o wide
NAME        READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES         SELECTOR
nginx-dev   3/3     3            3           14m   nginx        nginx:1.21.6   app=nginx-dev
From the output above we can see that UP-TO-DATE is 3 and IMAGES is nginx:1.21.6. Checking the ReplicaSets again, we find that DESIRED for the pre-update ReplicaSet nginx-dev-774658df4 has dropped to 0, as shown below:
[li@192 update ] % kubectl get replicaset -owide
NAME                  DESIRED   CURRENT   READY   AGE     CONTAINERS   IMAGES         SELECTOR
nginx-dev-65ccd6889   3         3         3       3m50s   nginx        nginx:1.21.6   app=nginx-dev,pod-template-hash=65ccd6889
nginx-dev-774658df4   0         0         0       14m     nginx        nginx:1.20.2   app=nginx-dev,pod-template-hash=774658df4
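If you want to see this handover live during a future update, the --watch flag streams changes as the new ReplicaSet scales up and the old one scales down; for example:

# Stream ReplicaSet changes while a rollout is in progress
kubectl get replicaset -o wide --watch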
Next, let's look at the full rolling-update process for Nginx:
[leonli@192 update ] % kubectl describe deployment nginx-dev
Name:                   nginx-dev
Namespace:              default
CreationTimestamp:      Sat, 14 May 2022 20:53:57 +0800
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               app=nginx-dev
Replicas:               3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=nginx-dev
  Containers:
   nginx:
    Image:        nginx:1.21.6
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   nginx-dev-65ccd6889 (3/3 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  20m    deployment-controller  Scaled up replica set nginx-dev-774658df4 to 3
  Normal  ScalingReplicaSet  9m28s  deployment-controller  Scaled up replica set nginx-dev-65ccd6889 to 1
  Normal  ScalingReplicaSet  9m27s  deployment-controller  Scaled down replica set nginx-dev-774658df4 to 2
  Normal  ScalingReplicaSet  9m27s  deployment-controller  Scaled up replica set nginx-dev-65ccd6889 to 2
  Normal  ScalingReplicaSet  9m26s  deployment-controller  Scaled down replica set nginx-dev-774658df4 to 1
  Normal  ScalingReplicaSet  9m26s  deployment-controller  Scaled up replica set nginx-dev-65ccd6889 to 3
  Normal  ScalingReplicaSet  9m25s  deployment-controller  Scaled down replica set nginx-dev-774658df4 to 0
As the official documentation describes, the number of Pods replaced in each step is configurable: a Deployment lets you control the pace of the rollout, for example by pausing and resuming it. Kubernetes exposes two parameters, maxSurge and maxUnavailable, for fine-grained control over how many Pods are replaced at a time.
1. spec.strategy.rollingUpdate.maxSurge: the maximum number of Pod replicas that may exist above the desired count spec.replicas during an update. It can be an absolute number (including 0) or a percentage; the default is 25%, which is exactly what the describe output above shows.
2. spec.strategy.rollingUpdate.maxUnavailable: the maximum number of Pod replicas that may be unavailable during an update. It can likewise be an absolute number or a percentage and also defaults to 25% (see the sketch after this list).
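A minimal sketch of where these two knobs sit in the manifest; the values 1 and 0 here are only an illustration, not what this article actually deploys:

spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one extra Pod above the desired 3 during the update
      maxUnavailable: 0  # never drop below 3 available Pods

The pause and resume controls mentioned above are exposed on the command line as kubectl rollout pause deployment nginx-dev and kubectl rollout resume deployment nginx-dev.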
In essence, whenever we change the application with kubectl apply, Kubernetes records the current configuration and stores it as a revision, so that we can later roll back to a specific revision.
By default, Kubernetes keeps only a limited number of recent revisions (10 for a Deployment). We can adjust that number through the revisionHistoryLimit field in the Deployment manifest.
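Each retained revision can also be inspected individually, which is handy before deciding where to roll back to; for example (the revision number here is only illustrative):

# Show the Pod template recorded for revision 2
kubectl rollout history deployment nginx-dev --revision=2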
Next, starting from the current Nginx v1.21.6, let's perform a rollback and take it down to Nginx v1.21.4:
[leonli@192 update ] % vi nginx-roll-update.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-dev
  namespace: default
spec:
  revisionHistoryLimit: 5
  replicas: 3
  selector:
    matchLabels:
      app: nginx-dev
  template:
    metadata:
      name: nginx-dev
      labels:
        app: nginx-dev
    spec:
      containers:
        - name: nginx
          image: nginx:1.21.4
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 80
[leonli@192 update ] % kubectl apply -f nginx-roll-update.yml --record
Flag --record has been deprecated, --record will be removed in the future
deployment.apps/nginx-dev configured
[leonli@192 update ] % kubectl get deployment nginx-dev -owide
NAME        READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES         SELECTOR
nginx-dev   3/3     3            3           51m   nginx        nginx:1.21.4   app=nginx-dev
This time we ran kubectl apply with the --record flag. It records the command itself in the revision history, so we can tell which configuration file produced each revision, which makes the whole rollout easier to trace.
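Since the output above shows that --record is deprecated, a common replacement is to set the change-cause annotation yourself after applying; a sketch (the message text is arbitrary):

# Record a human-readable change cause for the current revision
kubectl annotate deployment nginx-dev kubernetes.io/change-cause="update image to nginx:1.21.4"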
[leonli@192 update ] % kubectl rollout history deployment nginx-dev
deployment.apps/nginx-dev
REVISION  CHANGE-CAUSE
1         <none>
2         <none>
3         kubectl apply --filename=nginx-roll-update.yml --record=true
The revision history is updated accordingly. In most cases CHANGE-CAUSE is what we use to trace each change and understand exactly what was modified in every revision.
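For completeness: instead of re-applying an older manifest as we did here, a rollback can also be driven straight from the revision history; a sketch (the revision number 2 is only an example, check the history output first):

# Roll the Deployment back to a specific stored revision
kubectl rollout undo deployment nginx-dev --to-revision=2
# Or simply return to the immediately previous revision
kubectl rollout undo deployment nginx-dev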
From the output above, Nginx has now been rolled back to v1.21.4, and this simple container upgrade-and-rollback exercise is complete. To sum up, by using the Deployment rolling-update parameters maxSurge and maxUnavailable to cap how many Pods may exceed the desired count and how many may be unavailable, we can even implement a simple canary release. In real business scenarios, however, this approach only works for canarying a single application; for a combined front-end and back-end release it quickly falls short.
That wraps up this simple demo. I hope you got something out of it!
Adiós !