K8s Hands-On Series - 1.2 - Node Management

Intro: In the previous article we built a one-master, two-worker cluster from scratch. In real production practice, node management is one of the things you deal with most often, so this article walks through the common node-management operations hands-on. It covers:
- Node status
- Node communication and node failure
- Removing a node
- Adding a node

Node Management

Node Status

Please refer to: https://kubernetes.io/docs/concepts/architecture/nodes/#node-status
Here we focus on the Conditions section, as described in the documentation.

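To make the Conditions structure concrete, here is a minimal sketch (not the kubernetes client library, just plain JSON parsing) of how you might read a node's Ready condition from the output of kubectl get node -o json. The field names are real; the values are made up for illustration.

```python
import json

# Hypothetical sample of a Node's status.conditions, shaped like the JSON
# that `kubectl get node <name> -o json` returns.
node = json.loads("""
{
  "status": {
    "conditions": [
      {"type": "MemoryPressure", "status": "False", "reason": "KubeletHasSufficientMemory"},
      {"type": "DiskPressure",   "status": "False", "reason": "KubeletHasNoDiskPressure"},
      {"type": "PIDPressure",    "status": "False", "reason": "KubeletHasSufficientPID"},
      {"type": "Ready",          "status": "True",  "reason": "KubeletReady"}
    ]
  }
}
""")

def node_is_ready(node):
    # A node is Ready only when the Ready condition exists and its status is
    # "True". Note that when a node goes dark the status becomes "Unknown",
    # not "False", as we will see below.
    for cond in node["status"]["conditions"]:
        if cond["type"] == "Ready":
            return cond["status"] == "True"
    return False

print(node_is_ready(node))  # True
```

The same pattern applies to the other condition types (MemoryPressure, DiskPressure, PIDPressure), except that for those, "True" indicates a problem rather than health.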
Node Unreachable or Node Down

List the nodes:

  ~ kubectl get node -o wide
NAME       STATUS   ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
worker01   Ready    master   21h   v1.14.3   192.168.101.113   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
worker02   Ready    <none>   21h   v1.14.3   192.168.100.117   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
# list the pods and the nodes they are running on
  ~ kubectl get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
nginx-6657c9ffc-6q2pt   1/1     Running   1          19h   10.244.0.11   worker01   <none>           <none>
nginx-6657c9ffc-msd6t   1/1     Running   0          19h   10.244.1.17   worker02   <none>           <none>

Now shut down worker02:

  ~ kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
nginx-6657c9ffc-6q2pt   1/1     Running   1          20h   10.244.0.11   worker01   <none>           <none>
nginx-6657c9ffc-msd6t   1/1     Running   0          20h   10.244.1.17   worker02   <none>           <none>
  ~ kubectl get node -o wide
NAME       STATUS   ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
worker01   Ready    master   22h   v1.14.3   192.168.101.113   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
worker02   Ready    <none>   21h   v1.14.3   192.168.100.117   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
  ~ ping 192.168.100.117
PING 192.168.100.117 (192.168.100.117) 56(84) bytes of data.
^C
--- 192.168.100.117 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1007ms

As you can see, the node still reports Ready even though it can no longer be pinged. Checking again a little later (once the node controller's node-monitor-grace-period, 40s by default, has elapsed), the node is NotReady:

  ~ kubectl get node -o wide
NAME       STATUS     ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
worker01   Ready      master   22h   v1.14.3   192.168.101.113   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
worker02   NotReady   <none>   21h   v1.14.3   192.168.100.117   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
  ~ kubectl get pod -o wide 
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
nginx-6657c9ffc-6q2pt   1/1     Running   1          20h   10.244.0.11   worker01   <none>           <none>
nginx-6657c9ffc-msd6t   1/1     Running   0          20h   10.244.1.17   worker02   <none>           <none>

Inspecting the node details, we see that it has been tainted with node.kubernetes.io/unreachable:NoExecute and node.kubernetes.io/unreachable:NoSchedule, and that the Conditions have changed accordingly:

  ~ kubectl describe node worker02
Name:               worker02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
...
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"4e:0a:b4:88:0d:83"}
...
CreationTimestamp:  Sat, 08 Jun 2019 12:42:00 +0800
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----             ------    -----------------                 ------------------                ------              -------
  MemoryPressure   Unknown   Sun, 09 Jun 2019 10:03:52 +0800   Sun, 09 Jun 2019 10:04:52 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure     Unknown   Sun, 09 Jun 2019 10:03:52 +0800   Sun, 09 Jun 2019 10:04:52 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure      Unknown   Sun, 09 Jun 2019 10:03:52 +0800   Sun, 09 Jun 2019 10:04:52 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready            Unknown   Sun, 09 Jun 2019 10:03:52 +0800   Sun, 09 Jun 2019 10:04:52 +0800   NodeStatusUnknown   Kubelet stopped posting node status.

Checking the pods again, the pod on worker02 is being terminated and a replacement has been scheduled onto another node:

  ~ kubectl get pods -o wide
NAME                    READY   STATUS        RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
nginx-6657c9ffc-6q2pt   1/1     Running       1          20h   10.244.0.11   worker01   <none>           <none>
nginx-6657c9ffc-msd6t   1/1     Terminating   0          20h   10.244.1.17   worker02   <none>           <none>
nginx-6657c9ffc-n9s4p   1/1     Running       0          71s   10.244.0.14   worker01   <none>           <none>

Inspect the status of the terminating pod:

  ~ kubectl describe pod/nginx-6657c9ffc-msd6t
Name:                      nginx-6657c9ffc-msd6t
Node:                      worker02/192.168.100.117
Status:                    Terminating (lasts 2m5s)
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

The two toleration entries explain why the pod kept Running for a while after the node went NotReady before being terminated and recreated: by default, pods tolerate the not-ready and unreachable taints for 300 seconds.
For more details, please refer to https://kubernetes.io/docs/concepts/architecture/nodes/#node-controller

Removing a Node

For demonstration purposes, first scale the deployment up to 10 replicas:

  ~ kubectl scale --replicas=10 deployment/nginx
deployment.extensions/nginx scaled

  ~ kubectl get pod -o wide                     
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
nginx-6657c9ffc-6lfbp   1/1     Running   0          8s    10.244.1.20   worker02   <none>           <none>
nginx-6657c9ffc-6q2pt   1/1     Running   1          20h   10.244.0.11   worker01   <none>           <none>
nginx-6657c9ffc-8cbpl   1/1     Running   0          8s    10.244.1.23   worker02   <none>           <none>
nginx-6657c9ffc-dbr7c   1/1     Running   0          8s    10.244.1.22   worker02   <none>           <none>
nginx-6657c9ffc-kk84d   1/1     Running   0          8s    10.244.1.21   worker02   <none>           <none>
nginx-6657c9ffc-kp64c   1/1     Running   0          8s    10.244.0.16   worker01   <none>           <none>
nginx-6657c9ffc-n9s4p   1/1     Running   0          40m   10.244.0.14   worker01   <none>           <none>
nginx-6657c9ffc-pxbs2   1/1     Running   0          8s    10.244.1.19   worker02   <none>           <none>
nginx-6657c9ffc-sb85v   1/1     Running   0          8s    10.244.1.18   worker02   <none>           <none>
nginx-6657c9ffc-sxb25   1/1     Running   0          8s    10.244.0.15   worker01   <none>           <none>

Use drain to evict the node's workloads, then delete the node:

  ~ kubectl drain worker02 --force --grace-period=900 --ignore-daemonsets
node/worker02 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-amd64-crq7k, kube-system/kube-proxy-x9h7m
evicting pod "nginx-6657c9ffc-sb85v"
evicting pod "nginx-6657c9ffc-dbr7c"
evicting pod "nginx-6657c9ffc-6lfbp"
evicting pod "nginx-6657c9ffc-8cbpl"
evicting pod "nginx-6657c9ffc-kk84d"
evicting pod "nginx-6657c9ffc-pxbs2"
pod/nginx-6657c9ffc-kk84d evicted
pod/nginx-6657c9ffc-dbr7c evicted
pod/nginx-6657c9ffc-8cbpl evicted
pod/nginx-6657c9ffc-pxbs2 evicted
pod/nginx-6657c9ffc-sb85v evicted
pod/nginx-6657c9ffc-6lfbp evicted
node/worker02 evicted

# after the drain completes, all pods are running on worker01
  ~ kubectl get pod -o wide   
NAME                    READY   STATUS    RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
nginx-6657c9ffc-45svt   1/1     Running   0          63s     10.244.0.18   worker01   <none>           <none>
nginx-6657c9ffc-489zz   1/1     Running   0          63s     10.244.0.21   worker01   <none>           <none>
nginx-6657c9ffc-49vqv   1/1     Running   0          63s     10.244.0.20   worker01   <none>           <none>
nginx-6657c9ffc-6q2pt   1/1     Running   1          20h     10.244.0.11   worker01   <none>           <none>
nginx-6657c9ffc-fq7gg   1/1     Running   0          63s     10.244.0.22   worker01   <none>           <none>
nginx-6657c9ffc-kl6j9   1/1     Running   0          63s     10.244.0.19   worker01   <none>           <none>
nginx-6657c9ffc-kp64c   1/1     Running   0          4m31s   10.244.0.16   worker01   <none>           <none>
nginx-6657c9ffc-n9s4p   1/1     Running   0          44m     10.244.0.14   worker01   <none>           <none>
nginx-6657c9ffc-ssmph   1/1     Running   0          63s     10.244.0.17   worker01   <none>           <none>
nginx-6657c9ffc-sxb25   1/1     Running   0          4m31s   10.244.0.15   worker01   <none>           <none>
# delete the node
  ~ kubectl delete node worker02
node "worker02" deleted
  ~ kubectl get node
NAME       STATUS   ROLES    AGE   VERSION
worker01   Ready    master   22h   v1.14.3

For more details, please refer to https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
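Notice in the drain output above that the flannel and kube-proxy pods were skipped: drain cordons the node and evicts regular pods, but (with --ignore-daemonsets) leaves DaemonSet-managed pods alone, since the DaemonSet controller would immediately recreate them anyway. A minimal sketch of that filtering decision (not kubectl's actual implementation; pod entries here are simplified stand-ins):

```python
# Simplified pod records: name, the kind of the owning controller, and
# whether the pod is a static (mirror) pod.
pods = [
    {"name": "nginx-6657c9ffc-sb85v",        "owner_kind": "ReplicaSet", "mirror": False},
    {"name": "kube-flannel-ds-amd64-crq7k",  "owner_kind": "DaemonSet",  "mirror": False},
    {"name": "kube-proxy-x9h7m",             "owner_kind": "DaemonSet",  "mirror": False},
]

def evictable(pod, ignore_daemonsets=True):
    if pod["mirror"]:
        # Static pods are managed directly by the kubelet, not the API
        # server, so drain cannot evict them.
        return False
    if pod["owner_kind"] == "DaemonSet" and ignore_daemonsets:
        # The DaemonSet controller ignores the cordon and would recreate
        # the pod on the same node, so drain skips it.
        return False
    return True

to_evict = [p["name"] for p in pods if evictable(p)]
print(to_evict)  # ['nginx-6657c9ffc-sb85v']
```

Without --ignore-daemonsets, drain refuses to proceed at all when DaemonSet-managed pods are present, which is why the flag was required in the command above.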

Joining a Node

After initializing the master with kubeadm, the output includes a kubeadm join command explaining how to add new nodes; be sure to save it. (If you lose it, you can regenerate it on the master with kubeadm token create --print-join-command.)
Next, we reset the worker02 node we just removed and rejoin it to the cluster.

  ~ kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0609 11:00:23.206928    7376 reset.go:234] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

  ~ iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

Run kubeadm join:

  ~ kubeadm join 192.168.101.113:6443 --token ss6flg.csw4u0ok134n2fy1 \      
    --discovery-token-ca-cert-hash sha256:bac9a150228342b7cdedf39124ef2108653db1f083e9f547d251e08f03c41945
[preflight] Running pre-flight checks
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

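The --discovery-token-ca-cert-hash in the join command above is not opaque: it is "sha256:" followed by the SHA-256 digest of the cluster CA certificate's DER-encoded public key. On the master you can recompute it with openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex. A minimal Python sketch of just the hashing step, with dummy bytes standing in for the real public key:

```python
import hashlib

# Dummy stand-in for the CA public key in DER form; on a real cluster these
# bytes come out of the openssl pipeline mentioned above.
spki_der = bytes(range(64))

def discovery_hash(der_bytes):
    # "sha256:" prefix plus the hex digest, matching the format kubeadm prints.
    return "sha256:" + hashlib.sha256(der_bytes).hexdigest()

print(discovery_hash(spki_der))
```

The joining node uses this hash to verify that the API server it is talking to really holds the expected cluster CA, which is what makes the token-based bootstrap safe against a man-in-the-middle.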
Checking the nodes, worker02 has been added back successfully:

  ~ kubectl get node -o wide
NAME       STATUS   ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
worker01   Ready    master   23h   v1.14.3   192.168.101.113   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
worker02   Ready    <none>   33s   v1.14.3   192.168.100.117   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5