Worker Node Stuck in NotReady After Initialization with kubeadm


1. The Problem

After the worker node runs kubeadm join, its status remains NotReady, as shown for k8s-node-4 below:

$ kubectl get nodes
NAME                     STATUS     ROLES    AGE    VERSION
k8s-jmeter-1.novalocal   Ready      <none>   17d    v1.18.5
k8s-jmeter-2.novalocal   Ready      <none>   17d    v1.18.5
k8s-jmeter-3.novalocal   Ready      <none>   17d    v1.18.5
k8s-master.novalocal     Ready      master   51d    v1.18.5
k8s-node-1.novalocal     Ready      <none>   51d    v1.18.5
k8s-node-2.novalocal     Ready      <none>   51d    v1.18.5
k8s-node-3.novalocal     Ready      <none>   51d    v1.18.5
k8s-node-4.novalocal     NotReady   <none>   160m   v1.18.5
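A node reports the reason for NotReady in its Conditions and Events, so describing the node is a useful first check (not part of the original session; the node name is taken from the output above):

# For a missing CNI plugin, the Ready condition typically carries a
# "network plugin is not ready" message
$ kubectl describe node k8s-node-4.novalocal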

2. Troubleshooting

First, check whether the system Pods initialized correctly:

$ kubectl get pod -n kube-system -o wide
NAME                                           READY   STATUS                  RESTARTS   AGE     IP               NODE                     NOMINATED NODE   READINESS GATES
calico-kube-controllers-5b8b769fcd-srkrb       1/1     Running                 0          3d19h   10.100.185.9     k8s-jmeter-2.novalocal   <none>           <none>
calico-node-5c8xj                              1/1     Running                 10         51d     172.16.106.227   k8s-node-1.novalocal     <none>           <none>
calico-node-9d7rt                              1/1     Running                 8          51d     172.16.106.203   k8s-node-3.novalocal     <none>           <none>
calico-node-crczj                              1/1     Running                 5          51d     172.16.106.226   k8s-node-2.novalocal     <none>           <none>
calico-node-g4hx4                              0/1     Init:ImagePullBackOff   0          99s     172.16.106.219   k8s-node-4.novalocal     <none>           <none>
calico-node-gpmsv                              1/1     Running                 5          17d     172.16.106.209   k8s-jmeter-1.novalocal   <none>           <none>
calico-node-pz7w5                              1/1     Running                 4          51d     172.16.106.200   k8s-master.novalocal     <none>           <none>
calico-node-r59bw                              1/1     Running                 3          17d     172.16.106.216   k8s-jmeter-2.novalocal   <none>           <none>
calico-node-xhjj8                              1/1     Running                 4          17d     172.16.106.210   k8s-jmeter-3.novalocal   <none>           <none>
coredns-66db54ff7f-2cxcp                       1/1     Running                 0          5d22h   10.100.167.140   k8s-node-1.novalocal     <none>           <none>
coredns-66db54ff7f-gptgt                       1/1     Running                 0          5d22h   10.100.41.31     k8s-master.novalocal     <none>           <none>
eip-nfs-nfs-storage-6fddcc8f9d-hqv7m           1/1     Running                 0          3d19h   10.100.185.4     k8s-jmeter-2.novalocal   <none>           <none>
etcd-k8s-master.novalocal                      1/1     Running                 0          5d21h   172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-apiserver-k8s-master.novalocal            1/1     Running                 14         51d     172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-controller-manager-k8s-master.novalocal   1/1     Running                 56         16d     172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-proxy-5msrp                               1/1     Running                 1          9d      172.16.106.226   k8s-node-2.novalocal     <none>           <none>
kube-proxy-64pkw                               1/1     Running                 2          9d      172.16.106.210   k8s-jmeter-3.novalocal   <none>           <none>
kube-proxy-6j2fw                               1/1     Running                 1          9d      172.16.106.203   k8s-node-3.novalocal     <none>           <none>
kube-proxy-7cptn                               1/1     Running                 0          157m    172.16.106.219   k8s-node-4.novalocal     <none>           <none>
kube-proxy-fkt9p                               1/1     Running                 1          9d      172.16.106.227   k8s-node-1.novalocal     <none>           <none>
kube-proxy-fxvjb                               1/1     Running                 4          9d      172.16.106.209   k8s-jmeter-1.novalocal   <none>           <none>
kube-proxy-wnj2l                               1/1     Running                 2          9d      172.16.106.216   k8s-jmeter-2.novalocal   <none>           <none>
kube-proxy-wnzqg                               1/1     Running                 0          9d      172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-scheduler-k8s-master.novalocal            1/1     Running                 48         16d     172.16.106.200   k8s-master.novalocal     <none>           <none>
kuboard-5cc4bcccd7-t8h8f                       1/1     Running                 0          21h     10.100.185.24    k8s-jmeter-2.novalocal   <none>           <none>
metrics-server-677dcb8b4d-jtpgd                1/1     Running                 0          3d20h   172.16.106.227   k8s-node-1.novalocal     <none>           <none>

The output shows that the calico component on node-4 failed to initialize: its Pod is stuck in Init:ImagePullBackOff, meaning the node cannot pull one of the calico images.
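To see exactly why the pull fails, describe the Pod; the Events section at the bottom names the failing image and the error (registry unreachable, DNS failure, auth, and so on). A quick check along those lines:

$ kubectl describe pod calico-node-g4hx4 -n kube-system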

3. The Fix

Step 1: Identify the container images

Get the container images the Pod uses. Since calico-node is deployed as a DaemonSet, every calico-node Pod shares the same spec, so any running Pod can be queried (here calico-node-7vrgx, from another node):

$ kubectl get pods calico-node-7vrgx -n kube-system -o yaml | grep image:
            f:image: {}
            f:image: {}
            f:image: {}
            f:image: {}
    image: calico/node:v3.13.1
    image: calico/cni:v3.13.1
    image: calico/cni:v3.13.1
  - image: calico/pod2daemon-flexvol:v3.13.1
  - image: calico/node:v3.13.1
  - image: calico/cni:v3.13.1
  - image: calico/cni:v3.13.1
  - image: calico/pod2daemon-flexvol:v3.13.1

From this output (the f:image: {} lines come from managedFields, and each image is listed in both the Pod spec and status, hence the duplicates), the calico Pod uses three images:

  • calico/node:v3.13.1
  • calico/cni:v3.13.1
  • calico/pod2daemon-flexvol:v3.13.1
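As an aside, grepping the full YAML also matches managedFields and status entries, which is why images appear more than once above. A jsonpath query (a cleaner alternative, not from the original session) prints only the images in the Pod spec:

# One image per line: init containers first, then regular containers
$ kubectl get pod calico-node-g4hx4 -n kube-system \
    -o jsonpath='{range .spec.initContainers[*]}{.image}{"\n"}{end}{range .spec.containers[*]}{.image}{"\n"}{end}'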

Step 2: Pull the images

Log in to the node-4 host and run:

$ docker pull calico/node:v3.13.1
$ docker pull calico/cni:v3.13.1  
$ docker pull calico/pod2daemon-flexvol:v3.13.1
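If the pulls succeed, verify that all three images are now in the local cache:

$ docker images | grep calico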

Step 3: Load the images offline

If docker pull cannot reach the registry from this node, export the calico images from another node that already has them:

# Save the image to a local tar archive
$ docker save image_id -o xxxx.tar

# Copy the archive to the worker node
$ scp xxxx.tar root@k8s-node-4:/root/

# Load the image on the worker node
$ docker load -i xxxx.tar

# Re-tag the image (saving by image ID drops the repo:tag, so restore it here)
$ docker tag image_id tag
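A variant that skips the re-tagging step: docker save also accepts repository:tag names and preserves them in the archive, so saving the three calico images by name produces a tar that loads ready to use (the archive name below is arbitrary):

# On a node that already has the images, save all three by name (tags are kept)
$ docker save calico/node:v3.13.1 calico/cni:v3.13.1 calico/pod2daemon-flexvol:v3.13.1 -o calico-v3.13.1.tar

# Copy to the worker node and load; no docker tag step is needed afterwards
$ scp calico-v3.13.1.tar root@k8s-node-4:/root/
$ docker load -i calico-v3.13.1.tar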

Step 4: Recreate the Pod

On the master, delete the stuck Pod:

$ kubectl delete pod calico-node-g4hx4 -n kube-system
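Because calico-node is managed by a DaemonSet, nothing needs to be re-applied: the DaemonSet controller immediately schedules a replacement Pod on the node, and the new Pod finds the images in the local Docker cache. One way to watch the replacement come up:

# -w streams updates as the new calico-node Pod starts and becomes Ready
$ kubectl get pod -n kube-system -o wide -w | grep calico-node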

Wait a moment, then check the Pod status again:

$ kubectl get pod -n kube-system -o wide
NAME                                           READY   STATUS    RESTARTS   AGE     IP               NODE                     NOMINATED NODE   READINESS GATES
calico-kube-controllers-5b8b769fcd-srkrb       1/1     Running   0          3d19h   10.100.185.9     k8s-jmeter-2.novalocal   <none>           <none>
calico-node-5c7hn                              0/1     Running   0          8s      172.16.106.219   k8s-node-4.novalocal     <none>           <none>
calico-node-5c8xj                              1/1     Running   10         51d     172.16.106.227   k8s-node-1.novalocal     <none>           <none>
calico-node-9d7rt                              1/1     Running   8          51d     172.16.106.203   k8s-node-3.novalocal     <none>           <none>
calico-node-crczj                              1/1     Running   5          51d     172.16.106.226   k8s-node-2.novalocal     <none>           <none>
calico-node-gpmsv                              1/1     Running   5          17d     172.16.106.209   k8s-jmeter-1.novalocal   <none>           <none>
calico-node-pz7w5                              1/1     Running   4          51d     172.16.106.200   k8s-master.novalocal     <none>           <none>
calico-node-r59bw                              1/1     Running   3          17d     172.16.106.216   k8s-jmeter-2.novalocal   <none>           <none>
calico-node-xhjj8                              1/1     Running   4          17d     172.16.106.210   k8s-jmeter-3.novalocal   <none>           <none>
coredns-66db54ff7f-2cxcp                       1/1     Running   0          5d22h   10.100.167.140   k8s-node-1.novalocal     <none>           <none>
coredns-66db54ff7f-gptgt                       1/1     Running   0          5d22h   10.100.41.31     k8s-master.novalocal     <none>           <none>
eip-nfs-nfs-storage-6fddcc8f9d-hqv7m           1/1     Running   0          3d19h   10.100.185.4     k8s-jmeter-2.novalocal   <none>           <none>
etcd-k8s-master.novalocal                      1/1     Running   0          5d21h   172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-apiserver-k8s-master.novalocal            1/1     Running   14         51d     172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-controller-manager-k8s-master.novalocal   1/1     Running   56         16d     172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-proxy-5msrp                               1/1     Running   1          9d      172.16.106.226   k8s-node-2.novalocal     <none>           <none>
kube-proxy-64pkw                               1/1     Running   2          9d      172.16.106.210   k8s-jmeter-3.novalocal   <none>           <none>
kube-proxy-6j2fw                               1/1     Running   1          9d      172.16.106.203   k8s-node-3.novalocal     <none>           <none>
kube-proxy-7cptn                               1/1     Running   0          160m    172.16.106.219   k8s-node-4.novalocal     <none>           <none>
kube-proxy-fkt9p                               1/1     Running   1          9d      172.16.106.227   k8s-node-1.novalocal     <none>           <none>
kube-proxy-fxvjb                               1/1     Running   4          9d      172.16.106.209   k8s-jmeter-1.novalocal   <none>           <none>
kube-proxy-wnj2l                               1/1     Running   2          9d      172.16.106.216   k8s-jmeter-2.novalocal   <none>           <none>
kube-proxy-wnzqg                               1/1     Running   0          9d      172.16.106.200   k8s-master.novalocal     <none>           <none>
kube-scheduler-k8s-master.novalocal            1/1     Running   48         16d     172.16.106.200   k8s-master.novalocal     <none>           <none>
kuboard-5cc4bcccd7-t8h8f                       1/1     Running   0          21h     10.100.185.24    k8s-jmeter-2.novalocal   <none>           <none>
metrics-server-677dcb8b4d-jtpgd                1/1     Running   0          3d20h   172.16.106.227   k8s-node-1.novalocal     <none>           <none>

All the Pods are back in a normal state (the replacement calico-node-5c7hn on node-4 is just starting). Now check the node status:

$ kubectl get nodes
NAME                     STATUS   ROLES    AGE    VERSION
k8s-jmeter-1.novalocal   Ready    <none>   17d    v1.18.5
k8s-jmeter-2.novalocal   Ready    <none>   17d    v1.18.5
k8s-jmeter-3.novalocal   Ready    <none>   17d    v1.18.5
k8s-master.novalocal     Ready    master   51d    v1.18.5
k8s-node-1.novalocal     Ready    <none>   51d    v1.18.5
k8s-node-2.novalocal     Ready    <none>   51d    v1.18.5
k8s-node-3.novalocal     Ready    <none>   51d    v1.18.5
k8s-node-4.novalocal     Ready    <none>   161m   v1.18.5

With that, the newly joined node is back to Ready.
