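1. Problem description
The k8s cluster consists of one master and three worker nodes. It had been running normally until, one day, node03 suddenly went NotReady. The symptoms were as follows.
On the master node, run:
# On the master node: check the status of all nodes
kubectl get nodes
node03 is reported with status NotReady.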
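On a larger cluster it can help to filter the node list down to the unhealthy entries. The following is a minimal sketch, assuming the default column layout of `kubectl get nodes` (name first, status second); the function name `not_ready_nodes` is made up for illustration.

```shell
# Print the names of nodes whose STATUS column does not start with "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# A status such as "Ready,SchedulingDisabled" still counts as Ready here.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage on the master node:
#   kubectl get nodes --no-headers | not_ready_nodes
```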
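2. Check the logs on node03
On node03, use the following command to view the error messages:
# On node03: follow the kubelet log
journalctl -f -u kubelet.service
The error means that the kubelet configuration file cannot be loaded!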
-- Logs begin at 四 2023-12-21 15:25:07 CST. --
12月 22 01:01:00 tigerhhzz-node03-43 systemd[1]: Unit kubelet.service entered failed state.
12月 22 01:01:00 tigerhhzz-node03-43 systemd[1]: kubelet.service failed.
12月 22 01:01:10 tigerhhzz-node03-43 systemd[1]: kubelet.service holdoff time over, scheduling restart.
12月 22 01:01:10 tigerhhzz-node03-43 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
12月 22 01:01:10 tigerhhzz-node03-43 systemd[1]: Started kubelet: The Kubernetes Node Agent.
12月 22 01:01:10 tigerhhzz-node03-43 kubelet[121391]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
12月 22 01:01:10 tigerhhzz-node03-43 kubelet[121391]: F1222 01:01:10.301771 121391 server.go:198] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
12月 22 01:01:10 tigerhhzz-node03-43 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
12月 22 01:01:10 tigerhhzz-node03-43 systemd[1]: Unit kubelet.service entered failed state.
12月 22 01:01:10 tigerhhzz-node03-43 systemd[1]: kubelet.service failed.
12月 22 01:01:20 tigerhhzz-node03-43 systemd[1]: kubelet.service holdoff time over, scheduling restart.
12月 22 01:01:20 tigerhhzz-node03-43 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
12月 22 01:01:20 tigerhhzz-node03-43 systemd[1]: Started kubelet: The Kubernetes Node Agent.
12月 22 01:01:20 tigerhhzz-node03-43 kubelet[121400]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
12月 22 01:01:20 tigerhhzz-node03-43 kubelet[121400]: F1222 01:01:20.508883 121400 server.go:198] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
12月 22 01:01:20 tigerhhzz-node03-43 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
12月 22 01:01:20 tigerhhzz-node03-43 systemd[1]: Unit kubelet.service entered failed state.
12月 22 01:01:20 tigerhhzz-node03-43 systemd[1]: kubelet.service failed.
12月 22 01:01:30 tigerhhzz-node03-43 systemd[1]: kubelet.service holdoff time over, scheduling restart.
12月 22 01:01:30 tigerhhzz-node03-43 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
12月 22 01:01:30 tigerhhzz-node03-43 systemd[1]: Started kubelet: The Kubernetes Node Agent.
12月 22 01:01:30 tigerhhzz-node03-43 kubelet[121407]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
12月 22 01:01:30 tigerhhzz-node03-43 kubelet[121407]: F1222 01:01:30.820217 121407 server.go:198] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
12月 22 01:01:30 tigerhhzz-node03-43 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
12月 22 01:01:30 tigerhhzz-node03-43 systemd[1]: Unit kubelet.service entered failed state.
12月 22 01:01:30 tigerhhzz-node03-43 systemd[1]: kubelet.service failed.
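The log shows that the kubelet exits because it cannot load its configuration from /var/lib/kubelet/config.yaml: the file does not exist. systemd then restarts the service every 10 seconds, and each restart fails the same way.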
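3. Root cause analysis
Possibly node03 crashed and rebooted for some reason of its own, and after the cluster was initialized with kubeadm init, node03 was never joined to it. On a worker node, /var/lib/kubelet/config.yaml is written when the node joins the cluster with kubeadm join, so without a join the kubelet has no configuration file to load and fails to read /var/lib/kubelet/config.yaml.
Another likely factor is that I had earlier run systemctl start kubelet before running kubeadm init (for a worker node, the equivalent step is kubeadm join).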
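This diagnosis can be checked directly on the node by testing whether the files that kubeadm join normally leaves behind are present. A minimal sketch: the function name `check_join_artifacts` and its optional root-prefix argument are made up for illustration (the prefix only exists so the check can be tried against a scratch directory), and the file list assumes a standard kubeadm layout.

```shell
# Report whether the files written by `kubeadm join` exist.
# $1 (optional): a path prefix, e.g. a scratch directory for a dry run.
# Returns 0 if all files are present, 1 if any is missing.
check_join_artifacts() {
  root="${1:-}"
  missing=0
  for f in /var/lib/kubelet/config.yaml /etc/kubernetes/kubelet.conf; do
    if [ -f "${root}${f}" ]; then
      echo "present: $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on node03:
#   check_join_artifacts
```

If either file is reported missing, the node most likely has not completed a join, which matches the log above.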
4. Solution
On the master node, generate a new token, then use it on the faulty node03 to rejoin the cluster.
## On the master node
kubeadm token create --print-join-command
kubeadm join 192.168.162.31:6443 --token 6u1q3a.qxhb1wyjztsp34ty --discovery-token-ca-cert-hash sha256:967bbc3b30871241bbfd61e42ae5fa836e08111a5a43d63b319f028fdbc2241a
On node03, run the join command printed above to rejoin the cluster:
## On node03
kubeadm join 192.168.162.31:6443 --token 6u1q3a.qxhb1wyjztsp34ty --discovery-token-ca-cert-hash sha256:967bbc3b30871241bbfd61e42ae5fa836e08111a5a43d63b319f028fdbc2241a
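If the command reports that the node has joined the cluster, the join succeeded.
Now check the kubelet status on node03:
systemctl status kubelet
The kubelet is now running on node03, and node03 has rejoined the cluster.
Return to the master node and check the status of all nodes: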
kubectl get nodes
All nodes are Ready. Problem solved!
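A rejoined node can take a little while to report Ready, so a single kubectl get nodes right after the join may still show NotReady. The sketch below polls until the node is Ready or a retry limit is reached. The function name `wait_node_ready`, the retry count, and the delay are illustrative choices; the node name to pass is whatever appears in the NAME column on your cluster.

```shell
# Poll `kubectl get nodes` until the given node reports exactly "Ready".
# $1: node name, $2: max attempts (default 30), $3: seconds between attempts (default 5)
wait_node_ready() {
  node="$1"
  attempts="${2:-30}"
  delay="${3:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Pick the STATUS column of the row whose NAME matches the node.
    status=$(kubectl get nodes --no-headers | awk -v n="$node" '$1 == n { print $2 }')
    if [ "$status" = "Ready" ]; then
      echo "$node is Ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$node did not become Ready after $attempts attempts" >&2
  return 1
}

# Usage on the master node, with the node name from `kubectl get nodes`:
#   wait_node_ready <node-name>
```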