This cluster was installed with kubeadm, but the troubleshooting approach is the same for other setups.
Symptoms:
1. The kubelet on the master keeps logging node "master" not found:
[root@master ~]# journalctl -xefu kubelet
master kubelet[22628]: E0919 21:16:24.171522 22628 kubelet.go:2267] node "master" not found
2. kubectl cannot reach the apiserver:
[root@node1 ~]# kubectl get node
The connection to the server 192.168.31.119:6443 was refused - did you specify the right host or port?
3. etcd rejects connections because of TLS certificate problems (one log line arrives truncated as "…ed or is not yet valid", which points at certificate validity):
[root@master pods]# docker ps -a | grep etcd
[root@master pods]# docker logs -f 0a72fc9181f8
WARNING: 2023/09/19 13:42:34 grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate". Reconnecting...
2023-09-19 13:42:52.485923 I | embed: rejected connection from "127.0.0.1:40148" (error "remote error: tls: bad certificate", ServerName "")
2023-09-19 13:42:53.481391 I | embed: rejected connection from "127.0.0.1:40154" (error "remote error: tls: bad certificate", ServerName "")
2023-09-19 13:42:53.489182 I | embed: rejected connection from "127.0.0.1:40156" (error "remote error: tls: bad certificate", ServerName "")
Troubleshooting:
1. First check whether the kubelet service is running and whether it reports errors; if it does, investigate those errors:
[root@master ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Tue 2023-09-19 21:24:21 CST; 28s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 30962 (kubelet)
    Tasks: 15
   Memory: 27.3M
   CGroup: /system.slice/kubelet.service
           └─30962 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib...
Sep 19 21:24:48 master kubelet[30962]: E0919 21:24:48.548081 30962 kubelet.go:2267] node "master" not found
Sep 19 21:24:48 master kubelet[30962]: E0919 21:24:48.648252 30962 kubelet.go:2267] node "master" not found
Sep 19 21:24:48 master kubelet[30962]: E0919 21:24:48.748704 30962 kubelet.go:2267] node "master" not found
Sep 19 21:24:48 master kubelet[30962]: E0919 21:24:48.849705 30962 kubelet.go:2267] node "master" not found
Sep 19 21:24:48 master kubelet[30962]: E0919 21:24:48.950054 30962 kubelet.go:2267] node "master" not found
Sep 19 21:24:49 master kubelet[30962]: E0919 21:24:49.050555 30962 kubelet.go:2267] node "master" not found
Sep 19 21:24:49 master kubelet[30962]: E0919 21:24:49.150827 30962 kubelet.go:2267] node "master" not found
Sep 19 21:24:49 master kubelet[30962]: E0919 21:24:49.251845 30962 kubelet.go:2267] node "master" not found
Sep 19 21:24:49 master kubelet[30962]: E0919 21:24:49.339954 30962 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode an...n refused
Sep 19 21:24:49 master kubelet[30962]: E0919 21:24:49.352815 30962 kubelet.go:2267] node "master" not found
Hint: Some lines were ellipsized, use -l to show in full.
2. Confirm whether port 6443 is in use and who is listening on it:
[root@master ~]# netstat -napt | grep 6443
tcp     0    0 192.168.31.119:47980   192.168.31.119:6443   ESTABLISHED 19146/kube-controll
tcp     0    0 192.168.31.119:47982   192.168.31.119:6443   ESTABLISHED 33191/kubelet
tcp     0    0 192.168.31.119:47964   192.168.31.119:6443   ESTABLISHED 33191/kubelet
tcp     0    0 192.168.31.119:47994   192.168.31.119:6443   ESTABLISHED 33191/kubelet
tcp     0    0 192.168.31.119:47990   192.168.31.119:6443   ESTABLISHED 19177/kube-schedule
tcp     0    0 192.168.31.119:47988   192.168.31.119:6443   ESTABLISHED 33191/kubelet
tcp     0    0 192.168.31.119:48000   192.168.31.119:6443   ESTABLISHED 33191/kubelet
tcp     0    0 192.168.31.119:47968   192.168.31.119:6443   ESTABLISHED 33191/kubelet
tcp     0    0 192.168.31.119:47996   192.168.31.119:6443   ESTABLISHED 33191/kubelet
tcp6   15    0 :::6443                :::*                  LISTEN      34191/kube-apiserve
tcp6  261    0 192.168.31.119:6443    192.168.31.119:47988  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.119:47996  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.119:47968  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.119:47990  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.119:48000  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.119:47980  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.119:47994  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.138:53554  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.119:47964  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.119:47982  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.138:53560  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.138:53558  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.138:53556  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.138:53562  ESTABLISHED -
tcp6  261    0 192.168.31.119:6443    192.168.31.138:53552  ESTABLISHED -
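Besides netstat, a quick TCP probe tells you whether the apiserver port is reachable at all. A minimal sketch using bash's built-in /dev/tcp; check_port is a helper name of my choosing, and the host/port values are the ones from this cluster:

```shell
# Probe a TCP port and report whether it answers.
check_port() {
  local host="$1" port="$2"
  # /dev/tcp is a bash feature; timeout avoids hanging on filtered ports.
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "port ${port} on ${host}: reachable"
  else
    echo "port ${port} on ${host}: closed or filtered"
  fi
}

check_port 192.168.31.119 6443
```

If this prints "closed or filtered" from the master itself, the apiserver is not listening at all, which matches the "connection refused" symptom above.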
3. Check the firewall status (a running firewalld may block port 6443):
systemctl status firewalld
4. Check whether the iptables rules allow the traffic:
[root@master ~]# iptables -nL
Chain INPUT (policy ACCEPT)
target         prot opt source      destination
KUBE-FIREWALL  all  --  0.0.0.0/0   0.0.0.0/0
ACCEPT         udp  --  0.0.0.0/0   0.0.0.0/0    udp dpt:53
ACCEPT         tcp  --  0.0.0.0/0   0.0.0.0/0    tcp dpt:53
ACCEPT         udp  --  0.0.0.0/0   0.0.0.0/0    udp dpt:67
ACCEPT         tcp  --  0.0.0.0/0   0.0.0.0/0    tcp dpt:67
5. Check the environment variables:
[root@master ~]# env | grep -i kub
KUBECONFIG=/etc/kubernetes/admin.conf
# Check that the file actually exists; if it does not, run the steps below
ll /etc/kubernetes/admin.conf
# Re-export the environment variable
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
# Make it take effect
source ~/.bash_profile
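The check above can be scripted so it fails loudly when the kubeconfig is gone. A small sketch, assuming the kubeadm default path /etc/kubernetes/admin.conf; kubeconfig_ok is a helper name of my choosing:

```shell
# Report whether the kubeconfig pointed to by $KUBECONFIG (or the kubeadm
# default) exists and is readable.
kubeconfig_ok() {
  local cfg="${1:-${KUBECONFIG:-/etc/kubernetes/admin.conf}}"
  if [ -r "$cfg" ]; then
    echo "kubeconfig OK: $cfg"
  else
    echo "kubeconfig missing or unreadable: $cfg"
  fi
}

kubeconfig_ok
```

A missing or unreadable admin.conf produces exactly the kind of "connection refused" kubectl errors shown in the symptoms, because kubectl falls back to localhost:8080.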
6. Restart Docker and the kubelet:
systemctl restart docker
systemctl restart kubelet
7. Inspect kube-apiserver:
[root@master ~]# docker ps -a | grep kube-apiserver
91408667ac3c   74060cea7f70                                        "kube-apiserver --ad…"   3 seconds ago    Up 3 seconds                k8s_kube-apiserver_kube-apiserver-master_kube-system_42ea279818ec7cd5d7cf9ab59d471527_120
0b4a4347d286   74060cea7f70                                        "kube-apiserver --ad…"   36 seconds ago   Exited (2) 16 seconds ago   k8s_kube-apiserver_kube-apiserver-master_kube-system_42ea279818ec7cd5d7cf9ab59d471527_119
bb0e796cd064   registry.aliyuncs.com/google_containers/pause:3.2   "/pause"                 21 minutes ago   Up 16 minutes               k8s_POD_kube-apiserver-master_kube-system_42ea279818ec7cd5d7cf9ab59d471527_40
The container is clearly abnormal: it keeps exiting and being recreated (note the Exited (2) entry and the high restart count).
[root@master ~]# docker restart 91408667ac3c
The problem persists after the restart.
8. Check whether etcd is healthy:
[root@master ~]# docker ps -a | grep etcd
0a72fc9181f8   303ce5db0e90                                        "etcd --advertise-cl…"   23 minutes ago   Up 23 minutes               k8s_etcd_etcd-master_kube-system_6ce5e992bc51b3eb87f2bae12bb6461b_38
20a5fda377a5   registry.aliyuncs.com/google_containers/pause:3.2   "/pause"                 23 minutes ago   Up 23 minutes               k8s_POD_etcd-master_kube-system_6ce5e992bc51b3eb87f2bae12bb6461b_38
8d4bb904f788   303ce5db0e90                                        "etcd --advertise-cl…"   38 minutes ago   Exited (0) 23 minutes ago   k8s_etcd_etcd-master_kube-system_6ce5e992bc51b3eb87f2bae12bb6461b_37
6dd731e54cbf   registry.aliyuncs.com/google_containers/pause:3.2   "/pause"                 38 minutes ago   Exited (0) 23 minutes ago   k8s_POD_etcd-master_kube-system_6ce5e992bc51b3eb87f2bae12bb6461b_37
9. Check the error logs:
[root@master ~]# cd /var/log/pods
[root@master pods]# ls
kube-system_coredns-7ff77c879f-8blg8_ba35cd55-b154-4483-9c6a-9a788de6e755
kube-system_etcd-master_6ce5e992bc51b3eb87f2bae12bb6461b
kube-system_kube-apiserver-master_42ea279818ec7cd5d7cf9ab59d471527
kube-system_kube-controller-manager-master_c4d2dd4abfffdee4d424ce839b0de402
kube-system_kube-flannel-ds-amd64-pgjgh_6c7b3afc-d9ae-4906-b92f-5482f46f3210
kube-system_kube-proxy-r5dk6_e4c2bb5d-5fc6-4bd9-b574-4ced96cf06bd
kube-system_kube-scheduler-master_ca2aa1b3224c37fa1791ef6c7d883bbe
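Rather than opening each directory by hand, you can tail every kube-system container log in one pass. A sketch over the /var/log/pods layout shown above; show_pod_logs is a helper name of my choosing:

```shell
# Print the last few lines of every kube-system static-pod container log.
show_pod_logs() {
  local dir="${1:-/var/log/pods}" f
  for f in "$dir"/kube-system_*/*/*.log; do
    # the glob stays literal when nothing matches, so skip non-files
    [ -f "$f" ] || continue
    echo "== $f =="
    tail -n 5 "$f"
  done
}

show_pod_logs
```

This quickly surfaces which component is failing: in this incident the etcd and kube-apiserver logs are the ones full of TLS errors.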
10. Check whether the disk is full:
[root@master pods]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 1.9G     0  1.9G   0% /dev
tmpfs                    1.9G     0  1.9G   0% /dev/shm
tmpfs                    1.9G   37M  1.9G   2% /run
tmpfs                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/mapper/centos-root   50G  6.2G   44G  13% /
/dev/sda1               1014M  241M  774M  24% /boot
/dev/mapper/centos-home  146G  1.9G  144G   2% /home
tmpfs                    378M   12K  378M   1% /run/user/42
tmpfs                    378M     0  378M   0% /run/user/0
overlay                  146G  1.9G  144G   2% /home/dir/docker/overlay2/90fa943a031ea73797f5ae492efb609ee283065848a0a1984357464fd4f6b87a/merged
overlay                  146G  1.9G  144G   2% /home/dir/docker/overlay2/582a6e2a601b751c9c543525c34d806dfd64242da4bbf6569316fb6600597c7d/merged
overlay                  146G  1.9G  144G   2% /home/dir/docker/overlay2/73385ea42a8b75fde7968d81c9dc9512557bc12b691317d7363a9010dab69bbc/merged
shm                       64M     0   64M   0% /home/dir/docker/containers/20a5fda377a54f337dbae9767e0350318731f20e7eb172155d5170601d97b4de/mounts/shm
shm                       64M     0   64M   0% /home/dir/docker/containers/2ebe25d4900540b7883e5ea23372db4e0f1f2989e62c921d1eead1f8261e6792/mounts/shm
overlay                  146G  1.9G  144G   2% /home/dir/docker/overlay2/331c608a8a68dc1a296e9e94dd31394caf797907a60773e7c786a36d46847d0d/merged
shm                       64M     0   64M   0% /home/dir/docker/containers/6b5f4603aaa8d7fc92a9ed08f68119106e2b99f0294152c82526609323ddd017/mounts/shm
overlay                  146G  1.9G  144G   2% /home/dir/docker/overlay2/59dce5e3d143017e0146f4bbf0142767689c2439c13534582b1b1695d3eca45f/merged
overlay                  146G  1.9G  144G   2% /home/dir/docker/overlay2/b98bf5b91d02521d2d4c34fec8061bfe12eabef38e894d2858e16125ab6aacf7/merged
overlay                  146G  1.9G  144G   2% /home/dir/docker/overlay2/75ce3ac7be535c2e91af0dd23729332459580d49c73562e80bfc9b954e22589c/merged
shm                       64M     0   64M   0% /home/dir/docker/containers/bb0e796cd064a1fc3d656758bbf9076d5effdbd12103cd4f76541784bd6c65db/mounts/shm
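Scanning a long df listing by eye is error-prone. A small sketch that flags any filesystem above a usage threshold (80% is my arbitrary cutoff, and --output is a GNU df option):

```shell
# Flag filesystems above 80% usage. The awk program is kept in a variable
# so it can also be applied to saved df output.
DF_ALERT='NR>1 { gsub(/%/,"",$1); if ($1+0 > 80) print "nearly full:", $2 }'
df -h --output=pcent,target 2>/dev/null | awk "$DF_ALERT"
```

On the output above nothing is flagged, which rules out disk pressure as the cause here.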
11. Check whether the apiserver certificate has expired. In this case it had expired, and regenerating the certificates fixed the problem:
[root@master pods]# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep 'Not'
            Not Before: Sep 18 06:30:54 2022 GMT
            Not After : Sep 18 06:30:54 2023 GMT
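When one certificate has expired, the others issued at install time usually expire around the same date, so it is worth checking them all. A sketch over the kubeadm default directory /etc/kubernetes/pki; check_cert is a helper name of my choosing, and the 30-day warning window is my assumption:

```shell
# Flag certificates that are expired or expire within 30 days,
# using openssl's -checkend (seconds from now).
check_cert() {
  if openssl x509 -in "$1" -noout -checkend $((30*24*3600)) >/dev/null 2>&1; then
    echo "OK: $1"
  else
    echo "EXPIRING or EXPIRED: $1"
  fi
}

for crt in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
  if [ -f "$crt" ]; then
    check_cert "$crt"
  fi
done
```

Recent kubeadm releases can produce the same overview with `kubeadm certs check-expiration` (older versions used `kubeadm alpha certs check-expiration`).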
PS: At this point all of the problems are resolved. Certificate regeneration will be covered in my next blog post. Writing these up takes effort, so please don't just freeload.