1. Causes of network connectivity failures
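Route missing inside the Pod
Route missing on the host
iptables rule problems
IPVS rule problems
IP address conflict
Pod network interface stopped working
Incorrect ARP table entries
CoreDNS resolution problems
Traffic forwarding table problems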
2. General troubleshooting
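1. Pod A to Host A: the Pod cannot reach the host it runs on.
2. Host A to Host B: traffic from the Pod cannot reach a host on another node.
3. Host B to Pod B: the Pod cannot reach a Pod on another node; the fault may also be that Pod B cannot reach Host B.
4. Pod A to Pod B: Pod-to-Pod networking is broken.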
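The segment-by-segment approach above can be sketched as follows: test the full Pod A to Pod B path first, and if it fails, walk the hops in order and report the first one that breaks. This is only a sketch; the function name `first_broken_hop` and the `can_reach(src, dst)` probe are hypothetical, and the caller is assumed to supply the probe (for example a wrapper around ping).

```python
from typing import Callable, List, Optional, Tuple

Hop = Tuple[str, str]

# Hops of a cross-node Pod A -> Pod B path, in order (items 1-3 above).
HOPS: List[Hop] = [("Pod A", "Host A"), ("Host A", "Host B"), ("Host B", "Pod B")]


def first_broken_hop(can_reach: Callable[[str, str], bool]) -> Optional[Hop]:
    """Return the first failing hop, or None if Pod A already reaches Pod B.

    can_reach(src, dst) is a hypothetical probe supplied by the caller,
    e.g. a wrapper that runs ping from src towards dst.
    """
    if can_reach("Pod A", "Pod B"):
        return None  # item 4 works end to end, nothing to localize
    for src, dst in HOPS:
        if not can_reach(src, dst):
            return (src, dst)
    # Every hop works on its own but the full path does not: report the
    # end-to-end pair so that rules affecting only Pod A -> Pod B get checked.
    return ("Pod A", "Pod B")
```

Checking the end-to-end path first keeps the common case to a single probe; the per-hop probes only run once something is known to be broken.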
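3. Inside the Pod, run ip a and check that the network interfaces look normal. (Open the shell with kubectl exec -it busybox -n default -- /bin/sh; the form without -- used below is deprecated, as the warning shows.)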
[root@node1 ~]# kubectl exec -it busybox -n default /bin/sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:62:3d:1c brd ff:ff:ff:ff:ff:ff
    inet 192.168.31.138/24 brd 192.168.31.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::bc28:8e16:aea8:c62e/64 scope link
       valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue qlen 1000
    link/ether 52:54:00:62:2a:94 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 qlen 1000
    link/ether 52:54:00:62:2a:94 brd ff:ff:ff:ff:ff:ff
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether 02:42:24:8b:af:61 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4. Run ip r and check that the routes are correct
/ # ip r
default via 192.168.31.1 dev ens33 metric 100
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev cni0 scope link src 10.244.1.1
172.17.0.0/16 dev docker0 scope link src 172.17.0.1
192.168.31.0/24 dev ens33 scope link src 192.168.31.138 metric 100
192.168.122.0/24 dev virbr0 scope link src 192.168.122.1
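Checking the routes by eye is easy to get wrong on a busy node, so here is a minimal sketch that parses the text of ip r and compares it with the routes you expect. The helper names `parse_routes` and `check_routes` are hypothetical, and the expected mapping in the test (local Pod subnet on cni0, the other node's subnet on flannel.1) is an assumption taken from the output above; adjust it to your own node's Pod CIDR and CNI.

```python
from typing import Dict, List


def parse_routes(output: str) -> Dict[str, str]:
    """Map each route destination ('default', '10.244.1.0/24', ...) to its 'dev'.

    Lines without a 'dev' field (e.g. shell prompts) are skipped. If the same
    destination appears twice (different metrics), the last line wins.
    """
    routes: Dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        if "dev" not in fields:
            continue
        i = fields.index("dev")
        if i + 1 < len(fields):
            routes[fields[0]] = fields[i + 1]
    return routes


def check_routes(output: str, expected: Dict[str, str]) -> List[str]:
    """Return a description of every expected route that is missing or on the wrong dev."""
    routes = parse_routes(output)
    problems: List[str] = []
    for dst, dev in expected.items():
        actual = routes.get(dst)
        if actual is None:
            problems.append(f"missing route {dst} (expected dev {dev})")
        elif actual != dev:
            problems.append(f"route {dst} uses dev {actual}, expected {dev}")
    return problems
```

An empty list means every expected route is present on the expected interface.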
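5. On the host, run arp -a | grep <PodIP> and check the ARP table for anomalies: a missing entry, multiple entries for the same IP, or a MAC address that does not match the one seen inside the Pod.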
[root@node1 ~]# arp -a | grep 10
? (10.244.1.48) at 9e:f6:c5:d4:47:0b [ether] on cni0
? (10.244.1.51) at da:03:c2:99:07:5f [ether] on cni0
? (10.244.1.53) at 82:0c:32:fc:74:71 [ether] on cni0
? (10.244.1.54) at da:9b:f5:dd:1a:d6 [ether] on cni0
? (10.244.1.36) at <incomplete> on cni0
? (10.244.1.42) at <incomplete> on cni0
? (10.244.1.45) at de:97:bd:2f:ec:e6 [ether] on cni0
? (10.244.0.0) at 7a:a2:12:5d:dd:6f [ether] PERM on flannel.1
? (10.244.1.46) at da:1e:b8:6e:d5:e9 [ether] on cni0
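The three ARP anomalies listed in step 5 (missing, multiple entries, MAC mismatch), plus the <incomplete> state visible above, can be checked with a small parser over the arp -a text. This is a sketch; `arp_problems` is a hypothetical helper, and the Pod's MAC is assumed to have been read from ip a inside the Pod. Matching the exact IP also avoids the substring problem of grep, where grep 10.244.1.4 would match .45 and .46 as well.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Matches lines such as "? (10.244.1.48) at 9e:f6:c5:d4:47:0b [ether] on cni0"
# and "? (10.244.1.36) at <incomplete> on cni0".
ARP_LINE = re.compile(r"\((?P<ip>[^)]+)\) at (?P<mac>\S+)")


def arp_problems(output: str, pod_ip: str, pod_mac: Optional[str] = None) -> List[str]:
    """List ARP anomalies for pod_ip in `arp -a` output (empty list means none found)."""
    macs: Dict[str, List[str]] = defaultdict(list)
    for line in output.splitlines():
        m = ARP_LINE.search(line)
        if m:
            macs[m.group("ip")].append(m.group("mac").lower())

    entries = macs.get(pod_ip, [])
    if not entries:
        return [f"{pod_ip}: no ARP entry"]

    problems: List[str] = []
    if len(entries) > 1:
        problems.append(f"{pod_ip}: {len(entries)} ARP entries {entries}")
    for mac in entries:
        if mac == "<incomplete>":
            problems.append(f"{pod_ip}: entry is <incomplete> (no ARP reply)")
        elif pod_mac is not None and mac != pod_mac.lower():
            problems.append(f"{pod_ip}: MAC {mac} differs from Pod MAC {pod_mac.lower()}")
    return problems
```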
6. On the host, check that the network interfaces exist
[root@node1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:62:3d:1c brd ff:ff:ff:ff:ff:ff
    inet 192.168.31.138/24 brd 192.168.31.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
7. Check that the routes exist both inside the Pod and on the host
/ # ip r
default via 192.168.31.1 dev ens33 metric 100
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev cni0 scope link src 10.244.1.1
172.17.0.0/16 dev docker0 scope link src 172.17.0.1
192.168.31.0/24 dev ens33 scope link src 192.168.31.138 metric 100
192.168.122.0/24 dev virbr0 scope link src 192.168.122.1
/ # ip r get 192.168.31.138
local 192.168.31.138 dev lo src 192.168.31.138
8. On Host A, run ip r get <PodB IP> to check whether Host A has a route to Pod B.
9. Check whether the host has a route to the Pod
[root@node1 ~]# ip r get 10.244.1.51
10.244.1.51 dev cni0 src 10.244.1.1
    cache
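ip r get answers "which interface would this destination use?" by longest-prefix match over the routing table. The sketch below reproduces that lookup offline from saved ip r text, which can help when comparing the tables of Host A and Host B. It is a simplification and an assumption, not the kernel's algorithm in full: it ignores metrics, policy routing (ip rule) and non-main tables, and `lookup_dev` is a hypothetical helper.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[ipaddress.IPv4Network, str]


def parse_route_table(output: str) -> List[Route]:
    """Parse `ip r` output into (network, dev) pairs; unparsable lines are skipped."""
    table: List[Route] = []
    for line in output.splitlines():
        fields = line.split()
        if "dev" not in fields or fields.index("dev") + 1 >= len(fields):
            continue
        dst = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        try:
            net = ipaddress.IPv4Network(dst, strict=False)
        except ValueError:
            continue  # e.g. route types such as "local ..." or "unreachable ..."
        table.append((net, fields[fields.index("dev") + 1]))
    return table


def lookup_dev(output: str, ip: str) -> Optional[str]:
    """Return the dev of the most specific route containing ip, or None if no route matches."""
    addr = ipaddress.IPv4Address(ip)
    matches = [(net, dev) for net, dev in parse_route_table(output) if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda r: r[0].prefixlen)[1]
```

With the table from step 4, a Pod IP in the local subnet resolves to cni0 and one in 10.244.0.0/24 resolves to flannel.1, consistent with the ip r get result above.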