
k8s pod has no IP, with the error failed to read pod IP from plugin/docker

source link: https://zhangguanzhang.github.io/2022/07/12/pod-no-ip-addr/

 2022/07/12  23  Share

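I had just come back to my desk and had not even sat down when a colleague called me over to look at a problem in a customer's production environment. The gist: the customer had enabled ipset as a security measure, and then found that their workloads were affected.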

$ kubectl get pod -o wide | grep -v Runn
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
xxxx-privilege-r97z4 0/1 CrashLoopBackOff 51 107m <none> 10.x.xx.xx <none> <none>
etcd1-10.x.xx.xx 0/1 CrashLoopBackOff 61 44m <none> 10.x.xx.xx <none> <none>
promtail-7jk8j 0/1 CrashLoopBackOff 51 107m <none> 10.x.xx.xx <none> <none>
zookeeper-1-10.x.xx.xx 0/1 CrashLoopBackOff 83 107m <none> 10.x.xx.xx <none> <none>

Let's look at the etcd log first. etcd is written in Go, after all, and logs from Go services are clearer than Java logs 😉

$ docker ps -a |grep etcd | head -n 3
a01d26e5668a mirrorgooglecontainers/pause-amd64:3.1 "/pause" 1 second ago Created k8s_POD_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_3182
2434b43b2026 mirrorgooglecontainers/pause-amd64:3.1 "/pause" 3 seconds ago Exited (0) 1 second ago k8s_POD_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_3181
8e434211ee24 b5d94f31df3a "/app/etcd --name=et…" 5 seconds ago Exited (1) 4 seconds ago k8s_etcd1_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_62
$ docker logs 8e43
2022-07-12 09:11:45.659987 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP_PORT=2379
2022-07-12 09:11:45.660118 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_PORT_ETCD_CLIENT_2379=2379
2022-07-12 09:11:45.660149 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP_ADDR=xxx.xx.145.219
2022-07-12 09:11:45.660161 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP_PROTO=tcp
2022-07-12 09:11:45.660180 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_HOST=xxx.xx.145.219
2022-07-12 09:11:45.660202 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP=tcp://xxx.xx.145.219:2379
2022-07-12 09:11:45.660227 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_PORT=2379
2022-07-12 09:11:45.660322 W | pkg/flags: unrecognized environment variable ETCD_PORT=tcp://xxx.xx.145.219:2379
2022-07-12 09:11:45.660376 E | etcdmain: error verifying flags, expected IP in URL for binding (http://:2380). See 'etcd --help'.

The log complains that there is no IP. The flannel pods were all running normally, so next I checked the kubelet log:

$ journalctl -xe --no-pager -u kubelet
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: with error: exit status 1
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: I0712 17:13:31.384403 761 kubelet.go:1933] SyncLoop (PLEG): "xxxx-privilege-r97z4_default(3b3e512c-61e4-4e0f-ae19-217f9d23bdce)", event: &pleg.PodLifecycleEvent{ID:"3b3e512c-61e4-4e0f-ae19-217f9d23bdce", Type:"ContainerStarted", Data:"9584c21ffd29fcc723f9855b3235e652058c8f8dd66bcc1539fd8d213c059482"}
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: I0712 17:13:31.385046 761 kuberuntime_manager.go:434] Sandbox for pod "xxxx-privilege-r97z4_default(3b3e512c-61e4-4e0f-ae19-217f9d23bdce)" has no IP address. Need to start a new one
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: W0712 17:13:31.398508 761 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "zookeeper-1-10.x.xx.xx_default": Unexpected command output nsenter: failed to execute ip: No such file or directory
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: with error: exit status 1
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: I0712 17:13:31.418775 761 kubelet.go:1933] SyncLoop (PLEG): "zookeeper-1-10.x.xx.xx_default(9ede2a2352bf8cd0cf86767166391721)", event: &pleg.PodLifecycleEvent{ID:"9ede2a2352bf8cd0cf86767166391721", Type:"ContainerStarted", Data:"1f7169d3e335fd0e79c717290d8ac42de5d089dae5fa4647a13ebe53b8611c88"}
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: I0712 17:13:31.419167 761 kuberuntime_manager.go:434] Sandbox for pod "zookeeper-1-10.x.xx.xx_default(9ede2a2352bf8cd0cf86767166391721)" has no IP address. Need to start a new one
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: W0712 17:13:31.430948 761 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "xxxx-gateway-7dd6cdc85d-6hsz7_default": Unexpected command output nsenter: failed to execute ip: No such file or directory
Jul 12 17:13:31 xxx.xxx.xxx kubelet[761]: with error: exit status 1
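
When the kubelet log is this noisy, it helps to reduce it to the list of pods that hit the warning. The following is only a rough sketch: the function name pods_failing_ip_read is made up for illustration, and it assumes the warning text has exactly the shape shown in the log above.

```shell
# Hypothetical helper: read kubelet log lines on stdin and print the unique
# pod names that appear in "failed to read pod IP from plugin/docker" warnings.
# Assumes the message format shown in the log above.
pods_failing_ip_read() {
  sed -n 's|.*failed to read pod IP from plugin/docker.*for pod "\([^"]*\)".*|\1|p' | sort -u
}
```

It could be fed from the same command as above, for example: journalctl --no-pager -u kubelet | pods_failing_ip_read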

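The error says that nsenter failed to execute ip inside the pod's network namespace. I checked, and sure enough the ip command was gone from the host. The system is CentOS 7.9, where ip is provided by the iproute package. I downloaded a CentOS 7 rpm from http://www.rpmfind.net/, had someone upload and install it, and after that everything recovered.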

$ docker ps -a |grep etcd  | head -n 5
52b7b3a6a5b2 b5d94f31df3a "/app/etcd --name=et…" 30 seconds ago Up 29 seconds k8s_etcd1_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_68
93f00c95f7ba mirrorgooglecontainers/pause-amd64:3.1 "/pause" 33 seconds ago Up 32 seconds k8s_POD_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_3378
fb75458a53b8 mirrorgooglecontainers/pause-amd64:3.1 "/pause" 36 seconds ago Exited (0) 34 seconds ago k8s_POD_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_3377
e291ec9ed471 mirrorgooglecontainers/pause-amd64:3.1 "/pause" 39 seconds ago Exited (0) 37 seconds ago k8s_POD_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_3376
e6eeed41ef31 b5d94f31df3a "/app/etcd --name=et…" 3 minutes ago Exited (1) 3 minutes ago k8s_etcd1_etcd1-10.x.xx.xx_default_90bb7a6b237dd87a85e03ed7981e90f3_67
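
A quick way to catch this class of problem earlier is to check that the host commands the kubelet shells out to are still on PATH. This is a minimal sketch, not anything the kubelet ships: the function name missing_commands is hypothetical, and which commands to pass is up to you (ip and nsenter are the two named in the error above).

```shell
# Hypothetical helper: print each command from the argument list that cannot
# be found on PATH, one per line. Returns non-zero if at least one is missing.
missing_commands() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      rc=1
    fi
  done
  return "$rc"
}
```

On the affected node, running missing_commands ip nsenter would have printed ip before the iproute rpm was reinstalled.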

Normally nobody would uninstall it on purpose, so was it removed as a dependency of something else? The log confirms it:

$ grep -C 20  iprout /var/log/yum.log
Jul 04 22:43:24 Erased: plymouth-0.8.9-0.34.20140113.el7.centos.x86_64
Jul 04 22:43:24 Erased: plymouth-scripts-0.8.9-0.34.20140113.el7.centos.x86_64
Jul 04 22:43:24 Erased: iptables-services-1.4.21-35.el7.x86_64
Jul 04 22:43:25 Erased: kbd-1.15.5-15.el7.x86_64
Jul 04 22:43:25 Erased: kexec-tools-2.0.15-51.el7_9.3.x86_64
Jul 04 22:43:25 Erased: dracut-network-033-572.el7.x86_64
Jul 04 22:43:25 Erased: 12:dhclient-4.2.5-82.el7.centos.x86_64
Jul 04 22:43:26 Erased: initscripts-9.49.53-1.el7_9.1.x86_64
Jul 04 22:43:27 Erased: open-vm-tools-11.0.5-3.el7_9.3.x86_64
Jul 04 22:43:27 Erased: iproute-4.11.0-30.el7.x86_64
Jul 04 22:43:27 Erased: iptables-1.4.21-35.el7.x86_64
Jul 04 22:44:33 Installed: iptables-1.4.21-35.el7.x86_64
$ uptime -s
2022-07-12 13:37:19

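The -C 20 above returned only these few lines, which means that is all the log file contains. It looks like this was caused by the customer's own earlier attempt to install the iptables service that imports rules in one shot, with iproute erased along the way, as the log shows. The customer's own doing.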
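
To pull just the removals out of a longer yum.log, a small awk filter is enough. Treat it as a sketch under assumptions: erased_packages is a made-up name, and it expects the five-field line layout seen in the log above (month, day, time, action, package). Packages logged with an epoch prefix, such as 12:dhclient, will not match on the bare name.

```shell
# Hypothetical helper: read yum.log-style lines on stdin and print the
# timestamp and package of every "Erased:" entry whose package field starts
# with the given prefix.
erased_packages() {
  prefix=$1
  awk -v p="$prefix" '$4 == "Erased:" && index($5, p) == 1 { print $1, $2, $3, $5 }'
}
```

For example: erased_packages iproute < /var/log/yum.log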

