Troubleshooting cross-node connectivity failure in a k8s cluster on arm64 Kylin Linux

source link: https://zhangguanzhang.github.io/2020/10/20/kylin-v10-k8s-overlay-error/

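A colleague's cluster, deployed at a customer site, kept running into problems. Right after I fixed one issue for him, he reported that business pods were failing to start because DNS names could not be resolved. Once I dug in, it turned out to be something I had never run into before, which was quite interesting, so I'm writing it down here. The real investigation also wasted some time and attempts heading in wrong directions; I've left those out and describe the process from the angle that turned out to be correct.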

Environment

Cluster info:

$ kubectl version -o json
{
  "clientVersion": {
    "major": "1",
    "minor": "15",
    "gitVersion": "v1.15.12",
    "gitCommit": "e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725",
    "gitTreeState": "clean",
    "buildDate": "2020-05-06T05:17:59Z",
    "goVersion": "go1.12.17",
    "compiler": "gc",
    "platform": "linux/arm64"
  },
  "serverVersion": {
    "major": "1",
    "minor": "15",
    "gitVersion": "v1.15.12",
    "gitCommit": "e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725",
    "gitTreeState": "clean",
    "buildDate": "2020-05-06T05:09:48Z",
    "goVersion": "go1.12.17",
    "compiler": "gc",
    "platform": "linux/arm64"
  }
}

The OS is Kylin Linux Advanced Server (银河麒麟) on arm64:

$ cat /etc/os-release
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Tercel)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Tercel)"
ANSI_COLOR="0;31"
$ uname -a
Linux xxx 4.19.90-17.ky10.aarch64 #1 SMP Sun Jun 28 14:27:40 CST 2020 aarch64 aarch64 aarch64 GNU/Linux

Troubleshooting

First, look up the cluster DNS Service IP:

$ kubectl -n kube-system get svc -l k8s-app=kube-dns
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
kube-dns             ClusterIP   10.186.0.2       <none>        53/UDP,53/TCP,9153/TCP       87m

Send a DNS query by hand with dig. At first I used cluster.local, but something felt off; checking the kubelet parameters showed that the cluster domain is actually cluster1.local:

$ dig @10.186.0.2 kubernetes.default.svc.cluster1.local +tcp
;; Connection to 10.186.0.2#53(10.186.0.2) for kubernetes.default.svc.cluster1.local failed: timed out.
;; Connection to 10.186.0.2#53(10.186.0.2) for kubernetes.default.svc.cluster1.local failed: timed out.
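The name dig queries is built as `<service>.<namespace>.svc.<cluster-domain>`, so the domain has to match the kubelet's setting before a failed lookup tells you anything about the network. A minimal Python sketch, assuming the domain is passed as kubelet's `--cluster-domain` flag (it can also be set as `clusterDomain` in the kubelet config file, which this does not read); the kubelet command line is a hypothetical example and the helper names are illustrative:

```python
import shlex
from typing import Optional


def cluster_domain_from_args(cmdline: str) -> Optional[str]:
    """Return the value of kubelet's --cluster-domain flag, or None if it is absent."""
    args = shlex.split(cmdline)
    for i, arg in enumerate(args):
        if arg.startswith("--cluster-domain="):
            return arg.split("=", 1)[1]
        if arg == "--cluster-domain" and i + 1 < len(args):
            return args[i + 1]
    return None


def service_fqdn(service: str, namespace: str, cluster_domain: str) -> str:
    """Build the in-cluster DNS name <service>.<namespace>.svc.<cluster-domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain.rstrip('.')}"


# Hypothetical kubelet command line; only --cluster-domain matters here.
cmdline = "/usr/bin/kubelet --cluster-dns=10.186.0.2 --cluster-domain=cluster1.local"
print(service_fqdn("kubernetes", "default", cluster_domain_from_args(cmdline)))
```

It prints the same name used in the dig command above.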

It timed out. Try coredns's metrics endpoint instead:

$ curl -I 10.186.0.2:9153/metrics
^C

Still timing out. Checking flannel's VTEP information shows it is all correct:

$ kubectl get node -o yaml | grep -A3 Vtep
      flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"ea:77:37:86:ee:bf"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 172.31.159.19
--
      flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"f2:d2:28:8e:4c:61"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 172.31.159.20
--
      flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"2a:f1:d4:d0:32:24"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 172.31.159.21
--
      flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"4a:e7:02:47:20:b8"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 172.31.159.22
--
      flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"ce:ce:f3:fc:3f:77"}'
      flannel.alpha.coreos.com/backend-type: vxlan
      flannel.alpha.coreos.com/kube-subnet-manager: "true"
      flannel.alpha.coreos.com/public-ip: 172.31.159.23
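With five nodes this is easy to eyeball, but the same check can be scripted. A Python sketch that parses the `grep -A3 Vtep` output above into (public-ip, VtepMAC) pairs and reports any value shared by more than one node; it assumes node blocks are separated by grep's `--` lines, and uniqueness is the only property it checks (the function names are illustrative):

```python
import re
from typing import List, Tuple

MAC_RE = re.compile(r'"VtepMAC":"([0-9a-f:]{17})"')
IP_RE = re.compile(r"public-ip:\s*(\S+)")


def parse_vteps(grep_output: str) -> List[Tuple[str, str]]:
    """Return (public_ip, vtep_mac) pairs, one per node block in the grep output."""
    pairs = []
    # grep -A prints a line containing only "--" between non-adjacent matches.
    for chunk in re.split(r"^--$", grep_output, flags=re.MULTILINE):
        mac = MAC_RE.search(chunk)
        ip = IP_RE.search(chunk)
        if mac and ip:
            pairs.append((ip.group(1), mac.group(1)))
    return pairs


def duplicate_problems(pairs: List[Tuple[str, str]]) -> List[str]:
    """Report any public IP or VtepMAC that appears on more than one node."""
    problems = []
    columns = (("public-ip", [p[0] for p in pairs]), ("VtepMAC", [p[1] for p in pairs]))
    for label, values in columns:
        for value in sorted(set(values)):
            if values.count(value) > 1:
                problems.append(f"duplicate {label}: {value}")
    return problems
```

An empty list from `duplicate_problems` matches the conclusion above that the VTEP data is fine.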

Get the coredns pod IPs and test a pod IP directly, bypassing the cluster Service:

$ kubectl -n kube-system get po -o wide -l k8s-app=kube-dns
NAME                                  READY   STATUS    RESTARTS   AGE   IP              NODE            NOMINATED NODE   READINESS GATES
coredns-677d9c57f-tdnd4               1/1     Running   0          10m   10.187.1.24     172.31.159.21   <none>           <none>
coredns-677d9c57f-x274j               1/1     Running   0          10m   10.187.4.24     172.31.159.22   <none>           <none>
$ curl -I 10.187.1.24:9153/metrics
^C
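Instead of curl plus Ctrl-C, a TCP-level probe with a timeout distinguishes "nothing usable comes back" from "port closed". A minimal Python sketch; `tcp_probe` and its 3-second default are illustrative choices, not part of any tool. A stuck handshake like the one here shows up as `timeout`:

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP handshake and classify the outcome.

    Returns "open" if the handshake completes, "refused" if the peer answers
    with a RST, "timeout" if no usable reply arrives in time, or
    "error: ..." for other socket errors (e.g. no route to host).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Running `tcp_probe(ip, 9153)` for each coredns pod IP (10.187.1.24 and 10.187.4.24) on every node gives a quick picture of which node pairs can complete a handshake.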

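Still timing out. Keep the curl above running; I picked port 9153 precisely because it is an uncommon port, otherwise the tcpdump filter below would get too complicated. Capture on the destination host: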

$ tcpdump -nn -i flannel.1 host 10.187.1.24 and port 9153 -vv
tcpdump: listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
16:39:35.019165 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0xe94e (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440201878 ecr 632709592,nop,wscale 7], length 0
16:39:35.068097 IP (tos 0x0, ttl 64, id 39684, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.0.0.45920 > 10.187.1.24.9153: Flags [S], cksum 0x80ed (correct), seq 1103099580, win 64860, options [mss 1410,sackOK,TS val 632716806 ecr 0,nop,wscale 7], length 0
16:39:35.068241 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0xe91c (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440201928 ecr 632709592,nop,wscale 7], length 0
16:39:43.419197 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0xc87e (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440210278 ecr 632709592,nop,wscale 7], length 0
16:39:43.708101 IP (tos 0x0, ttl 64, id 39685, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.0.0.45920 > 10.187.1.24.9153: Flags [S], cksum 0x5f2d (correct), seq 1103099580, win 64860, options [mss 1410,sackOK,TS val 632725446 ecr 0,nop,wscale 7], length 0
16:39:43.708233 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0xc75c (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440210568 ecr 632709592,nop,wscale 7], length 0
16:39:54.141929 IP (tos 0x0, ttl 64, id 12300, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.0.0.46388 > 10.187.1.24.9153: Flags [S], cksum 0x0a5a (correct), seq 3149899513, win 64860, options [mss 1410,sackOK,TS val 632735880 ecr 0,nop,wscale 7], length 0
16:39:54.142080 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xeb46 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440221001 ecr 632735880,nop,wscale 7], length 0
16:39:55.148096 IP (tos 0x0, ttl 64, id 12301, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.0.0.46388 > 10.187.1.24.9153: Flags [S], cksum 0x066c (correct), seq 3149899513, win 64860, options [mss 1410,sackOK,TS val 632736886 ecr 0,nop,wscale 7], length 0
16:39:55.148381 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xe757 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440222008 ecr 632735880,nop,wscale 7], length 0
16:39:56.219200 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xe329 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440223078 ecr 632735880,nop,wscale 7], length 0
16:39:57.228103 IP (tos 0x0, ttl 64, id 12302, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.0.0.46388 > 10.187.1.24.9153: Flags [S], cksum 0xfe4b (correct), seq 3149899513, win 64860, options [mss 1410,sackOK,TS val 632738966 ecr 0,nop,wscale 7], length 0
16:39:57.228247 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xdf37 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440224088 ecr 632735880,nop,wscale 7], length 0
16:39:59.259269 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xd748 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440226119 ecr 632735880,nop,wscale 7], length 0
16:40:00.059221 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.45920: Flags [S.], cksum 0x877e (correct), seq 1670919335, ack 1103099581, win 64308, options [mss 1410,sackOK,TS val 3440226918 ecr 632709592,nop,wscale 7], length 0
16:40:01.308098 IP (tos 0x0, ttl 64, id 12303, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.0.0.46388 > 10.187.1.24.9153: Flags [S], cksum 0xee5b (correct), seq 3149899513, win 64860, options [mss 1410,sackOK,TS val 632743046 ecr 0,nop,wscale 7], length 0
16:40:01.308248 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.46388: Flags [S.], cksum 0xcf47 (correct), seq 14530549, ack 3149899514, win 64308, options [mss 1410,sackOK,TS val 3440228168 ecr 632735880,nop,wscale 7], length 0

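Replies are going out, but every packet is a retransmission of the TCP handshake (the SYN and the SYN-ACK). Back on the curl host, open another terminal and capture there: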

$ tcpdump -nn -i flannel.1 host 10.187.1.24 and port 9153 -vv
tcpdump: listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
16:46:20.324596 IP (tos 0x0, ttl 64, id 7952, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum 0x29f7 (correct), seq 1340604575, win 64860, options [mss 1410,sackOK,TS val 633118295 ecr 0,nop,wscale 7], length 0
16:46:20.324636 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xdf09 (correct), seq 1764271535, ack 1340604576, win 64308, options [mss 1410,sackOK,TS val 3440603416 ecr 633118295,nop,wscale 7], length 0
16:46:21.346975 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xdb0b (correct), seq 1764271535, ack 1340604576, win 64308, options [mss 1410,sackOK,TS val 3440604438 ecr 633118295,nop,wscale 7], length 0
16:46:21.395375 IP (tos 0x0, ttl 64, id 7953, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum 0x25c8 (correct), seq 1340604575, win 64860, options [mss 1410,sackOK,TS val 633119366 ecr 0,nop,wscale 7], length 0
16:46:21.395409 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xdada (correct), seq 1764271535, ack 1340604576, win 64308, options [mss 1410,sackOK,TS val 3440604487 ecr 633118295,nop,wscale 7], length 0
16:46:23.426969 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.1.24.9153 > 10.187.0.0.53748: Flags [S.], cksum 0xd2eb (correct), seq 1764271535, ack 1340604576, win 64308, options [mss 1410,sackOK,TS val 3440606518 ecr 633118295,nop,wscale 7], length 0
16:46:23.475374 IP (tos 0x0, ttl 64, id 7954, offset 0, flags [DF], proto TCP (6), length 60)
    10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum 0x1da8 (correct), seq 1340604575, win 64860, options [mss 1410,sackOK,TS val 633121446 ecr 0,nop,wscale 7], length 0
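Both captures follow the same pattern, which is easier to see once packets are grouped by their handshake sequence numbers. A Python sketch that reads the second line of each `tcpdump -nn -vv` TCP record (the layout shown above; other tcpdump options change it) and counts SYN and SYN-ACK segments that repeat with the same seq; the names are illustrative:

```python
import re
from collections import Counter
from typing import Dict, Tuple

# Second line of a `tcpdump -nn -vv` TCP record, e.g.
#   10.187.0.0.53748 > 10.187.1.24.9153: Flags [S], cksum 0x29f7 (correct), seq 1340604575, ...
PKT_RE = re.compile(r"^\s*(\S+) > (\S+): Flags \[([^\]]+)\].*?\bseq (\d+)")

Key = Tuple[str, str, str, int]


def handshake_repeats(capture: str) -> Dict[Key, int]:
    """Count SYN / SYN-ACK segments that appear more than once with the same seq.

    The key is (src, dst, flags, seq). A handshake segment seen repeatedly
    with an identical seq is a retransmission: its sender never got the
    expected answer.
    """
    counts: Counter = Counter()
    for line in capture.splitlines():
        m = PKT_RE.match(line)
        if m and m.group(3) in ("S", "S."):
            counts[(m.group(1), m.group(2), m.group(3), int(m.group(4)))] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

On the curl-side capture above it reports the client's SYN (seq 1340604575) three times and the pod's SYN-ACK (seq 1764271535) four times.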

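I didn't look at these packets closely at the time, so let's go through them properly now. The SYN-ACKs from 10.187.1.24.9153 all carry the same seq 1764271535 with ack 1340604576, i.e. they answer the client's SYN (seq 1340604575). So the handshake reply really does arrive back on this host, but judging by the unchanging sequence numbers nothing ever receives it: no ACK goes out, the client keeps retransmitting its SYN, and the pod on the destination host keeps retransmitting its SYN-ACK. Check the routing: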

$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.187.0.0/16
FLANNEL_SUBNET=10.187.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
$ ip a s flannel.1
542: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether b6:9b:ed:b0:37:74 brd ff:ff:ff:ff:ff:ff
    inet 10.187.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::b49b:edff:feb0:3774/64 scope link
       valid_lft forever preferred_lft forever
$ ip route get 10.187.0.0
local 10.187.0.0 dev lo src 10.187.0.0 uid 0
    cache <local>

Unbelievable, it really is wrong: for some baffling reason it resolves to lo. NetworkManager turned out to be enabled, so I restarted it:

$ systemctl restart NetworkManager
$ ip route get 10.187.0.0
broadcast 10.187.0.0 dev cni0 src 10.187.0.1 uid 0
    cache <local,brd>
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.31.159.254  0.0.0.0         UG    100    0        0 eno1
10.185.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
10.187.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
172.31.159.0    0.0.0.0         255.255.255.0   U     100    0        0 eno1
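Both `ip route get` results are possible because the same IP plays two roles on this node: flannel.1 holds 10.187.0.0/32, and 10.187.0.0 is also the network address of the node's pod subnet (FLANNEL_SUBNET=10.187.0.1/24, routed to cni0). Spotting that overlap from subnet.env can be scripted; a minimal Python sketch with hypothetical helper names:

```python
import ipaddress
from typing import Dict


def parse_subnet_env(text: str) -> Dict[str, str]:
    """Parse flannel's subnet.env (plain KEY=VALUE lines) into a dict."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            env[key] = value
    return env


def vtep_is_subnet_network(env: Dict[str, str], flannel_addr: str) -> bool:
    """True if flannel.1's address equals the network address of FLANNEL_SUBNET.

    FLANNEL_SUBNET is 10.187.0.1/24 here, so its network address is
    10.187.0.0, the same IP flannel.1 carries as a /32.
    """
    node_subnet = ipaddress.ip_interface(env["FLANNEL_SUBNET"]).network
    return node_subnet.network_address == ipaddress.ip_address(flannel_addr)
```

Fed the subnet.env shown earlier and flannel.1's address 10.187.0.0, `vtep_is_subnet_network` returns True.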

The route looks right now, but flannel's routes to the other nodes have disappeared, so flannel has to be restarted:

$ docker ps -a | grep flannel
4b3f04e62b25        122cdb7aa710                                 "/opt/bin/flanneld -…"    2 hours ago         Up 2 hours                                             k8s_kube-flannel_kube-flannel-ds-22bwd_kube-system_6f5ce812-c5ae-4102-9398-c4a6fee4c7ab_0
$ docker restart 4b3
4b3
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.31.159.254  0.0.0.0         UG    100    0        0 eno1
10.185.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
10.187.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.187.1.0      10.187.1.0      255.255.255.0   UG    0      0        0 flannel.1
10.187.2.0      10.187.2.0      255.255.255.0   UG    0      0        0 flannel.1
10.187.3.0      10.187.3.0      255.255.255.0   UG    0      0        0 flannel.1
10.187.4.0      10.187.4.0      255.255.255.0   UG    0      0        0 flannel.1
172.31.159.0    0.0.0.0         255.255.255.0   U     100    0        0 eno1
$ ip route get 10.187.0.0
broadcast 10.187.0.0 dev cni0 src 10.187.0.1 uid 0
    cache <local,brd>
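Missing routes are easy to overlook in a longer table. A Python sketch that parses `route -n` output and lists the remote node subnets that have no route via flannel.1; the expected subnets are an input, assumed here to come from each node's `spec.podCIDR` (which flannel reads when running with the kube subnet manager), and the function names are illustrative:

```python
import ipaddress
from typing import List, Set


def routed_via(route_n_output: str, iface: str = "flannel.1") -> Set[ipaddress.IPv4Network]:
    """Destination networks that `route -n` sends out through `iface`."""
    nets: Set[ipaddress.IPv4Network] = set()
    for line in route_n_output.splitlines():
        cols = line.split()
        # Data rows have 8 columns: Destination Gateway Genmask Flags Metric Ref Use Iface.
        # The header row also has 8, but its last column is "Iface", so it is skipped.
        if len(cols) == 8 and cols[7] == iface:
            nets.add(ipaddress.IPv4Network(f"{cols[0]}/{cols[2]}"))
    return nets


def missing_routes(route_n_output: str, remote_subnets: List[str]) -> List[str]:
    """Remote node pod subnets that have no route via flannel.1."""
    have = routed_via(route_n_output)
    return [s for s in remote_subnets if ipaddress.IPv4Network(s) not in have]
```

Given the table from before the flannel restart, `missing_routes` reports all four of 10.187.1.0/24 through 10.187.4.0/24; given the table after it, the result is an empty list.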

curl again:

$ curl -I 10.187.4.24:9153/metrics
HTTP/1.1 200 OK
Content-Length: 19491
Content-Type: text/plain; version=0.0.4; charset=utf-8
Date: Tue, 20 Oct 2020 10:14:42 GMT

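After doing the same on every machine, cross-node networking in the cluster worked without any problems. We have other K8S environments running NetworkManager as well, but this is the first time we have run into this, and it was on Kylin.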

Some personal thoughts on NetworkManager

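Personally, I find it immature. Once a colleague set a netmask with nmcli and the VIP stopped working: the config file ended up with PREFIX, and everything worked again after I changed it back to NETMASK. It runs as a daemon, yet while networking technologies on Linux keep multiplying, it has not adapted well to most of them, and it is slow to update.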
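For reference, in ifcfg-style config files PREFIX and NETMASK are two ways of writing the same mask (PREFIX=24 corresponds to NETMASK=255.255.255.0). A small Python sketch of the conversion using the standard `ipaddress` module; the function names are illustrative:

```python
import ipaddress


def prefix_to_netmask(prefix: int) -> str:
    """ifcfg PREFIX (e.g. 24) to the equivalent NETMASK (e.g. 255.255.255.0)."""
    return str(ipaddress.IPv4Network(f"0.0.0.0/{prefix}").netmask)


def netmask_to_prefix(netmask: str) -> int:
    """ifcfg NETMASK (e.g. 255.255.255.0) to the equivalent PREFIX (e.g. 24)."""
    return ipaddress.IPv4Network(f"0.0.0.0/{netmask}").prefixlen
```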

