

Troubleshooting a production scheduling failure caused by a kube-controller-manager bug
source link: https://zhangguanzhang.github.io/2019/03/22/k8s-controller-error/

Our production k8s clusters generally run very old versions. Today a colleague happened to check a new node and found its root partition was full. After cleaning it up, one pod still never got created. Once the k8s processes that had gone down because of the full filesystem were brought back up, the kubelet log kept repeating errors along the lines of "cannot remove /var/lib/kubelet/xxxxconfig/key: resource busy", and describing the corresponding rc showed no Events at all:
$ kubectl describe rc rabbit3rc
Name:         rabbit3rc
Namespace:    default
Selector:     app=rabbitmq-cluster,node=rabbit3
Labels:       app=rabbitmq-cluster
              node=rabbit3
Annotations:  <none>
Replicas:     1 current / 1 desired
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=rabbitmq-cluster
           node=rabbit3
  Containers:
   rabbit3:
    Image:  cloud-base/rabbitmq-3.6.5:E3103-PUB-20181015-RC1
    Ports:  4369/TCP, 5672/TCP, 15672/TCP, 25672/TCP
    Limits:
      cpu:     16
      memory:  8Gi
    Requests:
      cpu:     400m
      memory:  500Mi
    Liveness:  exec [health_check.sh] delay=600s timeout=10s period=15s #success=1 #failure=3
    Environment:
      RABBITMQ_DEFAULT_USER:    xxx
      RABBITMQ_DEFAULT_PASS:    xxx
      RABBITMQ_ERLANG_COOKIE:   xxx
    Mounts:
      /etc/localtime from time (rw)
      /var/lib/rabbitmq from rabbitmqvar (rw)
  Volumes:
   time:
    Type:  HostPath (bare host directory volume)
    Path:  /etc/localtime
   rabbitmqvar:
    Type:  HostPath (bare host directory volume)
    Path:  /opt/cloud/rabbitmq
Events:  <none>
On the corresponding node there was no rabbitmq container, only a pause container, and it was in the Dead state; there were many such Dead containers and they could not be deleted. After restarting docker the Dead containers were gone, but once everything came back up, neither the rabbitmq container nor its pause container ever appeared. That pointed to a scheduling problem.
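For reference, a minimal sketch of how the Dead containers can be listed and force-removed on a Docker-based node (generic commands, not taken from the original incident):

# list containers stuck in the Dead state
docker ps -a --filter status=dead
# force-remove them; this can still fail with "device or resource busy"
# if leftover mounts are wedged, which is when restarting docker (as done here) helps
docker ps -aq --filter status=dead | xargs -r docker rm -f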
The HA design of this old cluster has always been murky. After working out which machine was the leader from which processes were alive, I checked the kube-controller-manager on it with systemctl and saw the following:
Failed to update lock: Operation cannot be fulfilled on endpoints "kube-controller-manager": the object has been modified; please apply your changes to the latest version and try again
The controllers on the other nodes considered this node to be the leader, yet it kept logging that it could not acquire the election lock. So I copied the startup parameters from its systemd unit, ran kube-controller-manager manually with the log level raised to 8, and got the following:
I0322 20:26:36.966931 34218 round_trippers.go:395] PUT https://100.68.24.2:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager
I0322 20:26:36.966938 34218 round_trippers.go:402] Request Headers:
I0322 20:26:36.966944 34218 round_trippers.go:405] Accept: application/vnd.kubernetes.protobuf, */*
I0322 20:26:36.966951 34218 round_trippers.go:405] Content-Type: application/vnd.kubernetes.protobuf
I0322 20:26:36.966956 34218 round_trippers.go:405] User-Agent: kube-controller-manager/v1.6.7+095136c3078cc (linux/amd64) kubernetes/095136c/leader-election
I0322 20:26:36.967726 34218 round_trippers.go:420] Response Status: 409 Conflict in 0 milliseconds
I0322 20:26:36.967738 34218 round_trippers.go:423] Response Headers:
I0322 20:26:36.967744 34218 round_trippers.go:426] Content-Type: application/vnd.kubernetes.protobuf
I0322 20:26:36.967749 34218 round_trippers.go:426] Content-Length: 259
I0322 20:26:36.967754 34218 round_trippers.go:426] Date: Fri, 22 Mar 2019 12:26:36 GMT
I0322 20:26:36.967888 34218 request.go:988] Response Body:
00000000 6b 38 73 00 0a 0c 0a 02 76 31 12 06 53 74 61 74 |k8s.....v1..Stat|
00000010 75 73 12 ea 01 0a 04 0a 00 12 00 12 07 46 61 69 |us...........Fai|
00000020 6c 75 72 65 1a a1 01 4f 70 65 72 61 74 69 6f 6e |lure...Operation|
00000030 20 63 61 6e 6e 6f 74 20 62 65 20 66 75 6c 66 69 | cannot be fulfi|
00000040 6c 6c 65 64 20 6f 6e 20 65 6e 64 70 6f 69 6e 74 |lled on endpoint|
00000050 73 20 22 6b 75 62 65 2d 63 6f 6e 74 72 6f 6c 6c |s "kube-controll|
00000060 65 72 2d 6d 61 6e 61 67 65 72 22 3a 20 74 68 65 |er-manager": the|
00000070 20 6f 62 6a 65 63 74 20 68 61 73 20 62 65 65 6e | object has been|
00000080 20 6d 6f 64 69 66 69 65 64 3b 20 70 6c 65 61 73 | modified; pleas|
00000090 65 20 61 70 70 6c 79 20 79 6f 75 72 20 63 68 61 |e apply your cha|
000000a0 6e 67 65 73 20 74 6f 20 74 68 65 20 6c 61 74 65 |nges to the late|
000000b0 73 74 20 76 65 72 73 69 6f 6e 20 61 6e 64 20 74 |st version and t|
000000c0 72 79 20 61 67 61 69 6e 22 08 43 6f 6e 66 6c 69 |ry again".Confli|
000000d0 63 74 2a 28 0a 17 6b 75 62 65 2d 63 6f 6e 74 72 |ct*(..kube-contr|
000000e0 6f 6c 6c 65 72 2d 6d 61 6e 61 67 65 72 12 00 1a |oller-manager...|
000000f0 09 65 6e 64 70 6f 69 6e 74 73 28 00 30 99 03 1a |.endpoints(.0...|
00000100 00 22 00 |.".|
E0322 20:26:36.967960 34218 leaderelection.go:263] Failed to update lock: Operation cannot be fulfilled on endpoints "kube-controller-manager": the object has been modified; please apply your changes to the latest version and try again
I0322 20:26:36.967971 34218 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
^C
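For reference, the manual run was along these lines (a rough sketch; the unit name and binary path are assumptions and will differ per deployment):

# show the unit to copy its ExecStart flags
systemctl cat kube-controller-manager.service
# rerun the same command line in the foreground with verbose request tracing
/usr/local/bin/kube-controller-manager <flags copied from ExecStart> --v=8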
I went to the upstream code repository intending to read how the election logic acquires the lock, but I couldn't make head or tail of it, so I looked for someone who had already dug into the election logic. Searching by keywords I found a post at http://gogosatellite.blogspot.com/2017/07/how-to-setup-high-availability.html , and one line of log output in it pointed me in the right direction:
I0607 11:04:32.485502 17291 leaderelection.go:248] lock is held by kuberm and has not yet expired
I0607 11:04:32.485506 17291 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
I0607 11:04:36.263032 17291 round_trippers.go:417] GET http://172.16.155.165:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager 200 OK in 1 milliseconds
I0607 11:04:36.263122 17291 leaderelection.go:248] lock is held by kuberm and has not yet expired
I0607 11:04:36.263125 17291 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
So my guess was that every kube-controller-manager talks to the apiserver and they all compete for this ep to obtain the lock. Querying the ep with kubectl showed the following, and it never changed at all; the holderIdentity field inside the annotations is the node of the current leader:
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "87e9ff0a-388b-11e9-949b-0cda411d3f00",
    "resourceVersion": "36217274",
    "creationTimestamp": "2019-02-24T23:25:54Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"xxxxx{nodename}xxxxxx\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-02-24T23:25:54Z\",\"renewTime\":\"2019-03-17T11:20:08Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "87e9ff0a-388b-11e9-949b-0cda411d3f00",
    "resourceVersion": "36217274",
    "creationTimestamp": "2019-02-24T23:25:54Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-02-24T23:25:54Z\",\"renewTime\":\"2019-03-17T11:20:08Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
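The leader annotation can also be read without jq, directly via jsonpath (just a convenience sketch, not a command from the original writeup; the dots in the annotation key have to be escaped):

$ kubectl -n kube-system get ep kube-controller-manager \
    -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'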
On a fresh cluster I had built myself, the resourceVersion of the same ep was changing from moment to moment, whereas on the production cluster above it never changed at all:
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52752",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52772",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
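A quick way to compare the two clusters is to sample only the resourceVersion for a while: on a healthy cluster it ticks every couple of seconds as the leader renews its lease, while on the broken production cluster it stayed frozen (a small sketch, not from the original article):

$ watch -n 2 "kubectl -n kube-system get ep kube-controller-manager -o jsonpath='{.metadata.resourceVersion}'"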
To confirm that the holderIdentity field really identifies the leader, I tried stopping kube-controller-manager to see whether the field would change. It did change, and the ep showed up again looking like a new one (note the creationTimestamp, though: the object is not actually re-created):
[root@k8s-m1 Kubernetes-ansible]# systemctl stop kube-controller-manager.service
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52819",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52819",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
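To read the holder directly instead of eyeballing the whole annotation, the JSON embedded in the annotation value can be unpacked with jq (a convenience sketch; fromjson parses the string the election code stores there):

$ kubectl -n kube-system get ep kube-controller-manager -o json \
    | jq -r '.metadata.annotations["control-plane.alpha.kubernetes.io/leader"] | fromjson | .holderIdentity'
k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54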
That pretty much pinned down the kind of cause (possibly the slow or full filesystem, possibly something else), so on production I deleted the ep to evict the leader record that could no longer be updated. The same node acquired leadership again, but now the resourceVersion started to refresh, and the errors were gone from the controller's log:
$ kubectl -n kube-system delete ep kube-controller-manager
endpoints "kube-controller-manager" deleted
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "dec669dd-4c9f-11e9-949b-0cda411d3f00",
    "resourceVersion": "37542637",
    "creationTimestamp": "2019-03-22T12:41:53Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-03-22T12:41:53Z\",\"renewTime\":\"2019-03-22T12:41:53Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "dec669dd-4c9f-11e9-949b-0cda411d3f00",
    "resourceVersion": "37542785",
    "creationTimestamp": "2019-03-22T12:41:53Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-03-22T12:41:53Z\",\"renewTime\":\"2019-03-22T12:41:59Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
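To confirm the fix held, it is enough to check that renewTime and resourceVersion keep advancing after the ep is re-created; a hedged sketch of such a check:

$ for i in 1 2 3; do
    kubectl -n kube-system get ep kube-controller-manager -o json \
      | jq -r '[.metadata.resourceVersion, (.metadata.annotations["control-plane.alpha.kubernetes.io/leader"] | fromjson | .renewTime)] | @tsv'
    sleep 5
  done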
Checking the rc description again, the corresponding pod was up:
kubectl describe rc rabbit3rc
Name:         rabbit3rc
Namespace:    default
Selector:     app=rabbitmq-cluster,node=rabbit3
Labels:       app=rabbitmq-cluster
              node=rabbit3
Annotations:  <none>
Replicas:     1 current / 1 desired
Pods Status:  1 Running / 0 Waiting / 0 Succeeded / 0 Failed
...