Containers cannot communicate over unix sockets on older kernels

source link: https://zhangguanzhang.github.io/2022/05/13/overlayfs-unix-socket/#/%E5%8F%82%E8%80%83

Recently I ran into the same problem on several machines: none of the containers on the host could communicate over unix:// sockets.

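Symptoms

At first it was ansible failing to connect from inside our deployment container. Later it was supervisorctl in every container on the machine failing to reach supervisord over its unix socket.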

ansible

The error was:

fatal: [10.x.x.x]: UNREACHABLE! => {
"changed": false,
"msg": "Data could not be sent to remote host \"10.x.x.x\". Make sure this host can be reached over ssh: Control socket connect (/tmp/ansible-ssh-10.x.x.x-22-root): Connection refused
Failed to connect to new control master", "unreachable": true}

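I added -vvvv to the ansible command, copied out the full ssh invocation and ran it by hand. The connection succeeded once the socket-related options were removed. I had hit this before; back then I found an issue and worked around it by changing the path of the persistent ssh control socket in ansible.cfg: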

[ssh_connection]
#control_path = /tmp/ansible-ssh-%%h-%%p-%%r
control_path = /dev/shm/cp%%h-%%p-%%r

supervisor: unix:// refused connection

Then yesterday, on our development machine, we found that supervisorctl did not work inside one of the business containers:

$ supervisorctl status 
unix:///var/run/supervisor.sock refused connection

Today one of our deployment containers showed the same behavior. A rough analysis:

# startup arguments
#/usr/local/bin/python3.7 /usr/local/bin/supervisord -c /root/xxx/supervisord.conf.containd --nodaemon
# passing the config file explicitly makes no difference
$ supervisorctl -c /root/xxx/supervisord.conf.containd status
unix:///var/run/supervisor.sock refused connection

# the socket file exists and its permissions look right
$ ls -l /var/run/supervisor.sock
srw-rw---- 1 root root 0 5月 13 11:32 /var/run/supervisor.sock
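At this point it can help to take supervisor out of the picture and test a bare unix socket in the same directory. The following is a minimal sketch, not something from the original investigation: the helper name `unix_socket_roundtrip` and the probe file name are made up for illustration. It binds a throwaway socket in a directory, connects to it and sends one message.

```python
import os
import socket


def unix_socket_roundtrip(directory):
    """Bind, connect and send one message over a unix socket in `directory`.

    Returns None on success, or the error text if any step fails.
    """
    path = os.path.join(directory, "probe-%d.sock" % os.getpid())
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.settimeout(2)
    client.settimeout(2)
    try:
        server.bind(path)            # creates the socket file
        server.listen(1)
        client.connect(path)         # the step that reported "refused" above
        conn, _ = server.accept()
        with conn:
            client.sendall(b"ping")
            data = conn.recv(4)
        return None if data == b"ping" else "unexpected payload %r" % (data,)
    except OSError as exc:
        return str(exc)
    finally:
        client.close()
        server.close()
        if os.path.exists(path):
            os.unlink(path)          # remove the probe socket file


if __name__ == "__main__":
    # Directories taken from the article; skip any that do not exist.
    for d in ("/var/run", "/tmp", "/dev/shm"):
        if os.path.isdir(d):
            print(d, "->", unix_socket_roundtrip(d) or "ok")
```

If the probe fails in /var/run but works in /dev/shm inside the same container, the problem lies with the filesystem under the socket rather than with supervisor.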

The relevant parts of the configuration were all correct:

[unix_http_server]
file=/var/run/supervisor.sock ; (the path to the socket file)
chmod=0700 ; socket file mode (default 0700)

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL for a unix socket

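I compared it with the supervisor config inside a deployment container on a healthy CentOS machine of my own and the two were identical, so the configuration was not the cause. A search then pointed to the kernel: with older versions of overlayfs, communicating through a unix sock file that lives on the overlay filesystem is broken. The affected machine: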

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial
$ uname -a
Linux xxx 4.4.0-21-generic #37-Ubuntu ....
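The version comparison can be scripted. This is a rough sketch under a narrow assumption: it only looks at Ubuntu kernels of the form 4.4.0-N and treats 36 as the first fixed ABI number, based on the 4.4.0-36.55 fix version the article cites from Ubuntu. The function names are hypothetical, and the check says nothing about other distributions or kernel series, where fixes may be backported.

```python
import platform
import re

# Assumption taken from the article: the fix shipped in Ubuntu kernel 4.4.0-36.55.
FIXED_ABI = (4, 4, 0, 36)


def parse_release(release):
    """Turn '4.4.0-21-generic' into (4, 4, 0, 21); return None if it does not match."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    return tuple(int(g) for g in m.groups()) if m else None


def predates_fix(release):
    """True/False for an Ubuntu 4.4.0-N kernel, None when the check does not apply."""
    parsed = parse_release(release)
    if parsed is None or parsed[:3] != FIXED_ABI[:3]:
        return None  # different kernel series: no conclusion either way
    return parsed[3] < FIXED_ABI[3]


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r`
    print(release, "predates fix:", predates_fix(release))
```

For the machine above, `predates_fix("4.4.0-21-generic")` is True, which matches the symptoms.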

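Fixing sock file communication inside containers

According to Ubuntu, this was fixed in kernel 4.4.0-36.55, so the kernel has to be upgraded. The upgrade steps come first. List the kernel packages currently installed: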

$ dpkg --get-selections |grep -E '^linux'
linux-base install
linux-firmware install
linux-generic install
linux-headers-4.4.0-21 install
linux-headers-4.4.0-21-generic install
linux-headers-generic install
linux-image-4.4.0-21-generic install
linux-image-extra-4.4.0-21-generic install
linux-image-generic install

Find the target kernel

$ apt-cache search linux| grep 4.4.0-36
linux-cloud-tools-4.4.0-36 - Linux kernel version specific cloud tools for version 4.4.0-36
linux-cloud-tools-4.4.0-36-generic - Linux kernel version specific cloud tools for version 4.4.0-36
linux-cloud-tools-4.4.0-36-lowlatency - Linux kernel version specific cloud tools for version 4.4.0-36
linux-headers-4.4.0-36 - Header files related to Linux kernel version 4.4.0
linux-headers-4.4.0-36-generic - Linux kernel headers for version 4.4.0 on 64 bit x86 SMP
linux-headers-4.4.0-36-lowlatency - Linux kernel headers for version 4.4.0 on 64 bit x86 SMP
linux-image-4.4.0-36-generic - Linux kernel image for version 4.4.0 on 64 bit x86 SMP
linux-image-4.4.0-36-lowlatency - Linux kernel image for version 4.4.0 on 64 bit x86 SMP
linux-image-extra-4.4.0-36-generic - Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
linux-signed-image-4.4.0-36-generic - Signed kernel image generic
linux-signed-image-4.4.0-36-lowlatency - Signed kernel image lowlatency
linux-tools-4.4.0-36 - Linux kernel version specific tools for version 4.4.0-36
linux-tools-4.4.0-36-generic - Linux kernel version specific tools for version 4.4.0-36
linux-tools-4.4.0-36-lowlatency - Linux kernel version specific tools for version 4.4.0-36

Install the target kernel:

apt-get install -y linux-{headers,image}-4.4.0-36-generic

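You can check whether the initrd in the first menuentry block of /boot/grub/grub.cfg now points to the new kernel. The full boot-time grub menu can be listed with the following command: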

$ grep -Ei 'submenu|menuentry ' /boot/grub/grub.cfg | sed -re "s/(.? )'([^']+)'.*/\1 \2/"
menuentry Ubuntu
submenu Advanced options for Ubuntu
menuentry Ubuntu, with Linux 4.4.0-36-generic
menuentry Ubuntu, with Linux 4.4.0-36-generic (recovery mode)
menuentry Ubuntu, with Linux 4.4.0-21-generic
menuentry Ubuntu, with Linux 4.4.0-21-generic (recovery mode)

If you want to boot into the third entry of the submenu by default, edit /etc/default/grub:

GRUB_DEFAULT="1>2"

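The number before > selects the item in the top-level menu and the number after it selects the entry inside the submenu, both counted from 0. Then regenerate the grub config: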

update-grub
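The index arithmetic for GRUB_DEFAULT is easy to get wrong by one, so here is a small sketch that derives the value from the menu listing shown earlier. The function name `grub_default_for` is hypothetical, and the logic assumes that every menuentry line following a submenu line is nested inside that submenu. That holds for the listing in this article, but the flattened grep output cannot confirm it in general.

```python
LISTING = """\
menuentry Ubuntu
submenu Advanced options for Ubuntu
menuentry Ubuntu, with Linux 4.4.0-36-generic
menuentry Ubuntu, with Linux 4.4.0-36-generic (recovery mode)
menuentry Ubuntu, with Linux 4.4.0-21-generic
menuentry Ubuntu, with Linux 4.4.0-21-generic (recovery mode)
"""


def grub_default_for(listing, title):
    """Return a GRUB_DEFAULT value such as '1>2' for the menuentry named `title`.

    Assumes all menuentry lines after a submenu line belong to that submenu.
    Returns None when no entry has that title.
    """
    top = -1     # index of the current top-level item
    sub = None   # index inside the current submenu, None while at top level
    for line in listing.splitlines():
        kind, _, name = line.strip().partition(" ")
        name = name.strip()
        if kind == "submenu":
            top += 1
            sub = -1
        elif kind == "menuentry":
            if sub is None:
                top += 1
                if name == title:
                    return str(top)
            else:
                sub += 1
                if name == title:
                    return "%d>%d" % (top, sub)
    return None


if __name__ == "__main__":
    print(grub_default_for(LISTING, "Ubuntu, with Linux 4.4.0-21-generic"))
```

Against the listing above, the old 4.4.0-21 kernel resolves to "1>2", the same value used in /etc/default/grub.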

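After rebooting into the new kernel, the problem was gone.

Workaround without upgrading the kernel

Keep the sock files used inside the container under the container's own /dev/shm/. In a container, /dev/shm is typically a separate tmpfs mount rather than part of the overlayfs root, so the socket file never lands on the affected filesystem.

CentOS 7 is unaffected because its kernel carries backports: the overlay kernel module in CentOS is, in effect, ported from the 4.10 kernel code.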
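For supervisor, that means changing both the server's socket path and the client's serverurl. The sketch below shows one way to do it with Python's standard configparser. The helper name `move_socket` and the target path /dev/shm/supervisor.sock are illustrative assumptions, and note that rewriting the file this way drops the inline comments.

```python
import configparser
import io

CONF = """\
[unix_http_server]
file=/var/run/supervisor.sock ; (the path to the socket file)
chmod=0700 ; socket file mode (default 0700)

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL for a unix socket
"""


def move_socket(conf_text, new_path):
    """Return the config text with the supervisor socket moved to `new_path`."""
    # RawConfigParser: no % interpolation, so values pass through untouched.
    parser = configparser.RawConfigParser(inline_comment_prefixes=(";",))
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    parser.set("unix_http_server", "file", new_path)
    parser.set("supervisorctl", "serverurl", "unix://" + new_path)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical target path on the container's tmpfs.
    print(move_socket(CONF, "/dev/shm/supervisor.sock"))
```

supervisord reads the socket path at startup, so it has to be restarted with the new config before supervisorctl can connect.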

