Kubernetes Cluster Installation and Deployment

I. Server Preparation

Role             Hostname               Server IP         Spec
ETCD cluster     kubernetes-etcd01      192.168.110.96    4C/3G
                 kubernetes-etcd02      192.168.110.97    4C/3G
                 kubernetes-etcd03      192.168.110.98    4C/3G
Master cluster   kubernetes-master01    192.168.110.128   4C/3G
                 kubernetes-master02    192.168.110.130   4C/3G
                 kubernetes-master03    192.168.110.131   4C/3G
Worker cluster   kubernetes-worker01    192.168.110.106   4C/3G
                 kubernetes-worker02    192.168.110.107   4C/3G
                 kubernetes-worker03    192.168.110.108   4C/3G
Cluster proxy    kubernetes-nginx01     192.168.110.104   4C/3G
                 kubernetes-nginx02     192.168.110.105   4C/3G
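
All of the ansible commands below are run from a jumpserver host and address these machines through host groups (test_k8s_ng, test_k8s_etcd, test_k8s_master, test_k8s_worker, test_k8s_cluster). The group names are taken from the commands in this article; the inventory itself is not shown, so the following /etc/ansible/hosts layout is only an assumed sketch based on the table above:

# /etc/ansible/hosts (assumed layout)
[test_k8s_ng]
192.168.110.104
192.168.110.105

[test_k8s_etcd]
192.168.110.96
192.168.110.97
192.168.110.98

[test_k8s_master]
192.168.110.128
192.168.110.130
192.168.110.131

[test_k8s_worker]
192.168.110.106
192.168.110.107
192.168.110.108

[test_k8s_cluster:children]
test_k8s_master
test_k8s_worker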

II. Cluster Architecture

(Figure: k8scluster.png, cluster architecture diagram)

III. Proxy Cluster Installation

Note: Kubernetes components can only be configured with a single IP for the endpoints they talk to, which creates a single point of failure, so Nginx is used as a load balancer together with Keepalived for high availability to remove that single point of failure from the cluster.

1. Install the nginx service

[root@jumpserver ~]# ansible test_k8s_ng -m yum -a "name=nginx"
[root@jumpserver ~]# ansible test_k8s_ng -m shell -a "systemctl enable nginx"

1.1 Edit the nginx configuration file

# For more information on configuration, see:
#   * Official English Documentation: http://nginx.org/en/docs/
#   * Official Russian Documentation: http://nginx.org/ru/docs/

user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

# Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 65535;
    use epoll;
}

stream {
  upstream etcd_cluster {
    server 192.168.110.96:2379 max_fails=1 fail_timeout=2s;
    server 192.168.110.97:2379 max_fails=1 fail_timeout=2s;
    server 192.168.110.98:2379 max_fails=1 fail_timeout=2s;
  }

 upstream apiserver_cluster {
    server 192.168.110.128:6443 max_fails=1 fail_timeout=2s;
    server 192.168.110.130:6443 max_fails=1 fail_timeout=2s;
    server 192.168.110.131:6443 max_fails=1 fail_timeout=2s;
  }

  server {
    listen 2379;
    proxy_connect_timeout 1s;
    proxy_pass etcd_cluster;
  }

 server {
    listen 6443;
    proxy_connect_timeout 1s;
    proxy_pass apiserver_cluster;
  }
}

1.2 Distribute the nginx configuration file to the nginx nodes

[root@jumpserver json]# ansible test_k8s_ng -m copy -a "src=./nginx.conf dest=/etc/nginx/"

2. Install the keepalived service

[root@jumpserver json]# ansible test_k8s_ng -m yum -a "name=keepalived"
[root@jumpserver json]# ansible test_k8s_ng -m shell -a "systemctl enable keepalived"

2.1 Edit the nginx health-check script

[root@jumpserver json]# vim nginx_check.sh
#!/bin/bash

counter=$(ps -C nginx --no-heading|wc -l)
if [ "${counter}" = "0" ]; then
  /usr/sbin/nginx
  sleep 2
  counter=$(ps -C nginx --no-heading|wc -l)
  if [ "${counter}" = "0" ]; then
    systemctl stop keepalived
  fi
fi

[root@jumpserver json]# ansible test_k8s_ng -m copy -a "src=./nginx_check.sh dest=/etc/keepalived/ mode=755"
[root@jumpserver json]# ansible test_k8s_ng -m shell -a "ls -l /etc/keepalived/"
192.168.110.104 | CHANGED | rc=0 >>
total 8
-rw-r--r-- 1 root root 3598 Aug 13  2019 keepalived.conf
-rw-r--r-- 1 root root  232 Apr  8 10:47 nginx_check.sh

192.168.110.105 | CHANGED | rc=0 >>
total 8
-rw-r--r-- 1 root root 3598 Aug 13  2019 keepalived.conf
-rw-r--r-- 1 root root  232 Apr  8 10:47 nginx_check.sh

2.2 Edit the keepalived configuration file

! Configuration File for keepalived

global_defs {
    router_id lb01
  }

vrrp_script chk_nginx {
    script "/etc/keepalived/nginx_check.sh" #配置Nginx状态检查脚本路径
    interval 2
    weight -20
  }

vrrp_instance VI_1 {
    state MASTER # this node starts as the MASTER
    interface eth0
    virtual_router_id 51
    priority 100 # priority of the master node
    advert_int 1

  authentication {
      auth_type PASS
      auth_pass 1111
    }

  track_script {
      chk_nginx # run the check script defined above
    }

  virtual_ipaddress {
      192.168.110.230/24 dev eth0 label eth0:0 # bind the VIP to eth0
    }
  }

2.3 Distribute the keepalived configuration file to the nginx nodes

[root@jumpserver json]# ansible test_k8s_ng -m copy -a "src=keepalived.conf dest=/etc/keepalived/"

2.4 Modify the keepalived configuration on the backup nginx node

[root@jumpserver json]# ansible 192.168.110.105 -m shell -a "sed -i 's/state MASTER/state BACKUP/g' /etc/keepalived/keepalived.conf"
[root@jumpserver json]# ansible 192.168.110.105 -m shell -a "sed -i 's/priority 100/priority 50/g' /etc/keepalived/keepalived.conf"

3. Start the nginx service

[root@jumpserver json]# ansible test_k8s_ng -m service -a "name=nginx state=started"
[root@jumpserver json]# ansible test_k8s_ng -m shell -a "netstat -tlunp |egrep '6443|2379'"
192.168.110.104 | CHANGED | rc=0 >>
tcp        0      0 0.0.0.0:6443            0.0.0.0:*               LISTEN      18174/nginx: master 
tcp        0      0 0.0.0.0:2379            0.0.0.0:*               LISTEN      18174/nginx: master 

192.168.110.105 | CHANGED | rc=0 >>
tcp        0      0 0.0.0.0:6443            0.0.0.0:*               LISTEN      18200/nginx: master 
tcp        0      0 0.0.0.0:2379            0.0.0.0:*               LISTEN      18200/nginx: master

4. Start the keepalived service

4.1 Start the master keepalived service first

[root@jumpserver json]# ansible 192.168.110.104 -m service -a "name=keepalived state=started"
[root@jumpserver json]# ansible 192.168.110.104 -m shell -a "ip addr"
192.168.110.104 | CHANGED | rc=0 >>
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:be:3d:18 brd ff:ff:ff:ff:ff:ff
    inet 192.168.110.104/24 brd 192.168.110.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.110.230/24 scope global secondary eth0:0
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:febe:3d18/64 scope link 
       valid_lft forever preferred_lft forever
    
[root@jumpserver json]# ping 192.168.110.230
PING 192.168.110.230 (192.168.110.230) 56(84) bytes of data.
64 bytes from 192.168.110.230: icmp_seq=1 ttl=64 time=0.393 ms
64 bytes from 192.168.110.230: icmp_seq=2 ttl=64 time=0.343 ms
64 bytes from 192.168.110.230: icmp_seq=3 ttl=64 time=0.383 ms
64 bytes from 192.168.110.230: icmp_seq=4 ttl=64 time=0.435 ms

Note: the virtual IP is now active and reachable.

4.2 Start the backup keepalived service

[root@jumpserver json]# ansible 192.168.110.105 -m service -a "name=keepalived state=started"
[root@jumpserver json]# ansible test_k8s_ng -m shell -a "ps -ef |grep keepalived"
192.168.110.105 | CHANGED | rc=0 >>
root     18482     1  0 11:16 ?        00:00:00 /usr/sbin/keepalived -D
root     18483 18482  0 11:16 ?        00:00:00 /usr/sbin/keepalived -D
root     18484 18482  0 11:16 ?        00:00:00 /usr/sbin/keepalived -D
root     18535 18534  0 11:19 pts/1    00:00:00 /bin/sh -c ps -ef |grep keepalived
root     18537 18535  0 11:19 pts/1    00:00:00 grep keepalived

192.168.110.104 | CHANGED | rc=0 >>
root     18463     1  0 11:14 ?        00:00:00 /usr/sbin/keepalived -D
root     18464 18463  0 11:14 ?        00:00:00 /usr/sbin/keepalived -D
root     18465 18463  0 11:14 ?        00:00:00 /usr/sbin/keepalived -D
root     18562 18561  0 11:19 pts/0    00:00:00 /bin/sh -c ps -ef |grep keepalived
root     18564 18562  0 11:19 pts/0    00:00:00 grep keepalived

5. Test the proxy addresses with etcdctl

[root@kubernetes-etcd01 ~]# etcdctl --endpoints "https://192.168.110.104:2379" --cert "/application/etcd/pki/server.pem" --key "/application/etcd/pki/server-key.pem" --cacert "/application/etcd/pki/ca.pem" member list
71b29764d8cc5f1a, started, kubernetes-etcd03, https://192.168.110.98:2380, https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379
8d3877faaee1ab32, started, kubernetes-etcd01, https://192.168.110.96:2380, https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379
c9ef20e9b8fa5e73, started, kubernetes-etcd02, https://192.168.110.97:2380, https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379
[root@kubernetes-etcd01 ~]# 
[root@kubernetes-etcd01 ~]# etcdctl --endpoints "https://192.168.110.105:2379" --cert "/application/etcd/pki/server.pem" --key "/application/etcd/pki/server-key.pem" --cacert "/application/etcd/pki/ca.pem" member list
71b29764d8cc5f1a, started, kubernetes-etcd03, https://192.168.110.98:2380, https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379
8d3877faaee1ab32, started, kubernetes-etcd01, https://192.168.110.96:2380, https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379
c9ef20e9b8fa5e73, started, kubernetes-etcd02, https://192.168.110.97:2380, https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379
[root@kubernetes-etcd01 ~]# 
[root@kubernetes-etcd01 ~]# etcdctl --endpoints "https://192.168.110.230:2379" --cert "/application/etcd/pki/server.pem" --key "/application/etcd/pki/server-key.pem" --cacert "/application/etcd/pki/ca.pem" member list
71b29764d8cc5f1a, started, kubernetes-etcd03, https://192.168.110.98:2380, https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379
8d3877faaee1ab32, started, kubernetes-etcd01, https://192.168.110.96:2380, https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379
c9ef20e9b8fa5e73, started, kubernetes-etcd02, https://192.168.110.97:2380, https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379

6. Test keepalived virtual IP failover

6.1 Stop the keepalived service on the master node

[root@kubernetes-nginx01 ~]# service keepalived stop
Redirecting to /bin/systemctl stop  keepalived.service
[root@kubernetes-nginx01 ~]# 
[root@kubernetes-nginx01 ~]# 
[root@kubernetes-nginx01 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:be:3d:18 brd ff:ff:ff:ff:ff:ff
    inet 192.168.110.104/24 brd 192.168.110.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:febe:3d18/64 scope link 
       valid_lft forever preferred_lft forever

[root@kubernetes-etcd02 ~]# etcdctl --endpoints "https://192.168.110.230:2379" --cert "/application/etcd/pki/server.pem" --key "/application/etcd/pki/server-key.pem" --cacert "/application/etcd/pki/ca.pem" member list
71b29764d8cc5f1a, started, kubernetes-etcd03, https://192.168.110.98:2380, https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379
8d3877faaee1ab32, started, kubernetes-etcd01, https://192.168.110.96:2380, https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379
c9ef20e9b8fa5e73, started, kubernetes-etcd02, https://192.168.110.97:2380, https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379

Note: the virtual IP has failed over to the backup nginx node, and the etcd cluster is still reachable through the virtual IP.

6.2 Restore keepalived on the master node and check whether the virtual IP moves back

[root@kubernetes-nginx01 ~]# service keepalived start
Redirecting to /bin/systemctl start  keepalived.service
[root@kubernetes-nginx01 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:be:3d:18 brd ff:ff:ff:ff:ff:ff
    inet 192.168.110.104/24 brd 192.168.110.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.110.230/24 scope global secondary eth0:0
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:febe:3d18/64 scope link 
       valid_lft forever preferred_lft forever

Note: the virtual IP has moved back to the master nginx node.

IV. ETCD Cluster Installation

Note: cluster installation involves a lot of repetitive work, so ansible is used throughout.

1. Install the etcd service

[root@jumpserver ~]# ansible test_k8s_etcd -m yum -a "name=etcd"

1.1 Configure hosts resolution

[root@jumpserver ~]# ansible test_k8s_etcd -m shell -a "echo -e '192.168.110.96 kubernetes-etcd01\n192.168.110.97 kubetnetes-etcd02\n192.168.110.98 kubernetes-etcd03' >>/etc/hosts"

1.2 Set an environment variable to select the etcdctl API version

[root@jumpserver ~]# ansible test_k8s_etcd -m shell -a "echo 'export ETCDCTL_API=3' >> /etc/profile"
[root@jumpserver ~]# ansible test_k8s_etcd -m shell -a "source /etc/profile"

1.3 Enable etcd as a system service

[root@jumpserver ~]# ansible test_k8s_etcd -m shell -a "systemctl enable etcd"

2. Generate the etcd cluster certificates

2.1 Generate the CA root certificate

Edit the ca-config certificate configuration file

[root@jumpserver json]# mkdir /root/create_cert/{cert,json} -p && cd /root/create_cert/json
[root@jumpserver json]# cat ca-config.json
{
    "signing": {
        "default": {
            "expiry": "43800h"
        },
        "profiles": {
            "server": {
                "expiry": "43800h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth",
                    "client auth"
                ]
            },
            "client": {
                "expiry": "43800h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "client auth"
                ]
            },
            "peer": {
                "expiry": "43800h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth",
                    "client auth"
                ]
            }
        }
    }
}

Edit the ca-csr configuration file

[root@jumpserver json]# cat ca-csr.json 
{
    "CN": "etcd",
    "key": {
        "algo": "rsa",
        "size": 2048
    }
}

Generate the CA certificate

cfssl gencert -initca /root/create_cert/json/ca-csr.json |cfssljson -bare ca -
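
cfssljson writes ca.pem, ca-key.pem and ca.csr into the current directory (the listing in 2.4 shows them under /root/create_cert/cert/). The resulting CA can be inspected before moving on (a sketch, run from the directory holding ca.pem):

cfssl certinfo -cert ca.pem
openssl x509 -in ca.pem -noout -subject -dates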

2.2 Generate the client certificate

Edit the client certificate configuration file

[root@jumpserver json]# cat client.json 
{
    "CN": "client",
    "key": {
        "algo": "ecdsa",
        "size": 256
    }
}

Generate the client certificate

cfssl gencert -ca=../cert/ca.pem -ca-key=../cert/ca-key.pem -config=./ca-config.json -profile=client ./client.json | cfssljson -bare client

2.3 Generate the server certificate

Edit the config.json configuration file

[root@jumpserver json]# cat config.json 
{
    "CN": "citylife",
    "hosts": [
        "kubernetes-etcd01",
        "kubernetes-etcd02",
        "kubernetes-etcd03",
        "192.168.110.96",
        "192.168.110.97",
        "192.168.110.98",
        "192.168.110.128",
        "192.168.110.130",
        "192.168.110.131",
        "192.168.110.104",
        "192.168.110.230",
    ],
    "key": {
        "algo": "ecdsa",
        "size": 256
    },
    "names": [
        {
            "C": "US",
            "ST": "CA",
            "L": "San Francisco"
        }
    ]
}

Note: the hosts list here must contain the IP addresses of every component that will communicate with the etcd cluster.

Generate the server and peer certificates

[root@jumpserver json]# cfssl gencert -ca=../cert/ca.pem -ca-key=../cert/ca-key.pem -config=ca-config.json -profile=server config.json |cfssljson -bare server
[root@jumpserver json]# cfssl gencert -ca=../cert/ca.pem -ca-key=../cert/ca-key.pem -config=ca-config.json -profile=peer config.json |cfssljson -bare peer
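
It is worth confirming that every IP and hostname from the hosts list actually ended up in the certificate's Subject Alternative Name extension (a sketch; adjust the path if the .pem files were moved into ../cert/):

openssl x509 -in server.pem -noout -text |grep -A1 'Subject Alternative Name'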

2.4 Copy the generated certificates to the other etcd nodes

[root@jumpserver json]# ls -l ../cert/
total 48
-rw-r--r-- 1 root root  883 Apr  3 16:02 ca.csr
-rw------- 1 root root 1679 Apr  3 16:02 ca-key.pem
-rw-r--r-- 1 root root 1078 Apr  3 16:02 ca.pem
-rw-r--r-- 1 root root  351 Apr  3 16:08 client.csr
-rw------- 1 root root  227 Apr  3 16:08 client-key.pem
-rw-r--r-- 1 root root  875 Apr  3 16:08 client.pem
-rw-r--r-- 1 root root  590 Apr  3 18:33 peer.csr
-rw------- 1 root root  227 Apr  3 18:33 peer-key.pem
-rw-r--r-- 1 root root 1103 Apr  3 18:33 peer.pem
-rw-r--r-- 1 root root  590 Apr  3 18:33 server.csr
-rw------- 1 root root  227 Apr  3 18:33 server-key.pem
-rw-r--r-- 1 root root 1103 Apr  3 18:33 server.pem
[root@jumpserver json]# ansible test_k8s_etcd -m copy -a "mkdir -p /aplication/etcd/pki"
[root@jumpserver json]# ansible test_k8s_etcd -m copy -a "src=/root/create_cert/cert/ dest=/application/etcd/pki/ owner=etcd mode=600"

2.5 Edit the etcd configuration file

[root@jumpserver json]# ansible test_k8s_etcd -m shell -a "mkdir -p /application/etcd/default.etcd"

[root@kubernetes-etcd01 ~]# cat /etc/etcd/etcd.conf 
#[Member]
ETCD_DATA_DIR="/application/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="https://192.168.110.96:2380"
ETCD_LISTEN_CLIENT_URLS="https://192.168.110.96:2379"
ETCD_NAME="kubernetes-etcd01"                              # etcd节点的hostname

#[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="    # 当前节点监听的地址 
ETCD_ADVERTISE_CLIENT_URLS=" 

ETCD_INITIAL_CLUSTER="kubernetes-etcd01=https://192.168.110.96:2380,kubernetes-etcd02=https://192.168.110.97:2380,kubernetes-etcd03=https://192.168.110.98:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"

#[Security]
ETCD_CERT_FILE="/application/etcd/pki/server.pem"
ETCD_KEY_FILE="/application/etcd/pki/server-key.pem"
ETCD_CLIENT_CERT_AUTH="true"
ETCD_TRUSTED_CA_FILE="/application/etcd/pki/ca.pem"
ETCD_AUTO_TLS="true"
ETCD_PEER_CERT_FILE="/application/etcd/pki/peer.pem"
ETCD_PEER_KEY_FILE="/application/etcd/pki/peer-key.pem"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_TRUSTED_CA_FILE="/application/etcd/pki/ca.pem"
ETCD_PEER_AUTO_TLS="true"
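
The file above is the one for kubernetes-etcd01; etcd02 and etcd03 need their own ETCD_NAME and URL values, while ETCD_INITIAL_CLUSTER stays identical on all three nodes. A minimal sketch for rendering and pushing the other two configs (assumptions: a copy of the etcd01 file is saved as ./etcd.conf on the jumpserver, and the nodes are addressable by IP as elsewhere in this article):

# render a per-node /etc/etcd/etcd.conf from the etcd01 template and push it
for node in 192.168.110.97:kubernetes-etcd02 192.168.110.98:kubernetes-etcd03; do
  ip=${node%%:*}; name=${node##*:}
  sed -e "s|^ETCD_NAME=.*|ETCD_NAME=\"${name}\"|" \
      -e "s|^ETCD_LISTEN_PEER_URLS=.*|ETCD_LISTEN_PEER_URLS=\"https://${ip}:2380\"|" \
      -e "s|^ETCD_LISTEN_CLIENT_URLS=.*|ETCD_LISTEN_CLIENT_URLS=\"https://${ip}:2379\"|" \
      -e "s|^ETCD_INITIAL_ADVERTISE_PEER_URLS=.*|ETCD_INITIAL_ADVERTISE_PEER_URLS=\"https://${ip}:2380\"|" \
      -e "s|^ETCD_ADVERTISE_CLIENT_URLS=.*|ETCD_ADVERTISE_CLIENT_URLS=\"https://${ip}:2379\"|" \
      ./etcd.conf > ./etcd.conf.${name}
  ansible ${ip} -m copy -a "src=./etcd.conf.${name} dest=/etc/etcd/etcd.conf"
done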

2.6 Start the etcd cluster

[root@jumpserver json]# ansible test_k8s_etcd -m service -a "name=etcd state=started"

Note: when starting the cluster for the second time, change the ETCD_INITIAL_CLUSTER_STATE="new" field in the configuration file to "existing", otherwise the start will fail.
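
A one-liner for that change across all etcd nodes (a sketch, assuming the /etc/etcd/etcd.conf path used above):

ansible test_k8s_etcd -m shell -a "sed -i 's/ETCD_INITIAL_CLUSTER_STATE=\"new\"/ETCD_INITIAL_CLUSTER_STATE=\"existing\"/' /etc/etcd/etcd.conf"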

2.7 Verify the cluster status

[root@kubernetes-etcd01 ~]# etcdctl --endpoints "https://192.168.110.96:2379,https://192.168.110.97:2379,https://192.168.110.98:2379" --cert "/application/etcd/pki/server.pem" --key "/application/etcd/pki/server-key.pem" --cacert "/application/etcd/pki/ca.pem"  endpoint health
https://192.168.110.96:2379 is healthy: successfully committed proposal: took = 4.662674ms
https://192.168.110.97:2379 is healthy: successfully committed proposal: took = 2.713563ms
https://192.168.110.98:2379 is healthy: successfully committed proposal: took = 4.360699ms

Output like the above means the cluster is installed and healthy.

Write a key to the cluster

[root@kubernetes-etcd01 ~]# etcdctl --endpoints "https://192.168.110.96:2379" --cert "/application/etcd/pki/server.pem" --key "/application/etcd/pki/server-key.pem" --cacert "/application/etcd/pki/ca.pem" put etcd-cluster ok
OK
[root@kubernetes-etcd01 ~]#

Read the key back on another node

[root@kubernetes-etcd01 ~]# etcdctl --endpoints "https://192.168.110.97:2379" --cert "/application/etcd/pki/server.pem" --key "/application/etcd/pki/server-key.pem" --cacert "/application/etcd/pki/ca.pem" get etcd-cluster
etcd-cluster
ok

Delete the key

[root@kubernetes-etcd01 ~]# etcdctl --endpoints "https://192.168.110.98:2379" --cert "/application/etcd/pki/server.pem" --key "/application/etcd/pki/server-key.pem" --cacert "/application/etcd/pki/ca.pem" del etcd-cluster ok
1
[root@kubernetes-etcd01 ~]#

Read the key again

[root@kubernetes-etcd01 ~]# etcdctl --endpoints "https://192.168.110.97:2379" --cert "/application/etcd/pki/server.pem" --key "/application/etcd/pki/server-key.pem" --cacert "/application/etcd/pki/ca.pem" get etcd-cluster
[root@kubernetes-etcd01 ~]#

No output means the key has been deleted.

V. Install and Deploy the Kubernetes Cluster

1. Preparation

1.1 Configure hosts resolution

[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "echo -e '192.168.110.128 kubernetes-master01\n192.168.110.230 kubernetes-mster02\n192.168.110.231 kubernetes-master03\n192.168.110.106 kubernetes-worker01\n192.168.110.107 kubernetes-worker02\n192.168.110.108 kubernetes-worker03' >>/etc/hosts"

1.2 Set kernel parameters

[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "echo -e 'net.bridge.bridge-nf-call-iptables = 1\nnet.ipv4.ip_forward = 1' >>/etc/sysctl.conf"
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "sysctl -p"
192.168.110.107 | CHANGED | rc=0 >>
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1

192.168.110.106 | CHANGED | rc=0 >>
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1

192.168.110.131 | CHANGED | rc=0 >>
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1

192.168.110.130 | CHANGED | rc=0 >>
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1

192.168.110.128 | CHANGED | rc=0 >>
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1

192.168.110.108 | CHANGED | rc=0 >>
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1

1.3 Disable the swap partition, firewalld, and SELinux

[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "systemctl disable firewalld"
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "systemctl stop firewalld"
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "setenforce 0"
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux"
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "swapoff -a"
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "sed -i '/swap/d' /etc/fstab"

2. Install Docker (run on all master and worker nodes)

2.1 Install the yum repository management tool

[root@jumpserver json]# ansible test_k8s_cluster -m yum -a "name=yum-utils"
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "yum-config-manager --add-repo

2.2 Install and start Docker

[root@jumpserver json]# ansible test_k8s_cluster -m yum -a "name=docker-ce"
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "systemctl enable docker"
[root@kubernetes-master01 ~]# cat /etc/docker/daemon.json
{
        "data-root": "/application/docker",
        "registry-mirrors": ["https://registry.cn-beijing.aliyuncs.com"],
        "exec-opts": ["native.cgroupdriver=systemd"]
}

[root@jumpserver json]# ansible test_k8s_cluster -m copy -a "src=./daemon.json dest=/etc/docker/"
[root@jumpserver json]# ansible test_k8s_cluster -m service -a "name=docker state=restarted"
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "docker info" |grep cgroup
Cgroup Driver: cgroupfs
Cgroup Driver: cgroupfs
Cgroup Driver: cgroupfs
Cgroup Driver: cgroupfs
Cgroup Driver: cgroupfs
Cgroup Driver: cgroupfs
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "docker info" |grep overlay2
 Storage Driver: overlay2
 Storage Driver: overlay2
 Storage Driver: overlay2
 Storage Driver: overlay2
 Storage Driver: overlay2
 Storage Driver: overlay2

3. Install the Kubernetes client tool (kubectl), the cluster bootstrap tool (kubeadm), and the kubelet

[root@jumpserver json]# cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey= https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
[root@jumpserver json]# ansible test_k8s_cluster -m copy -a "src=/etc/yum.repos.d/kubernetes.repo  dest=/etc/yum.repos.d/"
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "yum install kubeadm-1.16.8 kubectl-1.16.8 kubelet-1.16.8 -y"

4. Configure the kubelet cgroup driver (it must match the driver reported by docker info)

[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "sed -i '4a Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"' /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf"
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "cat  /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf" |grep cgroupfs
Environment=KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs
Environment=KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs
Environment=KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs
Environment=KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs
Environment=KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs
Environment=KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a "systemctl daemon-reload"

5. Download the Kubernetes images

List the images that are needed

[root@kubernetes-master01 ~]# kubeadm config images list
I0407 16:34:47.889136    8134 version.go:251] remote version is much newer: v1.18.0; falling back to: stable-1.16
k8s.gcr.io/kube-apiserver:v1.16.8
k8s.gcr.io/kube-controller-manager:v1.16.8
k8s.gcr.io/kube-scheduler:v1.16.8
k8s.gcr.io/kube-proxy:v1.16.8
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.3.15-0
k8s.gcr.io/coredns:1.6.2
[root@jumpserver json]# ansible test_k8s_cluster -m shell -a 'images=(kube-apiserver:v1.16.8 kube-controller-manager:v1.16.8 kube-scheduler:v1.16.8 kube-proxy:v1.16.8 pause:3.1 coredns:1.6.2);\
for image in ${images[@]};do \
docker pull registry.cn-beijing.aliyuncs.com/citylife-k8s/${image};\
docker tag registry.cn-beijing.aliyuncs.com/citylife-k8s/${image} k8s.gcr.io/${image};\
docker image rm registry.cn-beijing.aliyuncs.com/citylife-k8s/${image}; done'

[root@jumpserver json]# ansible test_k8s_cluster -m shell -a 'docker images'
192.168.110.107 | CHANGED | rc=0 >>
REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
k8s.gcr.io/kube-apiserver            v1.16.8             48db9392345b        3 weeks ago         160MB
k8s.gcr.io/kube-controller-manager            v1.16.8             01aec835c89f        3 weeks ago         151MB
k8s.gcr.io/kube-scheduler            v1.16.8             133a50b2b327        3 weeks ago         83.6MB
k8s.gcr.io/kube-proxy              v1.16.8             3b8ffbdbcca3        3 weeks ago         82.8MB
k8s.gcr.io/coredns                1.6.2              bf261d157914        7 months ago        44.1MB
k8s.gcr.io/pause                 3.1               da86e6ba6ca1        2 years ago         742kB

192.168.110.106 | CHANGED | rc=0 >>
REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
k8s.gcr.io/kube-apiserver            v1.16.8             48db9392345b        3 weeks ago         160MB
k8s.gcr.io/kube-controller-manager            v1.16.8             01aec835c89f        3 weeks ago         151MB
k8s.gcr.io/kube-proxy              v1.16.8             3b8ffbdbcca3        3 weeks ago         82.8MB
k8s.gcr.io/kube-scheduler            v1.16.8             133a50b2b327        3 weeks ago         83.6MB
k8s.gcr.io/coredns                1.6.2              bf261d157914        7 months ago        44.1MB
k8s.gcr.io/pause                 3.1               da86e6ba6ca1        2 years ago         742kB
...
...

6. Initialize the master nodes

The following operations are performed on the three master nodes.

6.1 Copy the etcd certificates to the master nodes (the masters need the certificate files to communicate with the etcd cluster)

[root@jumpserver create_cert]# ansible test_k8s_master -m copy -a "src=./cert dest=/application/kubernetes/pki/"
[root@jumpserver create_cert]# ansible test_k8s_master -m shell -a "mv /application/kubernetes/pki/cert/* /application/kubernetes/pki/"
[root@jumpserver create_cert]# ansible test_k8s_master -m shell -a "rm -rf /application/kubernetes/pki/cert"

6.2 Edit the init configuration file

[root@jumpserver json]# cat init_kubernetes.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.110.128 # IP of this node
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: kubernetes-master01 # hostname of this node
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  certSANs:
  # list the IPs and hostnames of all master nodes to include in the apiserver certificate, plus the load-balancer VIP
  - kubernetes-master01
  - kubernetes-master02
  - kubernetes-master03
  - 192.168.110.128
  - 192.168.110.130
  - 192.168.110.131
  - 192.168.110.230
  - 192.168.110.104
  - 192.168.110.105
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes # name of the cluster
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  external:
    endpoints:
    # load-balancer VIP of the external etcd cluster
    - https://192.168.110.230:2379
    # paths of the certificate files used to access the etcd cluster
    caFile: /application/kubernetes/pki/ca.pem
    certFile: /application/kubernetes/pki/client.pem
    keyFile: /application/kubernetes/pki/client-key.pem
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.16.8 # Kubernetes image version
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16 # default pod CIDR; must match the flannel network
scheduler: {}

6.3 Distribute the configuration file to the master nodes

[root@jumpserver json]# ansible test_k8s_master -m copy -a "src=./init_kubernetes.yaml dest=/application/kubernetes/"

6.4 Initialize the first master

[root@kubernetes-master01 kubernetes]# kubeadm init --config=./init_kubernetes.yaml
...
...
[bootstrap-token] Using token: 83k4wo.uirwmdd3tajnkg23
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.110.128:6443 --token 83k4wo.uirwmdd3tajnkg23 \
    --discovery-token-ca-cert-hash sha256:424baf048926a2a42ea651443ff3803da3366135f76b75aa843970abc1bf009e

6.4.1 Create the kubectl configuration file

[root@kubernetes-master01 kubernetes]# mkdir -p $HOME/.kube
[root@kubernetes-master01 kubernetes]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: overwrite '/root/.kube/config'? y
[root@kubernetes-master01 kubernetes]# sudo chown $(id -u):$(id -g) $HOME/.kube/config

6.4.2 Change the apiserver address in the kubectl config to the load-balancer address

[root@kubernetes-master01 kubernetes]# vim /root/.kube/config 

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJd01EUXdPREE0TWprek9Wb1hEVE13TURRd05qQTRNamt6T1Zvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBT1ZhClZpYU9WZTlKZmpTTHpuV08rVld6bmFLMFpUYnpCWk9HZ0hTRWNENitYZTFBODhKVHR5aCtWSGtGd1M2dFk2bU8KQXl1TnRibUhMaWlnREVlWUREbUlyMUtVa24wdFg0RmV6TzFaYzIzV1NwMG91Wnprbk1XQXVxTmYzRHBhL0x2Rgovc3VsMzd1RkJHOW1KVitJUFQ5MUNxSFRlbHpmMnhXUlhBVFhBYWhKL2xCSzdQaXVORzdvVW5OR2xVOEpSZFRvClBjQ1ozcytpSEFUamxSSWE3ajdvUmVzYUgxUUxvSFRmeVFEbWdWNnl3TzljTVc4VEt5a0FKbk53MDcyR3gra2YKSGJmMnlPUm5ieXYzNVRDeW9nQk8rYkM5OEpCNFVGQ1JvbFNYS1QxN25oR2pDVkd1cTNlV05TRnBQWWozWHZNdgo5N3VEelRyTk9nK1dPSitzL29jQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFMaGJPZ2kvWjRaLytuMC93Y3ltRktQQzI0SHoKUGtBcUIrNzd4ZmVWQUEvWCtzMFA2WThhZnVyb0oyL2VZY3NHbHhHUG15NGd5ell2TkhvTi9SVjNHc1dhVS9yZwpaK2xwZm0yOWJEUjNodXEyQnArNHl4czRnKzl5N1JrOGNRYzlWQlFmZmJhblk3N1kzclIzNGJFZ2FlL1FjbVd3CitLbFdrdFJKUDIrNTU3Vjl0VjdwRnBwbjVjekZqTE9xMXhaaUhObmRQRVhSNVNiZk9yQVFkbkRIVThrSG1BV1kKWU8zYjBjYk9yL05CeG9zVTNqUnRyK01oTE5SWDQ3OTdxcXN1bmNxbWF5VGErYjBlNy8wTU5mQS8vZEZsL0s0bgoyaUhmN2wzbHkzSUVaQUNaOW1RaFBNME9QN2dwa1pKTjhSbXJYS21sTzYzak1QdWl2Nmc5Rk95SGdUOD0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
    server: https://192.168.110.230:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
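
The same change can also be made without opening an editor, using kubectl's own config subcommand (a sketch; the cluster entry is named kubernetes, as shown above):

kubectl config set-cluster kubernetes --server=https://192.168.110.230:6443 --kubeconfig=/root/.kube/config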

6.4.3 Install the flannel network plugin

[root@kubernetes-master01 kubernetes]# wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
[root@kubernetes-master01 kubernetes]# kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created

6.4.4 Check the pod status after installation

[root@kubernetes-master01 kubernetes]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                          READY   STATUS    RESTARTS   AGE
kube-system   coredns-5644d7b6d9-sxhm2                      1/1     Running   0          14m
kube-system   coredns-5644d7b6d9-v67nr                      1/1     Running   0          14m
kube-system   kube-apiserver-kubernetes-master01            1/1     Running   0          13m
kube-system   kube-controller-manager-kubernetes-master01   1/1     Running   0          12m
kube-system   kube-flannel-ds-amd64-zr82z                   1/1     Running   0          72s
kube-system   kube-proxy-hgsxr                              1/1     Running   0          14m
kube-system   kube-scheduler-kubernetes-master01            1/1     Running   0          13m
[root@kubernetes-master01 kubernetes]#
[root@kubernetes-master01 kubernetes]# kubectl get nodes
NAME                  STATUS   ROLES    AGE   VERSION
kubernetes-master01   Ready    master   15m   v1.16.8

Note: all pods are in the Running state and the node is Ready, which means the first master has been initialized successfully.

6.4.5 Copy the certificates generated on master01 to the other two masters

[root@kubernetes-master01 kubernetes]# scp /etc/kubernetes/pki/* 192.168.110.130:/etc/kubernetes/pki/
100%  451   408.5KB/s   00:00
[root@kubernetes-master01 kubernetes]# 
[root@kubernetes-master01 kubernetes]# scp /etc/kubernetes/pki/* 192.168.110.131:/etc/kubernetes/pki/
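
The scp above assumes /etc/kubernetes/pki already exists on the target masters; if it does not, it can be created first (a sketch in the same ansible style):

ansible test_k8s_master -m file -a "path=/etc/kubernetes/pki state=directory"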

6.5 Initialize the second master

Because the init configuration file distributed earlier was written for master01, change advertiseAddress to master02's IP and name to master02's hostname.

[root@kubernetes-master02 ~]# vim /application/kubernetes/init_kubernetes.yaml 

apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.110.130 # IP of this node
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: kubernetes-master02 # hostname of this node
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
...
...

6.5.1 Initialize

[root@kubernetes-master02 ~]# kubeadm init --config=/application/kubernetes/init_kubernetes.yaml
...
...
[bootstrap-token] Using token: 7d99tg.l61o827mk9oqz6qu
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-check] Initial timeout of 40s passed.
[addons]: Migrating CoreDNS Corefile
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.110.130:6443 --token 7d99tg.l61o827mk9oqz6qu \
    --discovery-token-ca-cert-hash sha256:424baf048926a2a42ea651443ff3803da3366135f76b75aa843970abc1bf009e

6.5.2 Create the kubectl configuration file

[root@kubernetes-master02 ~]# mkdir -p $HOME/.kube
[root@kubernetes-master02 ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@kubernetes-master02 ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config

6.5.3 Change the apiserver address to the nginx load-balancer address

[root@kubernetes-master02 ~]# vim .kube/config 

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJd01EUXdPREE1TURFd05Wb1hEVE13TURRd05qQTVNREV3TlZvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTTh6CnJPZ1JNbmwyWUhTRXFkQmdyOXhqZWpGNEYxUGhnNlRxTC9OVlNCSDBCSVhHMjRXTUczS3lLb1o5US9QdElXTHEKUmZjdklVaVcwR1ZSTSttSGJJN1IxTWNRa3ovY2lPZTh2Rm5JV2RDTVhGNmtnenRTSVhiTnJORGZrOTMxQWgvbApjWW55bEYwaU1Id1FrOHl4V1QvTWh2dEh3QVR3MTJFM25NZTBrd1plY2puYnpYSGpqVmJZYVpZUHhEc2pWM3I3CmFIYVo0d0llZ2tSVTRYYi9VbkFGcE9TVkhFRnExMGltNGpwNGtaV3dzckhRenRkYVhhelBWU0hEdnM1bUFkeDgKcjcybGhhQnZyUytGeGY4elI5UTdncGtzeW5zMUZqNmpmSTFlR1Fjamt3eEkwdTU3TUVoOVZDbWFVSWRCU1g2UAo1S1JGVkVQRS94azllMWg0RlNzQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFFdDVwNjVzZm5VUWdGaUJDN043RjVLSmZvaFkKZCs3WGxUalByYUVhT0x4SFIvMTJYdmRRVStGTmZxUVU4WTNqZVU2SEtiYjBjMEVYeFA3SzBPbGZoV0N3SHpJOApmbHJYR2hRUEt3eDN5V1haUktlV1N5b2kvdW15eFRrUytMQTBBaUVWYURETlBBVDM0NHdoZmlRYjJpS0NSZ2x3ClRCVFBhTlVmNURpeXZrQ0lLaDBqYkpjazMyYVFoYzVXbU94c05yR0ZuZFBqRm83d0IyM3hWOVVFNGlFY0hNN2UKOUZ4VWdZb0NZbVJjdVEzNkpZYi9HTkVxUTNaTjdUYzg4UWlua3J0RHpac211dUVHR3ZGKzVBMWNZRTNSYnVhZgpFU2xLUlVqRjRvaE81dUV5Rjc2ZlVzdS9rb2NadDRiTU9aNUFwYTlPWVNYbVRZbVRXWVR3dm9vYkd6cz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
    server: https://192.168.110.230:6443
  name: kubernetes
contexts:

6.5.4 Check the node and pod status

[root@kubernetes-master02 ~]# kubectl get nodes
NAME                  STATUS   ROLES    AGE   VERSION
kubernetes-master01   Ready    master   79m   v1.16.8
kubernetes-master02   Ready    master   17m   v1.16.8
[root@kubernetes-master02 ~]# 
[root@kubernetes-master02 ~]# kubectl get pods -A -o wide
NAMESPACE     NAME                                          READY   STATUS    RESTARTS   AGE   IP                NODE                  NOMINATED NODE   READINESS GATES
kube-system   coredns-5644d7b6d9-4p799                      1/1     Running   0          47m   10.244.0.6        kubernetes-master01   <none>           <none>
kube-system   coredns-5644d7b6d9-rzvcn                      1/1     Running   0          11m   10.244.0.14       kubernetes-master01   <none>           <none>
kube-system   kube-apiserver-kubernetes-master01            1/1     Running   0          18m   192.168.110.128   kubernetes-master01   <none>           <none>
kube-system   kube-apiserver-kubernetes-master02            1/1     Running   0          10m   192.168.110.130   kubernetes-master02   <none>           <none>
kube-system   kube-controller-manager-kubernetes-master01   1/1     Running   0          78m   192.168.110.128   kubernetes-master01   <none>           <none>
kube-system   kube-controller-manager-kubernetes-master02   1/1     Running   0          10m   192.168.110.130   kubernetes-master02   <none>           <none>
kube-system   kube-flannel-ds-amd64-5jsd9                   1/1     Running   0          17m   192.168.110.130   kubernetes-master02   <none>           <none>
kube-system   kube-flannel-ds-amd64-zr82z                   1/1     Running   0          66m   192.168.110.128   kubernetes-master01   <none>           <none>
kube-system   kube-proxy-hgsxr                              1/1     Running   0          79m   192.168.110.128   kubernetes-master01   <none>           <none>
kube-system   kube-proxy-txmxc                              1/1     Running   0          17m   192.168.110.130   kubernetes-master02   <none>           <none>
kube-system   kube-scheduler-kubernetes-master01            1/1     Running   0          78m   192.168.110.128   kubernetes-master01   <none>           <none>
kube-system   kube-scheduler-kubernetes-master02            1/1     Running   0          10m   192.168.110.130   kubernetes-master02   <none>           <none>

6.5.5 Summary of initialization errors:

6.5.5.1 Error: the second master initialized successfully, but master02's pods were not visible and kubectl get nodes did not show master02

Resolution:

  • Check the init configuration file: it had been copied over from master01 and the name field was never changed, so the node still registered under master01's hostname.

6.5.5.2 Error: the second master initialized successfully and get nodes showed master02, but get pods did not show master02's pods

Resolution:

  • The previous error had been cleaned up with a reset, which automatically wipes the certificate files. Before every re-initialization, remember to copy master01's certificate files to the other master nodes again, i.e. the /etc/kubernetes/pki/{sa.*,*ca.*} files.

6.5.5.3 Error: kubelet.go:1380] Failed to start ContainerManager failed to initialize top level QOS containers: failed to update top level Burstable QOS cgroup : failed to set supported cgroup subsystems for cgroup [kubepods burstable]: failed to find subsystem mount for required subsystem: pids

Resolution:

  • Root cause unknown. This error requires editing /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf, appending the following flags to the end of the last ExecStart line, and then running systemctl daemon-reload:

--feature-gates SupportPodPidsLimit=false --feature-gates SupportNodePidsLimit=false

6.5.5.4 Error: cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d

Resolution:

  • Edit /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf, add the following Environment line, and then run systemctl daemon-reload:

Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/ --cni-bin-dir=/opt/cni/bin"

6.5.5.5 While installing nginx-ingress, the default backend pod never became ready and kept restarting, which kicked off the better part of a day of troubleshooting

Troubleshooting steps:

  • Check the pod details

[root@kubernetes-master01 ~]# kubectl describe pods nginx-ingress-nginx-ingress-controller-defaultbackend-595bqdjkf
...
...
Events:
  Type     Reason     Age                    From                        Message
  ----     ------     ----                   ----                        -------
  Normal   Scheduled  5m6s                   default-scheduler           Successfully assigned default/nginx-ingress-nginx-ingress-controller-defaultbackend-595bqdjkf to kubernetes-node02
  Normal   Pulled     4m13s (x2 over 5m12s)  kubelet, kubernetes-node02  Container image "registry.cn-hangzhou.aliyuncs.com/link-cloud/nginx-ingress-controller-defaultbackend:1.5" already present on machine
  Normal   Created    4m13s (x2 over 5m12s)  kubelet, kubernetes-node02  Created container nginx-ingress-nginx-ingress-controller-defaultbackend
  Normal   Started    4m13s (x2 over 5m12s)  kubelet, kubernetes-node02  Started container nginx-ingress-nginx-ingress-controller-defaultbackend
  Warning  Unhealthy  4m13s (x3 over 4m33s)  kubelet, kubernetes-node02  Liveness probe failed: Get http://172.17.0.2:8080/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    4m13s                  kubelet, kubernetes-node02  Container nginx-ingress-nginx-ingress-controller-defaultbackend failed liveness probe, will be restarted
  Warning  Unhealthy  3m51s (x15 over 5m6s)  kubelet, kubernetes-node02  Readiness probe failed: Get http://172.17.0.2:8080/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

The pod could not become ready because the readiness and liveness probes were failing with timeouts. The odd part is that the address the kubelet assigned to the container was not in the flannel subnet but in the docker bridge subnet.

  • Check the network interfaces on the corresponding worker node

[root@kubernetes-node02 ~]# ip addr
...
...
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN 
    link/ether 02:42:9b:f2:86:56 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:9bff:fef2:8656/64 scope link 
       valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN 
    link/ether 92:a9:06:11:33:8a brd ff:ff:ff:ff:ff:ff
    inet 10.244.9.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::90a9:6ff:fe11:338a/64 scope link 
       valid_lft forever preferred_lft forever
7: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP 
    link/ether 6a:9e:15:0d:5b:c6 brd ff:ff:ff:ff:ff:ff
    inet 10.244.9.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::d8ea:a1ff:fe70:e3a4/64 scope link 
       valid_lft forever preferred_lft forever
8: veth845222bf@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP 
    link/ether da:ea:a1:70:e3:a4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::d8ea:a1ff:fe70:e3a4/64 scope link 
       valid_lft forever preferred_lft forever

Nothing abnormal shows up here, so the next stop is the system log on the node:

Apr 10 11:27:39 kubernetes-node02 kubelet: I0410 11:27:39.127052   10847 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "nginx-ingress-backend-token-s49ls" (UniqueName: "kubernetes.io/secret/a3e78975-5b02-4e7c-bbb0-04ebabaaef75-nginx-ingress-backend-token-s49ls") pod "nginx-ingress-nginx-ingress-controller-defaultbackend-595bf5thf" (UID: "a3e78975-5b02-4e7c-bbb0-04ebabaaef75")
Apr 10 11:27:39 kubernetes-node02 systemd: Started Kubernetes transient mount for /var/lib/kubelet/pods/a3e78975-5b02-4e7c-bbb0-04ebabaaef75/volumes/kubernetes.io~secret/nginx-ingress-backend-token-s49ls.
Apr 10 11:27:39 kubernetes-node02 kernel: device veth13f53d0 entered promiscuous mode
Apr 10 11:27:39 kubernetes-node02 kernel: IPv6: ADDRCONF(NETDEV_UP): veth13f53d0: link is not ready
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <warn>  (veth3b3366d): failed to find device 23 'veth3b3366d' with udev
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <info>  (veth3b3366d): new Veth device (carrier: OFF, driver: 'veth', ifindex: 23)
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <warn>  (veth13f53d0): failed to find device 24 'veth13f53d0' with udev
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <info>  (veth13f53d0): new Ethernet device (carrier: OFF, driver: 'veth', ifindex: 24)
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <info>  (docker0): bridge port veth13f53d0 was attached
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <info>  (veth13f53d0): enslaved to docker0
Apr 10 11:27:39 kubernetes-node02 containerd: time="2020-04-10T11:27:39.351258658+08:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/68e5a4552df200a0de18035914f4d09db773d8bc37e9215278e95f40aba11d1f/shim.sock" debug=false pid=15899
Apr 10 11:27:39 kubernetes-node02 kernel: IPVS: Creating netns size=2040 id=10
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <warn>  (veth3b3366d): failed to disable userspace IPv6LL address handling
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <info>  (veth13f53d0): link connected
Apr 10 11:27:39 kubernetes-node02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth13f53d0: link becomes ready
Apr 10 11:27:39 kubernetes-node02 kubelet: W0410 11:27:39.612818   10847 docker_sandbox.go:394] failed to read pod IP from plugin/docker: Couldn't find network status for default/nginx-ingress-nginx-ingress-controller-defaultbackend-595bf5thf through plugin: invalid network status for
Apr 10 11:27:39 kubernetes-node02 containerd: time="2020-04-10T11:27:39.664664106+08:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/2444548c4c6b0df1186673410e438126af28d7a050f7db377db3b2b03537d5d1/shim.sock" debug=false pid=15956
Apr 10 11:27:39 kubernetes-node02 kubelet: W0410 11:27:39.803031   10847 docker_sandbox.go:394] failed to read pod IP from plugin/docker: Couldn't find network status for default/nginx-ingress-nginx-ingress-controller-defaultbackend-595bf5thf through plugin: invalid network status for

Messages like these appear every time a pod is scheduled onto this node. After reading them a few times, the abnormal lines are:

Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <warn>  (veth3b3366d): failed to find device 23 'veth3b3366d' with udev
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <info>  (veth3b3366d): new Veth device (carrier: OFF, driver: 'veth', ifindex: 23)
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <warn>  (veth13f53d0): failed to find device 24 'veth13f53d0' with udev
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <info>  (veth13f53d0): new Ethernet device (carrier: OFF, driver: 'veth', ifindex: 24)
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <info>  (docker0): bridge port veth13f53d0 was attached
Apr 10 11:27:39 kubernetes-node02 NetworkManager[833]: <info>  (veth13f53d0): enslaved to docker0

The kubelet's network plugin failed to create the veth device on the host (I never figured out what "carrier" is, but it clearly did not look right and the device was not created successfully). It then went on to create another device, "veth13f53d0", through docker0 ("enslaved to docker0"). For comparison, here is the log from a healthy node:

Apr 10 11:28:57 kubernetes-node01 NetworkManager[842]: <warn>  (vethc61f1092): failed to find device 13 'vethc61f1092' with udev
Apr 10 11:28:57 kubernetes-node01 NetworkManager[842]: <info>  (vethc61f1092): link connected
Apr 10 11:28:57 kubernetes-node01 NetworkManager[842]: <info>  (vethc61f1092): new Veth device (carrier: ON, driver: 'veth', ifindex: 13)
Apr 10 11:28:57 kubernetes-node01 kernel: device vethc61f1092 entered promiscuous mode
Apr 10 11:28:57 kubernetes-node01 kernel: cni0: port 1(vethc61f1092) entered forwarding state
Apr 10 11:28:57 kubernetes-node01 kernel: cni0: port 1(vethc61f1092) entered forwarding state
Apr 10 11:28:57 kubernetes-node01 NetworkManager[842]: <info>  (cni0): bridge port vethc61f1092 was attached
Apr 10 11:28:57 kubernetes-node01 NetworkManager[842]: <info>  (vethc61f1092): enslaved to cni0
Apr 10 11:28:57 kubernetes-node01 NetworkManager[842]: <info>  (cni0): link connected
Apr 10 11:28:57 kubernetes-node01 NetworkManager[842]: <info>  ifcfg-rh: add connection in-memory (6c9188a6-1d5e-4b5f-bfc1-1f433a9f998d,"vethc61f1092")

The problem is now obvious: creating the veth device through flannel apparently failed, the key being that "carrier: OFF" (whose purpose I also never worked out), so in the end a veth device was created through the docker0 bridge instead.

  • Check the CNI configuration and the installed version

    1. Check whether the CNI plugin is installed

[root@kubernetes-node02 ~]# yum list installed |grep cni
kubernetes-cni.x86_64                0.7.5-0                         @kubernetes

    2. Check the CNI plugin directory and configuration file

[root@kubernetes-node02 ~]# cat /etc/cni/net.d/10-flannel.conflist 
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
[root@kubernetes-node02 ~]# 
[root@kubernetes-node02 ~]# 
[root@kubernetes-node02 ~]# ll /opt/cni/bin/ |wc -l
14

Comparing with a healthy worker node turned up no differences.

  • Continue by checking the kubelet startup parameter file

[root@kubernetes-node02 ~]# cat /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf 
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/ --cni-bin-dir=/opt/cni/bin"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile="-/var/lib/kubelet/kubeadm-flags.env"
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile="-/etc/sysconfig/kubelet"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS --feature-gates SupportPodPidsLimit=false --feature-gates SupportNodePidsLimit=false

After comparing it back and forth several times, the problem finally showed up...

EnvironmentFile="-/var/lib/kubelet/kubeadm-flags.env"
EnvironmentFile="-/etc/sysconfig/kubelet"

I have no idea when I added the quotes to these two lines. Remove the double quotes and restart the kubelet service:

[root@kubernetes-node02 ~]# systemctl restart kubelet
[root@kubernetes-node02 ~]#

After creating a pod again, the logs look normal and the pod becomes ready.

Apr 10 14:11:11 kubernetes-node02 NetworkManager[853]: <info>  (cni0): device state change: unmanaged -> unavailable (reason 'connection-assumed') [10 20 41]
Apr 10 14:11:11 kubernetes-node02 NetworkManager[853]: <info>  (cni0): device state change: unavailable -> disconnected (reason 'none') [20 30 0]
Apr 10 14:11:11 kubernetes-node02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Apr 10 14:11:11 kubernetes-node02 NetworkManager[853]: <warn>  (veth845222bf): failed to find device 8 'veth845222bf' with udev
Apr 10 14:11:11 kubernetes-node02 NetworkManager[853]: <info>  (veth845222bf): link connected
Apr 10 14:11:11 kubernetes-node02 NetworkManager[853]: <info>  (veth845222bf): new Veth device (carrier: ON, driver: 'veth', ifindex: 8)
Apr 10 14:11:11 kubernetes-node02 kernel: device veth845222bf entered promiscuous mode
Apr 10 14:11:11 kubernetes-node02 kernel: cni0: port 1(veth845222bf) entered forwarding state
Apr 10 14:11:11 kubernetes-node02 kernel: cni0: port 1(veth845222bf) entered forwarding state
Apr 10 14:11:11 kubernetes-node02 NetworkManager[853]: <info>  (cni0): bridge port veth845222bf was attached
Apr 10 14:11:11 kubernetes-node02 NetworkManager[853]: <info>  (veth845222bf): enslaved to cni0
Apr 10 14:11:11 kubernetes-node02 NetworkManager[853]: <info>  (cni0): link connected
[root@kubernetes-node02 ~]#

[root@kubernetes-master01 ~]# kubectl get pods -owide
NAME                                                              READY   STATUS    RESTARTS   AGE   IP                NODE                NOMINATED NODE   READINESS GATES
nginx-ingress-controller-74k75                                    1/1     Running   1          19h   192.168.110.108   kubernetes-node03   <none>           <none>
nginx-ingress-controller-jbb5n                                    1/1     Running   0          19h   192.168.110.106   kubernetes-node01   <none>           <none>
nginx-ingress-controller-xnffd                                    1/1     Running   1          19h   192.168.110.107   kubernetes-node02   <none>           <none>
nginx-ingress-nginx-ingress-controller-defaultbackend-595bwzn6w                                    1/1     Running   0          19s   10.244.9.2      kubernetes-node02   <none>           <none>

6.6 Initialize the third master

6.6.1 Change the address and hostname fields in the init configuration file

[root@kubernetes-master03 ~]# vim /application/kubernetes/init_kubernetes.yaml 

apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.110.131 # IP of this node
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: kubernetes-master03 # hostname of this node
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---

6.6.2 Initialize

[root@kubernetes-master03 ~]# kubeadm init --config=/application/kubernetes/init_kubernetes.yaml
...
...
[bootstrap-token] Using token: wzsclp.2puihpckhg9otv4n
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons]: Migrating CoreDNS Corefile
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.110.131:6443 --token wzsclp.2puihpckhg9otv4n \
    --discovery-token-ca-cert-hash sha256:424baf048926a2a42ea651443ff3803da3366135f76b75aa843970abc1bf009e

6.6.3 Create the kubectl configuration file

[root@kubernetes-master03 ~]# mkdir -p $HOME/.kube
[root@kubernetes-master03 ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@kubernetes-master03 ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config

6.6.4 Change the apiserver address to the nginx load-balancer address

[root@kubernetes-master03 ~]# vim .kube/config 

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJd01EUXdPREE0TWprek9Wb1hEVE13TURRd05qQTRNamt6T1Zvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBT1ZhClZpYU9WZTlKZmpTTHpuV08rVld6bmFLMFpUYnpCWk9HZ0hTRWNENitYZTFBODhKVHR5aCtWSGtGd1M2dFk2bU8KQXl1TnRibUhMaWlnREVlWUREbUlyMUtVa24wdFg0RmV6TzFaYzIzV1NwMG91Wnprbk1XQXVxTmYzRHBhL0x2Rgovc3VsMzd1RkJHOW1KVitJUFQ5MUNxSFRlbHpmMnhXUlhBVFhBYWhKL2xCSzdQaXVORzdvVW5OR2xVOEpSZFRvClBjQ1ozcytpSEFUamxSSWE3ajdvUmVzYUgxUUxvSFRmeVFEbWdWNnl3TzljTVc4VEt5a0FKbk53MDcyR3gra2YKSGJmMnlPUm5ieXYzNVRDeW9nQk8rYkM5OEpCNFVGQ1JvbFNYS1QxN25oR2pDVkd1cTNlV05TRnBQWWozWHZNdgo5N3VEelRyTk9nK1dPSitzL29jQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFMaGJPZ2kvWjRaLytuMC93Y3ltRktQQzI0SHoKUGtBcUIrNzd4ZmVWQUEvWCtzMFA2WThhZnVyb0oyL2VZY3NHbHhHUG15NGd5ell2TkhvTi9SVjNHc1dhVS9yZwpaK2xwZm0yOWJEUjNodXEyQnArNHl4czRnKzl5N1JrOGNRYzlWQlFmZmJhblk3N1kzclIzNGJFZ2FlL1FjbVd3CitLbFdrdFJKUDIrNTU3Vjl0VjdwRnBwbjVjekZqTE9xMXhaaUhObmRQRVhSNVNiZk9yQVFkbkRIVThrSG1BV1kKWU8zYjBjYk9yL05CeG9zVTNqUnRyK01oTE5SWDQ3OTdxcXN1bmNxbWF5VGErYjBlNy8wTU5mQS8vZEZsL0s0bgoyaUhmN2wzbHkzSUVaQUNaOW1RaFBNME9QN2dwa1pKTjhSbXJYS21sTzYzak1QdWl2Nmc5Rk95SGdUOD0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
    server: https://192.168.110.230:6443
  name: kubernetes
contexts:
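
The same change can be made non-interactively instead of editing the file in vim. A minimal sketch, assuming the freshly generated config still points at this node's own apiserver address (192.168.110.131) and that 192.168.110.230 is the Keepalived VIP in front of Nginx:

[root@kubernetes-master03 ~]# sed -i 's#server: https://192.168.110.131:6443#server: https://192.168.110.230:6443#g' $HOME/.kube/config
[root@kubernetes-master03 ~]# grep 'server:' $HOME/.kube/config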

6.6.5、Check pod and node status

[root@kubernetes-master03 ~]# kubectl get nodes
NAME                  STATUS   ROLES    AGE    VERSION
kubernetes-master01   Ready    master   118m   v1.16.8
kubernetes-master02   Ready    master   56m    v1.16.8
kubernetes-master03   Ready    master   30m    v1.16.8
[root@kubernetes-master03 ~]# 
[root@kubernetes-master03 ~]# kubectl get pods -A -o wide
NAMESPACE     NAME                                          READY   STATUS    RESTARTS   AGE    IP                NODE                  NOMINATED NODE   READINESS GATES
kube-system   coredns-5644d7b6d9-4p799                      1/1     Running   0          87m    10.244.0.6        kubernetes-master01   <none>           <none>
kube-system   coredns-5644d7b6d9-s259f                      1/1     Running   0          71s    10.244.2.2        kubernetes-master03   <none>           <none>
kube-system   kube-apiserver-kubernetes-master01            1/1     Running   0          57m    192.168.110.128   kubernetes-master01   <none>           <none>
kube-system   kube-apiserver-kubernetes-master02            1/1     Running   0          50m    192.168.110.130   kubernetes-master02   <none>           <none>
kube-system   kube-apiserver-kubernetes-master03            1/1     Running   0          28m    192.168.110.131   kubernetes-master03   <none>           <none>
kube-system   kube-controller-manager-kubernetes-master01   1/1     Running   0          117m   192.168.110.128   kubernetes-master01   <none>           <none>
kube-system   kube-controller-manager-kubernetes-master02   1/1     Running   0          49m    192.168.110.130   kubernetes-master02   <none>           <none>
kube-system   kube-controller-manager-kubernetes-master03   1/1     Running   0          28m    192.168.110.131   kubernetes-master03   <none>           <none>
kube-system   kube-flannel-ds-amd64-58m8x                   1/1     Running   0          30m    192.168.110.131   kubernetes-master03   <none>           <none>
kube-system   kube-flannel-ds-amd64-5jsd9                   1/1     Running   0          56m    192.168.110.130   kubernetes-master02   <none>           <none>
kube-system   kube-flannel-ds-amd64-zr82z                   1/1     Running   0          105m   192.168.110.128   kubernetes-master01   <none>           <none>
kube-system   kube-proxy-hgsxr                              1/1     Running   0          118m   192.168.110.128   kubernetes-master01   <none>           <none>
kube-system   kube-proxy-lf9vn                              1/1     Running   0          30m    192.168.110.131   kubernetes-master03   <none>           <none>
kube-system   kube-proxy-txmxc                              1/1     Running   0          56m    192.168.110.130   kubernetes-master02   <none>           <none>
kube-system   kube-scheduler-kubernetes-master01            1/1     Running   0          117m   192.168.110.128   kubernetes-master01   <none>           <none>
kube-system   kube-scheduler-kubernetes-master02            1/1     Running   0          49m    192.168.110.130   kubernetes-master02   <none>           <none>
kube-system   kube-scheduler-kubernetes-master03            1/1     Running   0          29m    192.168.110.131   kubernetes-master03   <none>           <none>

7、Initialize the worker nodes

7.1、Use Ansible to run the join command on all worker nodes in one batch

[root@jumpserver ~]# ansible test_k8s_worker -m shell -a "kubeadm join 192.168.110.131:6443 --token wzsclp.2puihpckhg9otv4n --discovery-token-ca-cert-hash sha256:424baf048926a2a42ea651443ff3803da3366135f76b75aa843970abc1bf009e"
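
If the bootstrap token printed by kubeadm init has already expired (tokens are only valid for 24 hours by default), a fresh join command can be generated on any master and used in the same Ansible call:

[root@kubernetes-master03 ~]# kubeadm token create --print-join-command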

7.2、Change the apiserver address in the kubelet config file to the load-balancer address

[root@jumpserver ~]# ansible test_k8s_worker -m shell -a "sed -i 's#server: https://192.168.110.131:6443#server: https://192.168.110.230:6443#g' /etc/kubernetes/kubelet.conf"
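
Before restarting kubelet, the change can be verified across all workers with the same Ansible group:

[root@jumpserver ~]# ansible test_k8s_worker -m shell -a "grep 'server:' /etc/kubernetes/kubelet.conf"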

7.3、Restart the kubelet service

[root@jumpserver ~]# ansible test_k8s_worker -m service -a "name=kubelet state=restarted"

7.4、Label the worker nodes from a master node

[root@kubernetes-master03 ~]# kubectl label node kubernetes-node01 node-role.kubernetes.io/worker=''
node/kubernetes-node01 labeled
[root@kubernetes-master03 ~]# kubectl label node kubernetes-node02 node-role.kubernetes.io/worker=''
node/kubernetes-node02 labeled
[root@kubernetes-master03 ~]# kubectl label node kubernetes-node03 node-role.kubernetes.io/worker=''
node/kubernetes-node03 labeled
[root@kubernetes-master03 ~]#
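
Since kubectl label accepts multiple node names, the three commands above could also be collapsed into one, and the result checked with --show-labels; a minimal sketch:

[root@kubernetes-master03 ~]# kubectl label node kubernetes-node01 kubernetes-node02 kubernetes-node03 node-role.kubernetes.io/worker=''
[root@kubernetes-master03 ~]# kubectl get nodes --show-labels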

7.5、Check the cluster status

[root@kubernetes-master03 ~]# kubectl get node
NAME                  STATUS   ROLES    AGE   VERSION
kubernetes-master01   Ready    master   17h   v1.16.8
kubernetes-master02   Ready    master   16h   v1.16.8
kubernetes-master03   Ready    master   16h   v1.16.8
kubernetes-node01     Ready    worker   14m   v1.16.8
kubernetes-node02     Ready    worker   14m   v1.16.8
kubernetes-node03     Ready    worker   14m   v1.16.8
[root@kubernetes-master03 ~]#

At this point, the highly available Kubernetes cluster is fully deployed.
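
As a quick smoke test that scheduling works end to end through the load balancer, a throwaway deployment can be created and removed again. A minimal sketch; the deployment name and image are arbitrary:

[root@kubernetes-master03 ~]# kubectl create deployment nginx-test --image=nginx
[root@kubernetes-master03 ~]# kubectl scale deployment nginx-test --replicas=3
[root@kubernetes-master03 ~]# kubectl get pods -o wide
[root@kubernetes-master03 ~]# kubectl delete deployment nginx-test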

六、Install Helm

1、What is Helm

On Linux, applications are installed with tools such as yum or apt-get. Helm is a package manager built specifically for Kubernetes: it lets us install applications on a cluster quickly and publish packages we have built ourselves. As we know, deploying an application on Kubernetes means writing resource manifest files.

Deploying several applications of the same type can mean writing several near-identical manifests, which is tedious to maintain and full of duplicated work. Helm addresses this with the concept of templates.

A template describes a whole class of application once; by passing different parameters at install time (or defining them in advance in the values file), the same chart can be used to install multiple instances of the application for different environments.
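
As an illustration of that idea, once Helm is installed and a repository is added (both are done below), the same chart can be installed several times with different parameter values passed on the command line. The chart name and value key here are assumptions for illustration only:

[root@kubernetes-master03 ~]# helm install web-prod alirepo/nginx --set replicaCount=3
[root@kubernetes-master03 ~]# helm install web-test alirepo/nginx --set replicaCount=1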

2、Helm terminology

2.1、chart: Helm's packaging format is called a chart. A chart is a directory made up of a set of resource manifests plus Helm-specific files and subdirectories; it is the complete collection of files needed to deploy one application (see the sketch after this list).

2.2、release: helm install deploys a chart into Kubernetes; every entry shown afterwards by helm list is a release.

2.3、repository: a collection of charts. Its purpose is to store charts so they can be installed quickly later; a repository can live locally or in the cloud.

2.4、tiller: before Helm 3.0 the Helm client did not talk to the apiserver directly; it sent requests to the Tiller component, which forwarded them to the apiserver. Tiller was removed in Helm 3.
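
To get a feel for what a chart directory contains, Helm can scaffold one locally once it is installed (next subsection); the chart name demo below is arbitrary:

[root@kubernetes-master03 ~]# helm create demo
[root@kubernetes-master03 ~]# ls demo
Chart.yaml  charts  templates  values.yaml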

3、Install Helm on the master nodes

[root@jumpserver src]# wget https://get.helm.sh/helm-v3.1.0-linux-amd64.tar.gz
[root@jumpserver src]# ansible test_k8s_master -m unarchive -a "src=/usr/local/src/helm-v3.1.0-linux-amd64.tar.gz dest=/usr/local/ copy=yes owner=root mode=755"
[root@jumpserver src]# ansible test_k8s_master -m shell -a "cp /usr/local/linux-amd64/helm /usr/local/bin/"
[root@jumpserver src]# ansible test_k8s_master -m shell -a "helm version"
192.168.110.130 | CHANGED | rc=0 >>
version.BuildInfo{Version:"v3.1.0", GitCommit:"b29d20baf09943e134c2fa5e1e1cab3bf93315fa", GitTreeState:"clean", GoVersion:"go1.13.7"}

192.168.110.131 | CHANGED | rc=0 >>
version.BuildInfo{Version:"v3.1.0", GitCommit:"b29d20baf09943e134c2fa5e1e1cab3bf93315fa", GitTreeState:"clean", GoVersion:"go1.13.7"}

192.168.110.128 | CHANGED | rc=0 >>
version.BuildInfo{Version:"v3.1.0", GitCommit:"b29d20baf09943e134c2fa5e1e1cab3bf93315fa", GitTreeState:"clean", GoVersion:"go1.13.7"}

After installation Helm has no repositories configured by default, so one has to be added manually:

[root@jumpserver src]# ansible test_k8s_master -m shell -a "helm repo add alirepo https://apphub.aliyuncs.com/"
[root@jumpserver src]# ansible test_k8s_master -m shell -a "helm repo update"
[root@jumpserver src]# ansible test_k8s_master -m shell -a "helm repo list"
192.168.110.130 | CHANGED | rc=0 >>
NAME   	URL                         
alirepo	https://apphub.aliyuncs.com/

192.168.110.131 | CHANGED | rc=0 >>
NAME   	URL                         
alirepo	https://apphub.aliyuncs.com/

192.168.110.128 | CHANGED | rc=0 >>
NAME   	URL                                             
stable 	https://kubernetes-charts.storage.googleapis.com
alirepo	https://apphub.aliyuncs.com/
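
With the repository in place, charts can be searched and inspected before installing. A minimal sketch; whether a particular chart (for example nginx) exists in apphub is not guaranteed:

[root@kubernetes-master03 ~]# helm search repo alirepo
[root@kubernetes-master03 ~]# helm show values alirepo/nginx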
