
Kubernetes Pod Scheduling Explained

source link: http://www.yunweipai.com/38985.html

Introduction

The Scheduler is Kubernetes' scheduler. Its main task is to assign Pods to nodes in the cluster. That sounds simple, but it has to take several concerns into account:

  • Fairness: every node should get a chance to be allocated resources
  • Efficient resource use: all cluster resources should be utilized as fully as possible
  • Performance: scheduling should be fast and able to place large batches of Pods quickly
  • Flexibility: users should be able to control the scheduling process according to their own needs

The Scheduler runs as a separate service. Once started, it continuously watches the API Server for Pods whose spec.nodeName is empty, and for each such Pod it creates a binding that records which node the Pod should be placed on.

Scheduling Process

The scheduling flow: first, nodes that do not meet the requirements are filtered out; this stage is called predicate. The nodes that pass are then ranked by priority; this stage is called priority. Finally, the highest-priority node is selected. If any step reports an error, the error is returned directly.

The predicate stage can use a series of algorithms:

  • PodFitsResources: are the node's remaining resources greater than what the Pod requests
  • PodFitsHost: if the Pod specifies a nodeName, does the node's name match it
  • PodFitsHostPort: do the ports already in use on the node conflict with the ports the Pod requests
  • PodSelectorMatches: filter out nodes whose labels do not match the Pod's specified selector
  • NoDiskConflict: volumes already mounted must not conflict with the volumes the Pod specifies, unless both are read-only

If no node passes the predicate stage, the Pod stays in the Pending state and is rescheduled repeatedly until some node satisfies the requirements. If several nodes pass, they enter the priority stage, where nodes are ranked by score. Each priority is a key-value pair: the key is the priority's name and the value is its weight. The priority options include:

  • LeastRequestedPriority: the weight is computed from CPU and memory utilization; the lower the utilization, the higher the weight. In other words, this priority favors nodes with low resource usage
  • BalancedResourceAllocation: the closer a node's CPU and memory utilization are to each other, the higher the weight. This must be used together with the previous priority, not on its own
  • ImageLocalityPriority: favors nodes that already hold the images the Pod will use; the larger the total size of those images, the higher the weight

All the priority items and weights are combined to compute each node's final score.
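The two-stage predicate/priority flow described above can be sketched in a few lines of Python. This is a simplified illustration, not the real kube-scheduler: the node data, the single predicate, and the single priority function are stand-ins.

```python
# Simplified sketch of the predicate -> priority scheduling flow.
# Node shapes, predicates, and priorities here are illustrative stand-ins,
# not the real kube-scheduler implementation.

def pod_fits_resources(pod, node):
    # Predicate: the node's free resources must cover the Pod's request
    return (node["free_cpu"] >= pod["cpu"] and
            node["free_mem"] >= pod["mem"])

def least_requested(pod, node):
    # Priority: lower utilization after placement -> higher score
    cpu_free = (node["free_cpu"] - pod["cpu"]) / node["cap_cpu"]
    mem_free = (node["free_mem"] - pod["mem"]) / node["cap_mem"]
    return (cpu_free + mem_free) / 2

def schedule(pod, nodes):
    # Stage 1 (predicate): filter out nodes that do not fit
    feasible = [n for n in nodes if pod_fits_resources(pod, n)]
    if not feasible:
        return None  # no feasible node: the Pod would stay Pending
    # Stage 2 (priority): weighted sum of priority scores, pick the best node
    weights = [(least_requested, 1.0)]
    def total(node):
        return sum(w * f(pod, node) for f, w in weights)
    return max(feasible, key=total)["name"]

pod = {"cpu": 1, "mem": 2}
nodes = [
    {"name": "node-a", "cap_cpu": 4, "cap_mem": 8, "free_cpu": 1, "free_mem": 2},
    {"name": "node-b", "cap_cpu": 4, "cap_mem": 8, "free_cpu": 3, "free_mem": 6},
]
print(schedule(pod, nodes))  # node-b: lower utilization after placement
```

Note how a Pod that fits nowhere yields None, mirroring the Pending behavior described above.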

Custom Schedulers

Besides the scheduler Kubernetes ships with, you can also write your own. The spec.schedulerName field selects which scheduler handles a Pod. For example, the Pod below is scheduled by my-scheduler instead of the default default-scheduler:

apiVersion: v1
kind: Pod
metadata:
  name: scheduler-test
  labels:
    name: example-scheduler
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-test
    image: nginx:v1


Overall, Kubernetes provides the following scheduling mechanisms:

  • Affinity: node affinity and Pod affinity
  • Taints and tolerations
  • Fixed scheduling policies

To keep this article at a reasonable length, it focuses on affinity; the other two mechanisms will be covered in detail in follow-up posts.

Affinity

Note: all of the tests below are run with one master and one node:

[root@Centos8 scheduler]# kubectl get node
NAME          STATUS   ROLES    AGE    VERSION
centos8       Ready    master   134d   v1.15.1
testcentos7   Ready    <none>   133d   v1.15.1


Node Affinity

pod.spec.affinity.nodeAffinity

  • preferredDuringSchedulingIgnoredDuringExecution: soft policy
    • A soft policy expresses a preference: the Pod would rather (not) land on a certain node, but if no such node is available, landing elsewhere is acceptable
  • requiredDuringSchedulingIgnoredDuringExecution: hard policy
    • A hard policy means the Pod must (not) land on the specified nodes; if the condition cannot be met, it stays in the Pending state

Demo

requiredDuringSchedulingIgnoredDuringExecution (hard policy)

# vim node-affinity-required.yaml

apiVersion: v1
kind: Pod
metadata:
  name: affinity-required
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-required
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname    # node hostname label
            operator: NotIn     # must not be in the values list
            values:
            - testcentos7           # node name


The policy above says: this Pod must not land on the node named testcentos7. Let's create it.

[root@Centos8 ~]# kubectl get node --show-labels    # view node labels
NAME          STATUS   ROLES    AGE    VERSION   LABELS
centos8       Ready    master   133d   v1.15.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=centos8,kubernetes.io/os=linux,node-role.kubernetes.io/master=
testcentos7   Ready    <none>   133d   v1.15.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=testcentos7,kubernetes.io/os=linux

There are only two nodes at the moment, one master and one node, and the policy says this Pod must not be on testcentos7.

After the Pod is created, since there is no node left other than testcentos7, it stays in the Pending state:

[root@Centos8 scheduler]# kubectl create -f node-affinity-required.yaml
pod/affinity-required created

[root@Centos8 scheduler]# kubectl get pod
NAME                READY   STATUS    RESTARTS   AGE
affinity-required   0/1     Pending   0          4s

[root@Centos8 scheduler]# kubectl describe pod affinity-required
default-scheduler  0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) had taints that the pod didn't tolerate.


Now change NotIn to In in the yaml file:

affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname    # node hostname label
            operator: In        # must be in the values list
            values:
            - testcentos7           # node name


Create it again; this time the Pod lands on the specified node:

[root@Centos8 scheduler]# kubectl get pod -o wide 
NAME                READY   STATUS    RESTARTS   AGE   IP             NODE     
affinity-required   1/1     Running   0          11s  10.244.3.219  testcentos7


preferredDuringSchedulingIgnoredDuringExecution (soft policy)

vim node-affinity-preferred.yaml

apiVersion: v1
kind: Pod
metadata:
  name: affinity-preferred
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-preferred
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100    # weight 100; in a soft policy, a higher weight means a greater chance of matching
        preference:  # prefer
          matchExpressions:
          - key: kubernetes.io/hostname # node hostname label
            operator: In  # must be in the values list
            values:
            - testcentos7   # actual node name


The policy above says: this Pod would prefer to land on the node named testcentos7. Create it:

[root@Centos8 scheduler]# kubectl create -f node-affinity-prefered.yaml 
pod/affinity-prefered created

[root@Centos8 scheduler]# kubectl get pod -o wide 
NAME                READY   STATUS    RESTARTS   AGE   IP             NODE     
affinity-prefered   1/1     Running   0          9s    10.244.3.220 testcentos7


The Pod lands on testcentos7 as expected.

Now change the policy, replacing the node name with one that does not exist, e.g. kube-node2:

affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1    # weight 1; in a soft policy, a higher weight means a greater chance of matching
        preference:  # prefer
          matchExpressions:
          - key: kubernetes.io/hostname # node hostname label
            operator: In  # must be in the values list
            values:
            - kube-node2   # node name (does not actually exist)


The policy above says: this Pod would prefer to land on a node named kube-node2. Create it:

[root@Centos8 scheduler]# kubectl create -f node-affinity-prefered.yaml
pod/affinity-prefered created

## After creation it again lands on testcentos7: it would prefer kube-node2,
## but since that node does not exist it has to settle for testcentos7
[root@Centos8 scheduler]# kubectl get pod -o wide
NAME                READY   STATUS    RESTARTS   AGE   IP             NODE
affinity-prefered   1/1     Running   0          17s   10.244.3.221   testcentos7


Combining Hard and Soft Policies

vim node-affinity-common.yaml

apiVersion: v1
kind: Pod
metadata:
  name: affinity-node
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-affinity-node
    image: nginx:v1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8s-node2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: source
            operator: In
            values:
            - hello


Combining hard and soft policies gives a more precise node selection. The file above means: this Pod must not run on node k8s-node2; any other node is acceptable, but preferably one whose label source has the value hello.

Operators

  • In: the label's value is in the given list
  • NotIn: the label's value is not in the given list
  • Gt: the label's value is greater than the given value
  • Lt: the label's value is less than the given value
  • Exists: the label exists
  • DoesNotExist: the label does not exist

If there are multiple entries under nodeSelectorTerms, satisfying any one of them is enough (they are OR-ed); if a term contains multiple matchExpressions, all of them must be satisfied for the Pod to be scheduled (they are AND-ed).
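The OR/AND semantics and the operators above can be sketched as follows. This is a simplified illustration of how a node's labels are evaluated against nodeSelectorTerms, not the scheduler's actual code.

```python
# Simplified sketch of nodeSelectorTerms evaluation (illustrative only):
# terms are OR-ed together; the matchExpressions inside one term are AND-ed.

def match_expression(labels, expr):
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        return labels.get(key) not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op == "Gt":
        return key in labels and int(labels[key]) > int(values[0])
    if op == "Lt":
        return key in labels and int(labels[key]) < int(values[0])
    raise ValueError(f"unknown operator {op}")

def node_matches(labels, node_selector_terms):
    # OR across terms, AND across the expressions inside each term
    return any(
        all(match_expression(labels, e) for e in term["matchExpressions"])
        for term in node_selector_terms
    )

labels = {"kubernetes.io/hostname": "testcentos7"}
terms = [{"matchExpressions": [
    {"key": "kubernetes.io/hostname", "operator": "In", "values": ["testcentos7"]},
]}]
print(node_matches(labels, terms))  # True
```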

Pod Affinity

pod.spec.affinity.podAffinity/podAntiAffinity

  • preferredDuringSchedulingIgnoredDuringExecution: soft policy
    • A soft policy expresses a preference: the Pod would rather (not) be placed with certain Pods, but if that is impossible, another placement is acceptable
  • requiredDuringSchedulingIgnoredDuringExecution: hard policy
    • A hard policy must be satisfied; if it cannot be, the Pod stays in the Pending state

Demo

First, create a test Pod:

vim pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  labels:
    app: nginx
    type: web
spec:
  containers:
  - name: pod-1
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
    ports:
    - name: web
      containerPort: 80


[root@Centos8 scheduler]# kubectl create -f pod.yaml 
pod/pod-1 created

[root@Centos8 scheduler]# kubectl get pod --show-labels
NAME    READY   STATUS    RESTARTS   AGE   LABELS
pod-1   1/1     Running   0          4s    app=nginx,type=web


requiredDuringSchedulingIgnoredDuringExecution (Pod hard policy)

vim pod-affinity-required.yaml

apiVersion: v1
kind: Pod
metadata:
  name: affinity-required
  labels:
    app: pod-3
spec:
  containers:
  - name: with-pod-required
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    podAffinity:   # schedule into the same topology domain
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app    # label key
            operator: In
            values:
            - nginx     # label value
        topologyKey: kubernetes.io/hostname # the topology domain is the node hostname


The policy in this file says: this Pod must be on the same node as a Pod carrying the label app: nginx.
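The role of topologyKey in this kind of rule can be sketched as follows. This is a simplified illustration (real pod-affinity evaluation in the scheduler is more involved); the node and pod data structures are made up for the example.

```python
# Simplified sketch of pod affinity with a topologyKey (illustrative only).
# A candidate node satisfies pod affinity if some existing pod matching the
# labelSelector runs in the same topology domain, i.e. on a node whose
# topologyKey label has the same value.

def feasible_nodes(nodes, pods, selector_key, selector_values, topology_key):
    # Topology domains that already contain a matching pod
    domains = {
        nodes[p["node"]][topology_key]
        for p in pods
        if p["labels"].get(selector_key) in selector_values
    }
    # Every node whose domain already holds a matching pod is feasible
    return sorted(
        name for name, labels in nodes.items()
        if labels[topology_key] in domains
    )

nodes = {
    "centos8":     {"kubernetes.io/hostname": "centos8"},
    "testcentos7": {"kubernetes.io/hostname": "testcentos7"},
}
pods = [{"node": "testcentos7", "labels": {"app": "nginx"}}]

# With topologyKey kubernetes.io/hostname, only testcentos7 qualifies
print(feasible_nodes(nodes, pods, "app", ["nginx"], "kubernetes.io/hostname"))
```

With a broader topologyKey (e.g. a zone label), every node in the zone that holds the matching Pod would qualify, which is why the hostname key pins the Pod to the same node.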

Create and test:

[root@Centos8 scheduler]# kubectl create -f pod-affinity-required.yaml 
pod/affinity-required created


[root@Centos8 scheduler]# kubectl get pod -o wide 
NAME                READY   STATUS    RESTARTS   AGE   IP             NODE  
affinity-required   1/1     Running   0          43s   10.244.3.224 testcentos7
pod-1               1/1     Running   0          10m   10.244.3.223 testcentos7

# on the same node as the Pod carrying that label


Change podAffinity to podAntiAffinity so that the Pods do not land on the same node:

apiVersion: v1
kind: Pod
metadata:
  name: required-pod2
  labels:
    app: pod-3
spec:
  containers:
  - name: with-pod-required
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    podAntiAffinity:   
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app    # label key
            operator: In
            values:
            - nginx     # label value
        topologyKey: kubernetes.io/hostname # the topology domain is the node hostname


This policy means the Pod must be on a different node from any Pod with the label app: nginx.

Create and test:

[root@Centos8 scheduler]# kubectl create -f pod-affinity-required.yaml
pod/required-pod2 created

[root@Centos8 scheduler]# kubectl get pod
NAME                READY   STATUS    RESTARTS   AGE
affinity-required   1/1     Running   0          9m40s
pod-1               1/1     Running   0          19m
required-pod2       0/1     Pending   0          51s

## Since there is only one schedulable node here, required-pod2 can only stay Pending


preferredDuringSchedulingIgnoredDuringExecution (Pod soft policy)

vim pod-affinity-prefered.yaml

apiVersion: v1
kind: Pod
metadata:
  name: affinity-prefered
  labels:
    app: pod-3
spec:
  containers:
  - name: with-pod-prefered
    image: nginx:v1
    imagePullPolicy: IfNotPresent
  affinity:
    podAntiAffinity:    # not in the same topology domain
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - pod-2
          topologyKey: kubernetes.io/hostname


The soft policy works much like the hard one, except that a weight is added: it expresses a preference while still accepting other placements, so we will not demonstrate it again here.

A comparison of the affinity/anti-affinity scheduling policies:

Policy            Matches   Operators                                 Topology domain   Scheduling target
nodeAffinity      node      In, NotIn, Exists, DoesNotExist, Gt, Lt   No                a specified node
podAffinity       Pod       In, NotIn, Exists, DoesNotExist           Yes               same topology domain as the matching Pod
podAntiAffinity   Pod       In, NotIn, Exists, DoesNotExist           Yes               different topology domain from the matching Pod
