Kubernetes Pod Scheduling
source link: http://www.yunweipai.com/38985.html
Introduction
The Scheduler is Kubernetes' scheduler. Its main task is to assign the Pods you define to nodes in the cluster. That sounds simple, but it has to balance several concerns:
- Fairness: every node should have a chance to be allocated resources
- Efficient resource usage: the cluster's resources should be used as fully as possible
- Performance: scheduling should be fast, so that large batches of Pods can be placed quickly
- Flexibility: users should be able to control the scheduling process according to their own needs
The Scheduler runs as a standalone service. Once started, it continuously watches the API Server for Pods whose podSpec.NodeName is empty, and for each such Pod it creates a binding that records which node the Pod should be placed on.
Scheduling process
The scheduling flow is: first, filter out the nodes that do not satisfy the Pod's requirements; this phase is called predicate. Then rank the nodes that passed in order of priority; this phase is called priority. Finally, pick the node with the highest priority. If any step returns an error, the error is reported directly.
Predicate has a series of algorithms it can apply:
- PodFitsResources: whether the node's remaining resources are larger than what the Pod requests
- PodFitsHost: if the Pod specifies a nodeName, check whether the node's name matches it
- PodFitsHostPort: whether the ports already in use on the node conflict with the ports the Pod requests
- PodSelectorMatches: filter out nodes that do not match the labels the Pod specifies
- NoDiskConflict: volumes already mounted on the node must not conflict with the volumes the Pod specifies, unless both are read-only
If no node is suitable after the predicate phase, the Pod stays in the Pending state and is rescheduled repeatedly until some node satisfies the conditions. If several nodes pass this step, the priority phase begins: the nodes are ranked by priority. A priority is a set of key-value pairs, where the key is the name of a priority function and the value is its weight. The priority options include:
- LeastRequestedPriority: derives the weight from CPU and memory utilization; the lower the utilization, the higher the weight. In other words, this priority favors nodes with a lower resource usage ratio
- BalancedResourceAllocation: the closer a node's CPU and memory utilization are to each other, the higher the weight. This option should be used together with the previous one, not on its own
- ImageLocalityPriority: favors nodes that already hold the images the Pod needs; the larger the total size of the images already present, the higher the weight
All the priority functions and weights are evaluated to produce the final result.
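The two-phase flow described above can be sketched in a few lines of Python. This is an illustrative model only, not the real kube-scheduler code: the node/pod dictionaries, the simplified PodFitsResources check, and the LeastRequestedPriority formula are assumptions made for demonstration.

```python
# Illustrative sketch of the predicate -> priority pipeline (assumed data model).

def pod_fits_resources(node, pod):
    # PodFitsResources: remaining CPU/memory must cover the pod's request
    return (node["cpu_free"] >= pod["cpu_req"]
            and node["mem_free"] >= pod["mem_req"])

def least_requested_score(node, pod):
    # LeastRequestedPriority: lower utilization after placement -> higher score
    cpu_util = (node["cpu_cap"] - node["cpu_free"] + pod["cpu_req"]) / node["cpu_cap"]
    mem_util = (node["mem_cap"] - node["mem_free"] + pod["mem_req"]) / node["mem_cap"]
    return 1.0 - (cpu_util + mem_util) / 2

def schedule(pod, nodes, predicates, priorities):
    # predicate phase: drop any node that fails a check
    feasible = [n for n in nodes if all(p(n, pod) for p in predicates)]
    if not feasible:
        return None  # the pod stays Pending and is retried later
    # priority phase: weighted sum of priority scores, highest wins
    return max(feasible,
               key=lambda n: sum(w * f(n, pod) for f, w in priorities))

nodes = [
    {"name": "node-a", "cpu_cap": 4.0, "cpu_free": 1.0, "mem_cap": 8.0, "mem_free": 2.0},
    {"name": "node-b", "cpu_cap": 4.0, "cpu_free": 3.0, "mem_cap": 8.0, "mem_free": 6.0},
]
pod = {"cpu_req": 0.5, "mem_req": 1.0}
best = schedule(pod, nodes, [pod_fits_resources], [(least_requested_score, 1.0)])
print(best["name"])  # node-b: the less utilized node wins
```

If the pod's request is raised beyond what any node can hold, `schedule` returns `None`, mirroring a Pod that stays Pending.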
Custom schedulers
Besides the scheduler Kubernetes ships with, you can write your own. By setting the spec.schedulerName field you can choose which scheduler handles a given Pod. For example, the Pod below selects my-scheduler instead of the default default-scheduler:
apiVersion: v1
kind: Pod
metadata:
  name: scheduler-test
  labels:
    name: example-scheduler
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-test
    image: nginx:v1
Overall, Kubernetes offers the following scheduling methods:
- Affinity: including Node affinity and Pod affinity
- Taints and tolerations
- Fixed (pinned) scheduling policies
To keep this article from getting too long, it focuses on the first method, affinity; the other two will be covered in detail in follow-up posts.
Affinity
Note: all of the tests below are run on a cluster with one master and one node:
[root@Centos8 scheduler]# kubectl get node
NAME          STATUS   ROLES    AGE    VERSION
centos8       Ready    master   134d   v1.15.1
testcentos7   Ready    <none>   133d   v1.15.1
Node affinity
pod.spec.affinity.nodeAffinity
- preferredDuringSchedulingIgnoredDuringExecution: soft policy
  - A soft policy is a preference: the Pod would rather (not) land on certain nodes, but if none is available it can land elsewhere
- requiredDuringSchedulingIgnoredDuringExecution: hard policy
  - A hard policy is a requirement: the Pod must (not) land on the specified nodes; if no node satisfies the conditions, it stays Pending
Demonstration
requiredDuringSchedulingIgnoredDuringExecution (hard policy)
# vim node-affinity-required.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-required
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-required
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname   # node label key
            operator: NotIn               # must not be in the list
            values:
            - testcentos7                 # node name
This policy says: the Pod must not land on the node named testcentos7. Now create it:
[root@Centos8 ~]# kubectl get node --show-labels   # list node labels
NAME          STATUS   ROLES    AGE    VERSION   LABELS
centos8       Ready    master   133d   v1.15.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=centos8,kubernetes.io/os=linux,node-role.kubernetes.io/master=
testcentos7   Ready    <none>   133d   v1.15.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=testcentos7,kubernetes.io/os=linux
There are only two nodes at the moment, one master and one node, and the policy says the Pod must not run on testcentos7.
After the Pod is created it stays Pending, because apart from testcentos7 there is no other schedulable node.
[root@Centos8 scheduler]# kubectl create -f node-affinity-required.yaml
pod/affinity-required created
[root@Centos8 scheduler]# kubectl get pod
NAME                READY   STATUS    RESTARTS   AGE
affinity-required   0/1     Pending   0          4s
[root@Centos8 scheduler]# kubectl describe pod affinity-required
default-scheduler  0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) had taints that the pod didn't tolerate.
Now change NotIn to In in the yaml file:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname   # node label key
          operator: In                  # must be in the list
          values:
          - testcentos7                 # node name
Create it again, and this time it lands on the specified node:
[root@Centos8 scheduler]# kubectl get pod -o wide
NAME                READY   STATUS    RESTARTS   AGE   IP             NODE
affinity-required   1/1     Running   0          11s   10.244.3.219   testcentos7
preferredDuringSchedulingIgnoredDuringExecution (soft policy)
vim node-affinity-preferred.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-preferred
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-preferred
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100        # in a soft policy a higher weight means a better chance of matching
        preference:        # prefer nodes matching the expressions below
          matchExpressions:
          - key: kubernetes.io/hostname   # node label key
            operator: In
            values:
            - testcentos7                 # actual node name
This policy says: the Pod would prefer to land on the node named testcentos7. Create it:
[root@Centos8 scheduler]# kubectl create -f node-affinity-prefered.yaml
pod/affinity-prefered created
[root@Centos8 scheduler]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE   IP             NODE
affinity-prefered  1/1     Running   0          9s    10.244.3.220   testcentos7
As expected, it lands on testcentos7.
Now change the policy: replace the node name with one that does not exist, e.g. kube-node2:
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1          # in a soft policy a higher weight means a better chance of matching
      preference:        # prefer nodes matching the expressions below
        matchExpressions:
        - key: kubernetes.io/hostname   # node label key
          operator: In
          values:
          - kube-node2                  # non-existent node name
This policy says: the Pod would prefer to land on a node named kube-node2. Create it:
[root@Centos8 scheduler]# kubectl create -f node-affinity-prefered.yaml
pod/affinity-prefered created
# After creation it again lands on testcentos7: it would rather run on kube-node2,
# but since that node does not exist, it settles for testcentos7
[root@Centos8 scheduler]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE   IP             NODE
affinity-prefered  1/1     Running   0          17s   10.244.3.221   testcentos7
Combining hard and soft policies
vim node-affinity-common.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-node
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-affinity-node
    image: nginx:v1
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8s-node2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: source
            operator: In
            values:
            - hello
Combining hard and soft policies gives a more precise node selection. The file above means: the Pod must not run on the node k8s-node2; any other node is allowed, but preferably one whose label source has the value hello.
Operators
- In: the label's value is in the given list
- NotIn: the label's value is not in the given list
- Gt: the label's value is greater than the given value
- Lt: the label's value is less than the given value
- Exists: the label exists
- DoesNotExist: the label does not exist
If nodeSelectorTerms contains several terms, matching any one of them is enough; within a term's matchExpressions, all expressions must be satisfied for the Pod to be scheduled.
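The operator semantics and the OR/AND combination rule above can be sketched as follows. This is a simplified model of label-selector evaluation, and the label data and selector terms are hypothetical:

```python
# Sketch: OR across nodeSelectorTerms, AND within a term's matchExpressions.

def match_expr(labels, expr):
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        return labels.get(key) not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op == "Gt":
        return key in labels and int(labels[key]) > int(values[0])
    if op == "Lt":
        return key in labels and int(labels[key]) < int(values[0])
    raise ValueError(f"unknown operator: {op}")

def node_matches(labels, node_selector_terms):
    # any term may match (OR); inside a term, every expression must hold (AND)
    return any(all(match_expr(labels, e) for e in term["matchExpressions"])
               for term in node_selector_terms)

labels = {"kubernetes.io/hostname": "testcentos7", "cpu-count": "8"}
terms = [
    {"matchExpressions": [
        {"key": "kubernetes.io/hostname", "operator": "In", "values": ["testcentos7"]},
        {"key": "cpu-count", "operator": "Gt", "values": ["4"]},
    ]},
    {"matchExpressions": [
        {"key": "gpu", "operator": "Exists"},
    ]},
]
print(node_matches(labels, terms))  # True: the first term matches in full
```

Note that the second term (gpu Exists) fails here, but the node still matches because the first term is fully satisfied.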
Pod affinity
pod.spec.affinity.podAffinity/podAntiAffinity
- preferredDuringSchedulingIgnoredDuringExecution: soft policy
  - A soft policy is a preference: the Pod would rather (not) be placed near certain Pods, but if that is not possible it can land elsewhere
- requiredDuringSchedulingIgnoredDuringExecution: hard policy
  - A hard policy is a requirement: the Pod must (not) be placed as specified; if no node satisfies the conditions, it stays Pending
Demonstration
First create a test Pod:
vim pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  labels:
    app: nginx
    type: web
spec:
  containers:
  - name: pod-1
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
    ports:
    - name: web
      containerPort: 80
[root@Centos8 scheduler]# kubectl create -f pod.yaml
pod/pod-1 created
[root@Centos8 scheduler]# kubectl get pod --show-labels
NAME    READY   STATUS    RESTARTS   AGE   LABELS
pod-1   1/1     Running   0          4s    app=nginx,type=web
requiredDuringSchedulingIgnoredDuringExecution (Pod hard policy)
vim pod-affinity-required.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-required
  labels:
    app: pod-3
spec:
  containers:
  - name: with-pod-required
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    podAffinity:    # schedule into the same topology domain
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app       # label key
            operator: In
            values:
            - nginx        # label value
        topologyKey: kubernetes.io/hostname   # the domain is defined by the node's hostname
The policy in this file means: the Pod must run on the same node as a Pod carrying the label app: nginx.
Create and test:
[root@Centos8 scheduler]# kubectl create -f pod-affinity-required.yaml
pod/affinity-required created
[root@Centos8 scheduler]# kubectl get pod -o wide
NAME                READY   STATUS    RESTARTS   AGE   IP             NODE
affinity-required   1/1     Running   0          43s   10.244.3.224   testcentos7
pod-1               1/1     Running   0          10m   10.244.3.223   testcentos7
# landed on the same node as the Pod carrying the matching label
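The topologyKey matching demonstrated above can be modeled roughly like this. The plain-dictionary data model for nodes and pods is an assumption for illustration, not the scheduler's actual data structures:

```python
# Sketch: a candidate node satisfies a podAffinity rule if some node in the
# same topology domain (same value for topologyKey) already runs a pod whose
# labels match the rule's selector.

def satisfies_pod_affinity(candidate, nodes, pods, label_key, label_values, topology_key):
    domain = candidate["labels"][topology_key]
    for pod in pods:
        host = next(n for n in nodes if n["name"] == pod["node"])
        if (host["labels"][topology_key] == domain
                and pod["labels"].get(label_key) in label_values):
            return True
    return False

nodes = [
    {"name": "centos8",     "labels": {"kubernetes.io/hostname": "centos8"}},
    {"name": "testcentos7", "labels": {"kubernetes.io/hostname": "testcentos7"}},
]
pods = [{"name": "pod-1", "node": "testcentos7", "labels": {"app": "nginx"}}]

# With topologyKey = kubernetes.io/hostname, only testcentos7 qualifies,
# because that is where the app=nginx pod already runs.
ok = [n["name"] for n in nodes
      if satisfies_pod_affinity(n, nodes, pods, "app", ["nginx"], "kubernetes.io/hostname")]
print(ok)  # ['testcentos7']
```

With a coarser topologyKey (e.g. a zone label shared by both nodes), every node in the zone would qualify, which is the whole point of topology domains.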
Now change podAffinity to podAntiAffinity, so that the Pods must not share a node:
apiVersion: v1
kind: Pod
metadata:
  name: required-pod2
  labels:
    app: pod-3
spec:
  containers:
  - name: with-pod-required
    image: nginx:1.2.1
    imagePullPolicy: IfNotPresent
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app       # label key
            operator: In
            values:
            - nginx        # label value
        topologyKey: kubernetes.io/hostname   # the domain is defined by the node's hostname
This policy means: the Pod must run on a different node from any Pod with the label app: nginx.
Create and test:
[root@Centos8 scheduler]# kubectl create -f pod-affinity-required.yaml
pod/required-pod2 created
[root@Centos8 scheduler]# kubectl get pod
NAME                READY   STATUS    RESTARTS   AGE
affinity-required   1/1     Running   0          9m40s
pod-1               1/1     Running   0          19m
required-pod2       0/1     Pending   0          51s
# Since there is only one schedulable node here, required-pod2 can only stay Pending
preferredDuringSchedulingIgnoredDuringExecution (Pod soft policy)
vim pod-affinity-prefered.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-prefered
  labels:
    app: pod-3
spec:
  containers:
  - name: with-pod-prefered
    image: nginx:v1
    imagePullPolicy: IfNotPresent
  affinity:
    podAntiAffinity:    # prefer a different topology domain
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - pod-2
          topologyKey: kubernetes.io/hostname
The soft policy works much like the hard one; the weight merely expresses a preference, and other placements are still acceptable, so the demonstration is not repeated here.
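How a preferred (weighted) podAffinityTerm influences placement can be sketched as a scoring pass: each satisfied term adds its weight to a node's score (anti-affinity subtracts it), and no node is filtered out just because a term fails. The data layout and term format here are simplified assumptions for illustration:

```python
# Sketch: weighted preferred terms adjust node scores instead of filtering.

def preference_score(node_name, pods_on_node, terms):
    score = 0
    for term in terms:
        matched = any(p["labels"].get(term["key"]) in term["values"]
                      for p in pods_on_node.get(node_name, []))
        if matched:
            # anti-affinity terms penalize a node; affinity terms reward it
            score += -term["weight"] if term["anti"] else term["weight"]
    return score

pods_on_node = {
    "node-a": [{"labels": {"app": "pod-2"}}],  # node-a already runs app=pod-2
    "node-b": [],                              # node-b runs nothing relevant
}
# one preferred anti-affinity term, as in the manifest above
terms = [{"key": "app", "values": ["pod-2"], "weight": 1, "anti": True}]

scores = {n: preference_score(n, pods_on_node, terms) for n in ("node-a", "node-b")}
print(scores)  # {'node-a': -1, 'node-b': 0} -> node-b is preferred, but node-a stays eligible
```

If node-b did not exist, the Pod would still schedule onto node-a; that is the difference from the hard policy, which would leave it Pending.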
A comparison of the affinity/anti-affinity scheduling policies:
Policy            Matches labels of   Operators                                 Topology-domain support   Scheduling target
nodeAffinity      nodes               In, NotIn, Exists, DoesNotExist, Gt, Lt   no                        a specific node
podAffinity       Pods                In, NotIn, Exists, DoesNotExist           yes                       same topology domain as the matched Pods
podAntiAffinity   Pods                In, NotIn, Exists, DoesNotExist           yes                       different topology domain from the matched Pods