k8s Cluster Scheduling - Alibaba Cloud Developer Community

2023-08-28

The Scheduler

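The scheduler is the Kubernetes component whose main job is to assign the pods you define to nodes in the cluster. That sounds simple, but there is a lot to take into account:

Fairness: how to ensure that every node can be allocated resources

Efficient resource use: maximize the utilization of all resources in the cluster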

Efficiency: scheduling itself must perform well and be able to place large batches of pods quickly

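Flexibility: allow users to control the scheduling logic according to their own needs

The scheduler runs as a separate program. Once started, it continuously watches the API server for pods whose spec.nodeName is empty, and for each such pod it creates a binding that states which node the pod should be placed on.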

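I. The Scheduling Process

Scheduling proceeds in several steps:

First, nodes that do not satisfy the pod's requirements are filtered out; this step is called predicate.

Next, the nodes that passed are ranked by priority; this step is called priority.

Finally, the node with the highest priority is selected. If any step fails, the error is returned immediately.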

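Predicate algorithms:

PodFitsResources: whether the node's remaining resources are enough for what the pod requests

PodFitsHost: if the pod specifies a nodeName, whether the node's name matches it

PodFitsHostPorts: whether ports already in use on the node conflict with the ports the pod requests

PodSelectorMatches: filters out nodes whose labels do not match the labels the pod specifies

NoDiskConflict: whether the volumes already mounted on the node conflict with the volumes the pod specifies (there is no conflict if both are read-only)

Note: in order, these check resources, nodeName matching, port conflicts, label matching, and volume conflicts.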

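If no node passes the predicate step, the pod stays in the Pending state and scheduling is retried until some node satisfies the conditions. If more than one node passes, scheduling continues with the priorities step, which ranks the nodes by priority.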

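Priorities are defined as a series of key-value pairs, where the key is the name of a priority item and the value is its weight. The priority items include:

LeastRequestedPriority: scores a node from its CPU and memory utilization; the lower the utilization, the higher the score

BalancedResourceAllocation: the closer a node's CPU and memory utilization are to each other, the higher the score; intended to be used together with LeastRequestedPriority

ImageLocalityPriority: favors nodes that already hold the images the pod needs; the larger the total size of those images, the higher the score

The scheduler combines the scores of all priority items with their weights to produce the final result.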
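
The combination of priority items and weights can be written as a weighted sum. This is a sketch of the general form only: the symbols $w_i$ and $p_i$ are notation introduced here, and the exact scoring functions and score ranges depend on the Kubernetes version.

```latex
\mathrm{score}(n) = \sum_{i} w_i \, p_i(n)
```

Here $p_i(n)$ is the score that priority item $i$ gives node $n$, and $w_i$ is that item's configured weight. The node with the largest $\mathrm{score}(n)$ is selected.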

Besides the scheduler that ships with Kubernetes, a pod can name a custom scheduler through the spec.schedulerName field.
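
As a minimal sketch, a pod that asks for a custom scheduler could look like the following. The scheduler name my-scheduler and the pod name are hypothetical, and a scheduler registered under that name is assumed to be running in the cluster; without one, the pod stays Pending.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod   # hypothetical name
spec:
  schedulerName: my-scheduler  # hypothetical custom scheduler; omit this field to use the default scheduler
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
```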

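II. Node Affinity (between pods and nodes)

pod.spec.affinity.nodeAffinity

preferredDuringSchedulingIgnoredDuringExecution: soft policy

requiredDuringSchedulingIgnoredDuringExecution: hard policy

Hard policy: the pod is scheduled only onto a node that satisfies the conditions; if no node does, it stays Pending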

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: NotIn
              values:
              - node02
```

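(The key is a node label; list the labels with kubectl get node --show-labels)

Operators that relate the key to its values:

In: the label's value is in the given list

NotIn: the label's value is not in the given list

Gt: the label's value is greater than the given value

Lt: the label's value is less than the given value

Exists: the label exists

DoesNotExist: the label does not exist

Soft policy: nodes that satisfy the conditions are preferred; if none does, the pod is still scheduled onto some other node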

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - node03
```
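
The hard and soft policies can also be combined in one pod. The sketch below assumes that some nodes carry a disk=ssd label; the pod name is hypothetical. The pod may never run on node02, and among the remaining nodes the scheduler prefers those labeled disk=ssd.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-combined   # hypothetical name
spec:
  containers:
  - name: with-node-affinity
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      # Hard policy: never schedule onto node02
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - node02
      # Soft policy: among the allowed nodes, prefer those labeled disk=ssd
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: disk
            operator: In
            values:
            - ssd
```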

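III. Pod Affinity (between pods)

pod.spec.affinity.podAffinity/podAntiAffinity

preferredDuringSchedulingIgnoredDuringExecution: soft policy

requiredDuringSchedulingIgnoredDuringExecution: hard policy

Hard policy: the pod is placed on the same host as a pod that matches the conditions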

Create pod1.yaml, which pins a pod labeled app=node1 to node01:
apiVersion: v1
kind: Pod
metadata:
  name: node1
  labels:
    app: node1
spec:
  containers:
  - name: with-node-affinity
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - node01

Create pod2.yaml, whose pod must run on the same host as a pod labeled app=node1:
apiVersion: v1
kind: Pod
metadata:
  name: pod2
  labels:
    app: pod2
spec:
  containers:
  - name: pod2
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - node1
        topologyKey: kubernetes.io/hostname
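
The reverse case uses podAntiAffinity. The following is a sketch under the same assumptions as pod2.yaml (a pod labeled app=node1 already exists); the name pod3 is hypothetical. With the soft policy, the scheduler tries to keep pod3 off any host that runs a pod labeled app=node1, but still schedules it there if no other node is available.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod3   # hypothetical name
  labels:
    app: pod3
spec:
  containers:
  - name: pod3
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - node1
          topologyKey: kubernetes.io/hostname
```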

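---------------------------------------------

Comparison of the affinity and anti-affinity scheduling policies:

Policy | Labels matched | Topology domain support | Scheduling target

nodeAffinity | node | No | the specified nodes

podAffinity | pod | Yes | the same topology domain as the specified pod

podAntiAffinity | pod | Yes | a different topology domain from the specified pod

---------------------------------------------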

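IV. Taint and Toleration

Node affinity is a property of pods (a preference or a hard requirement) that attracts them to a particular class of nodes. A taint is the opposite: it lets a node repel a particular class of pods.

Taints and tolerations work together to keep pods off unsuitable nodes. One or more taints can be applied to a node, which means the node will not accept any pod that does not tolerate those taints. A toleration applied to a pod allows (but does not require) the pod to be scheduled onto nodes with matching taints.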

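(1) Taints

1. Composition of a taint

The kubectl taint command sets a taint on a node. Once tainted, the node repels pods: it can refuse to have pods scheduled onto it and can even evict pods that are already running on it.

Each taint has the form:

key=value:effect

Each taint has a key and a value that act as its label (the value may be empty), and an effect that describes what the taint does. The taint effect currently supports three options:

NoSchedule: Kubernetes will not schedule pods onto a node that has this taint

PreferNoSchedule: Kubernetes will try to avoid scheduling pods onto a node that has this taint

NoExecute: Kubernetes will not schedule pods onto a node that has this taint, and will also evict pods already running on it that do not tolerate the taint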

2. Setting, viewing, and removing taints

Set a taint:

kubectl taint nodes node01 check=lhy:NoExecute

View taints:

kubectl describe nodes node01 | grep Taint

Remove a taint:

kubectl taint nodes node01 check:NoExecute-

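(2) Tolerations

A tainted node repels pods according to the taint's effect (NoSchedule, PreferNoSchedule, or NoExecute), so to some degree pods will not be scheduled onto it. A toleration set on a pod lets that pod tolerate the taint, so it can be scheduled onto a node that carries it.

1. Set tolerations in the pod's YAML: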

pod.spec.tolerations
    spec:
      tolerations:
      - key: check
        operator: Equal
        value: lhy
        effect: NoExecute
        tolerationSeconds: 3600

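Notes:

key, value, and effect must match the taint on the node

When operator is Exists, the value is not compared, so it should be left out

tolerationSeconds is how long the pod may keep running on the node once it is due to be evicted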
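
For context, the tolerations list sits directly under the pod's spec. A complete manifest might look like this sketch, which tolerates the check=lhy:NoExecute taint used in the kubectl taint example; the pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod   # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: check
    operator: Equal
    value: lhy
    effect: NoExecute
    tolerationSeconds: 3600   # the pod is evicted 3600 s after the taint appears on its node
```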

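2. When key is omitted, the toleration matches every taint key:

tolerations:
- operator: Exists

3. When effect is omitted, the toleration matches every effect of taints with that key:

tolerations:
- key: key1
  operator: Exists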

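4. When there are multiple master nodes, you can avoid wasting their resources by using PreferNoSchedule, so that pods may still land on a master when no other node fits: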

kubectl taint nodes master node-role.kubernetes.io/master=:PreferNoSchedule

V. Pinning Pods to Specific Nodes

1. pod.spec.nodeName

Setting nodeName places the pod directly on the named node. The match is mandatory and bypasses the scheduler.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bdqn1
spec:
  selector:
    matchLabels:
      app: bdqn1
  replicas: 5
  template:
    metadata:
      labels:
        app: bdqn1
    spec:
      nodeName: node02
      containers:
        - name: bdqn1
          image: nginx
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80

2. pod.spec.nodeSelector

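Selects nodes through the label-selector mechanism: the scheduler matches node labels and then schedules the pod onto a matching node. This is a hard constraint.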

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bdqn2
spec:
  selector:
    matchLabels:
      app: bdqn2
  replicas: 3
  template:
    metadata:
      labels:
        app: bdqn2
    spec:
      nodeSelector:
        disk: ssd
      containers:
        - name: bdqn2
          image: nginx
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80

View labels:

kubectl get nodes --show-labels

Set a node label:

kubectl label node node01 disk=ssd

Remove a node label:

kubectl label node node01 disk-
