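Pod affinity
Affinity scheduling between pods takes two forms:
podAffinity: pods prefer to sit together. Related pods are placed in nearby locations, such as the same zone or the same rack, so that they can communicate more efficiently. For example, suppose a cluster of 1000 hosts is spread across two data centers. We would want nginx and tomcat deployed on nodes in the same place to improve communication efficiency.
podAntiAffinity: pods prefer to stay apart. If you deploy two sets of applications, you may want them to repel each other so that they do not interfere with one another.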
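The first pod is placed on a node by the scheduler as usual, and the location of that pod then becomes the reference for deciding whether later pods may run in the same location. This is pod affinity. It raises a question: how do we decide which nodes are in the same location and which are not? Defining pod affinity requires a criterion for "location", and that criterion is a node label key. Nodes with the same value for that key are in the same location, and nodes with different values are not. If the node name is the criterion, every node is its own location. Other labels, such as a rack or a zone, can be used instead.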
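To make the idea of a "location" concrete, here is a minimal sketch of an affinity rule that treats a whole zone as one location. It assumes the nodes carry the well-known topology.kubernetes.io/zone label (cloud providers usually set it, a bare-metal cluster may not) and that the target pods are labeled app=web. The pod name and the app=web label are hypothetical and do not come from the examples in this section.

```yaml
# Hypothetical pod: must be scheduled into the same zone as a pod labeled app=web.
apiVersion: v1
kind: Pod
metadata:
  name: zone-follower            # illustrative name
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web             # assumed label on the target pods
        # Nodes that share the same value for this label key count as one location.
        topologyKey: topology.kubernetes.io/zone
```

Changing topologyKey to kubernetes.io/hostname would narrow the location from a zone to a single node.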
[root@k8smaster ~]# kubectl explain pods.spec.affinity.podAffinity
KIND:     Pod
VERSION:  v1

RESOURCE: podAffinity <Object>

DESCRIPTION:
     Describes pod affinity scheduling rules (e.g. co-locate this pod in the
     same node, zone, etc. as some other pod(s)).

     Pod affinity is a group of inter pod affinity scheduling rules.

FIELDS:
   preferredDuringSchedulingIgnoredDuringExecution <[]Object>
     # soft affinity
   requiredDuringSchedulingIgnoredDuringExecution <[]Object>
     # hard affinity

[root@k8smaster ~]# kubectl explain pods.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution
KIND:     Pod
VERSION:  v1

RESOURCE: requiredDuringSchedulingIgnoredDuringExecution <[]Object>

DESCRIPTION:
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some
     point during pod execution (e.g. due to a pod label update), the system
     may or may not try to eventually evict the pod from its node. When there
     are multiple elements, the lists of nodes corresponding to each
     podAffinityTerm are intersected, i.e. all terms must be satisfied.

     Defines a set of pods (namely those matching the labelSelector relative
     to the given namespace(s)) that this pod should be co-located (affinity)
     or not co-located (anti-affinity) with, where co-located is defined as
     running on a node whose value of the label with key <topologyKey> matches
     that of any node on which a pod of the set of pods is running

FIELDS:
   labelSelector <Object>
     # Decides which pods this pod is affine to: the labelSelector picks
     # the set of pods that serve as the affinity target.
   namespaces <[]string>
     # The namespace(s) in which the labelSelector looks for pods. If
     # namespaces is not set, the namespace of the pod being created is used.
   topologyKey <string> -required-
     # The location topology key; this field is required.
     # How "same location" is decided: suppose nodes are labeled
     # rack=rack1 and row=row1. With rack as the key, nodes that share a
     # rack value are one location; with row as the key, nodes that share
     # a row value are one location.

[root@k8smaster ~]# kubectl explain pods.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution.labelSelector
KIND:     Pod
VERSION:  v1

RESOURCE: labelSelector <Object>

DESCRIPTION:
     A label query over a set of resources, in this case pods.

     A label selector is a label query over a set of resources. The result of
     matchLabels and matchExpressions are ANDed. An empty label selector
     matches all objects. A null label selector matches no objects.

FIELDS:
   matchExpressions <[]Object>
   matchLabels <map[string]string>

Example: pod affinity
# Define two pods. The first pod is the reference and the second pod follows it.
[root@k8smaster ~]# kubectl delete pods pod-first
pod "pod-first" deleted
[root@k8smaster ~]# vim pod-required-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app2: myapp2
    tier: frontend
spec:
  containers:
  - name: myapp
    image: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app2, operator: In, values: ["myapp2"]}
        topologyKey: kubernetes.io/hostname
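Without the sleep command the busybox container exits immediately; sleep keeps it running for the given number of seconds. The affinity section that follows declares a hard pod affinity rule whose target pods are chosen by a label selector. We already saw the selector's fields in the help output above, and you can run kubectl explain one level deeper on matchExpressions and matchLabels to see what they contain.
The matchExpressions entry selects pods labeled app2=myapp2 as the affinity target. Without it, the second pod would have no way of knowing which pod it should be co-located with.
The last line, topologyKey, can use any label key that already exists on the nodes. Run kubectl get nodes --show-labels to list the node labels.
The value of the kubernetes.io/hostname label is the node's own name, so each node is its own location. Whichever node pod-first is scheduled to (k8snode or k8snode2), pod-second follows it to the same node.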
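The selector in the example above uses matchExpressions, but for a single key/value match the same rule can be written with matchLabels. The fragment below is a sketch of that alternative for pod-second's affinity section. It selects the same app2=myapp2 pods, and as the kubectl explain output notes, if both matchLabels and matchExpressions are given their results are ANDed.

```yaml
# Fragment of pod-second's spec: equivalent selector written with matchLabels.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app2: myapp2           # same as {key: app2, operator: In, values: ["myapp2"]}
      topologyKey: kubernetes.io/hostname
```

matchExpressions remains the better fit when a rule needs several allowed values or operators such as NotIn or Exists.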
# The rule above means the new pod must run on the same node as a pod labeled app2=myapp2
[root@k8smaster ~]# kubectl apply -f pod-required-affinity-demo.yaml
pod/pod-first created
pod/pod-second created
[root@k8smaster ~]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE
pod-first    1/1     Running   0          26s   10.244.1.21   k8snode2   <none>
pod-second   1/1     Running   0          26s   10.244.1.22   k8snode2   <none>
# pod-second was scheduled to the same node as pod-first; this is pod affinity
[root@k8smaster ~]# kubectl delete -f pod-required-affinity-demo.yaml
pod "pod-first" deleted
pod "pod-second" deleted
Pod anti-affinity
Define two pods. The first pod is the reference, and the second pod must be scheduled to a different node. The node name is again used as the location.
[root@k8smaster ~]# vim pod-required-anti-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app1: myapp1
    tier: frontend
spec:
  containers:
  - name: myapp
    image: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app1, operator: In, values: ["myapp1"]}
        topologyKey: kubernetes.io/hostname
[root@k8smaster ~]# kubectl apply -f pod-required-anti-affinity-demo.yaml
pod/pod-first created
pod/pod-second created
[root@k8smaster ~]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE
pod-first    1/1     Running   0          21s   10.244.1.23   k8snode2   <none>
pod-second   1/1     Running   0          21s   10.244.2.20   k8snode    <none>
# The two pods are on different nodes; this is pod anti-affinity
[root@k8smaster ~]# kubectl delete -f pod-required-anti-affinity-demo.yaml
pod "pod-first" deleted
pod "pod-second" deleted

# Example 3: use a different topologyKey
[root@k8smaster ~]# kubectl label nodes k8snode zone=foo --overwrite
node/k8snode not labeled
[root@k8smaster ~]# kubectl label nodes k8snode2 zone=foo --overwrite
node/k8snode2 labeled
# Then delete the pods created earlier from the pp and node directories
[root@k8smaster pp]# kubectl delete -f .
resourcequota "mem-cpu-quota" deleted
pod "pod-test" deleted
[root@k8smaster node]# kubectl delete -f .
pod "demo-pod-1" deleted
pod "demo-pod" deleted
pod "pod-node-affinity-demo-2" deleted
pod "pod-node-affinity-demo" deleted
[root@k8smaster node]# kubectl get pods
NAME   READY   STATUS   RESTARTS   AGE
[root@k8smaster node]# vim pod-first-required-anti-affinity-demo-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app3: myapp3
    tier: frontend
spec:
  containers:
  - name: myapp
    image: nginx
[root@k8smaster node]# vim pod-second-required-anti-affinity-demo-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app3, operator: In, values: ["myapp3"]}
        topologyKey: zone
# The two pods are kept in separate files because, in a single file, the order could go wrong (for example, the second pod being handled first).
# Whichever node a pod is scheduled to, its location is decided by the zone label.
[root@k8smaster node]# kubectl apply -f pod-first-required-anti-affinity-demo-1.yaml
pod/pod-first created
[root@k8smaster node]# kubectl apply -f pod-second-required-anti-affinity-demo-1.yaml
pod/pod-second created
[root@k8smaster node]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE
pod-first    1/1     Running   0          21s   10.244.1.23   k8snode2   <none>
pod-second   0/1     Pending   0          21s   <none>        <none>     <none>
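The second pod is now Pending because both nodes count as the same location: they carry the same zone=foo label. Once pod-first is running on a node in zone=foo, the anti-affinity rule forbids pod-second from every node whose zone value is also foo, and there is no node left in a different location. If required is changed to preferred in the anti-affinity rule, the pod will run, because a preferred rule is only a scheduling preference and not a hard constraint.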
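Switching from required to preferred is more than a rename, because the preferred form wraps each term in a weight and a podAffinityTerm. The fragment below is a sketch of what pod-second's affinity section might look like in the soft form. The weight of 100 is an arbitrary illustrative choice (valid values are 1 to 100).

```yaml
# Fragment of pod-second's spec: soft anti-affinity on the zone label.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100                # illustrative; higher weight means stronger preference
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - {key: app3, operator: In, values: ["myapp3"]}
        topologyKey: zone
```

With this rule the scheduler still tries to keep pod-second out of the zone where an app3=myapp3 pod runs, but falls back to a node in zone=foo when no other location exists.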
podAffinity: pod affinity, which pods a pod prefers to be placed with
nodeAffinity: node affinity, which nodes a pod prefers to be placed on
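The two kinds of rule sit side by side under spec.affinity and can be combined in one pod. The sketch below reuses the zone=foo node label and the app3=myapp3 pod label from the examples above. The pod name is hypothetical, and the combination itself is an illustration, not something run in this section.

```yaml
# Hypothetical pod that combines both rule types.
apiVersion: v1
kind: Pod
metadata:
  name: affinity-both-demo       # illustrative name
spec:
  containers:
  - name: busybox
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["sh", "-c", "sleep 3600"]
  affinity:
    # nodeAffinity selects on node labels: only nodes labeled zone=foo qualify.
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - {key: zone, operator: In, values: ["foo"]}
    # podAffinity selects on pod labels: run on the same node as an app3=myapp3 pod.
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app3, operator: In, values: ["myapp3"]}
        topologyKey: kubernetes.io/hostname
```

Both required rules must hold at scheduling time, so the pod stays Pending unless some node in zone=foo already runs a matching pod.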