1 vote

I was trying to spread pods evenly across all zones, but couldn't make it work properly.

In my k8s cluster, nodes are spread across 3 AZs. Now suppose the minimum node count is 1 and there are 2 nodes at the moment, the first of which is completely full of pods. When I create a deployment (2 replicas) with the topology spread constraint set to ScheduleAnyway, both pods get scheduled on the 2nd node, since it has enough resources. I don't want that. I tried changing the condition to DoNotSchedule, but since I have only 3 AZs, I am only able to schedule 3 pods, and it triggers a new node for each of the 3 pods. I want to make sure that the replicas are spread across all 3 AZs.

Here is a snippet from the deployment spec. Does anyone know the way out?

      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - "my-app" 
So you tried DoNotSchedule. Doesn't it do what you want? What's your issue? - SYN
See, I am using topologyKey as zone. I have 3 AZs, so it works fine for 3 pods; basically it will create 3 nodes. But when the 4th pod comes up, it's stuck on Pending. I guess this is because it wants a new zone to schedule the new pod. I can see NotTriggerScaleUp in the logs, and the reason is that the node didn't match pod topology. - Ayush walia
Please run kubectl describe pod <your pending pod> and paste the output into the question. - Mikołaj Głodziak
Hello @Ayushwalia. Did any of the answers below help you? - Wytrzymały Wiktor

2 Answers

0 votes

You need to tweak the maxSkew attribute.

Assign the attribute a higher value.

Refer to the example given at:

https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
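
For illustration, here is a sketch of the question's constraint with whenUnsatisfiable: DoNotSchedule (the variant the asker switched to) and a raised maxSkew; the value 2 is only an example and should reflect how uneven a spread you are willing to accept:

      topologySpreadConstraints:
      - maxSkew: 2   # example value: pod counts per zone may now differ by up to 2
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - "my-app"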

0 votes

If I understand your problem correctly, you can use a nodeAffinity rule together with the maxSkew field.

Please take a look at this answer of mine, or have a look at it below. In it, I have described how you can force your pods to be split between nodes. All you need to do is set the key and values parameters in the matchExpressions section accordingly. Additionally, you may find the requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution fields very useful.


Look at this example yaml file:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        example: app
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-1
            - worker-2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-1

The idea of this configuration: I'm using nodeAffinity here to indicate which nodes the pod can be placed on:

- key: kubernetes.io/hostname

and

values:
- worker-1
- worker-2

It is important to set the following line:

- maxSkew: 1

According to the documentation:

maxSkew describes the degree to which Pods may be unevenly distributed. It must be greater than zero.

Thanks to this, the difference in the number of assigned pods between nodes will always be at most 1. For example, if worker-1 already runs 2 matching pods and worker-2 runs 1, the next pod can only be placed on worker-2.

This section:

      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker-1

is optional; however, it will allow you to fine-tune the pod distribution across the free nodes even further. Here you can find a description of the differences between requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution:

Thus an example of requiredDuringSchedulingIgnoredDuringExecution would be "only run the pod on nodes with Intel CPUs" and an example preferredDuringSchedulingIgnoredDuringExecution would be "try to run this set of pods in failure zone XYZ, but if it's not possible, then allow some to run elsewhere".
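
As a rough sketch of the "preferred" form from that quote, applied to the zone label used in the question (the zone name below is hypothetical):

  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - zone-a   # hypothetical zone; the scheduler prefers it but may fall back to other zones

Because this is only a preference, pods that cannot fit in the preferred zone can still be scheduled elsewhere, unlike the required variant shown earlier.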