18
votes

I have one Kubernetes cluster with 4 nodes and one master. I am trying to run 5 nginx pods spread across all nodes. Currently the scheduler sometimes runs all the pods on one machine and sometimes on different machines.

What happens if a node goes down while all my pods are running on that node? We need to avoid this.

How do I force the scheduler to spread pods across the nodes in a round-robin fashion, so that if any node goes down, at least one other node still has an NGINX pod running?

Is this possible? If so, how can we achieve it?


5 Answers

19
votes

Use podAntiAffinity

Reference: Kubernetes in Action Chapter 16. Advanced scheduling

The podAntiAffinity rule with requiredDuringSchedulingIgnoredDuringExecution can be used to prevent pods with the same label from being scheduled onto the same hostname. If you prefer a more relaxed constraint, use preferredDuringSchedulingIgnoredDuringExecution.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          # Hard requirement: do not schedule an "nginx" pod onto a node that already runs one.
          requiredDuringSchedulingIgnoredDuringExecution:
          # The anti-affinity scope is the host.
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app: nginx
      containers:
      - name: nginx
        image: nginx:latest

Kubelet --max-pods

You can specify the maximum number of pods per node in the kubelet configuration, so that if one or more nodes go down, K8S is prevented from saturating the remaining nodes with the pods from the failed node.
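As a sketch, assuming the kubelet is managed through a KubeletConfiguration file, the limit is set with the maxPods field (the value below is only illustrative; the legacy --max-pods command-line flag does the same thing):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Cap the number of pods this node's kubelet will run.
# 10 is an illustrative value; size it to the node's capacity.
maxPods: 10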

10
votes

I think the inter-pod anti-affinity feature will help you. Inter-pod anti-affinity allows you to constrain which nodes your pod is eligible to be scheduled on, based on the labels of pods that are already running on the node. Here is an example.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: nginx-service
  name: nginx-service
spec:
  replicas: 3
  selector:
    matchLabels:
      run: nginx-service
  template:
    metadata:
      labels:
        run: nginx-service
        service-type: nginx
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: service-type
                  operator: In
                  values:
                  - nginx
              topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx-service
        image: nginx:latest

Note: I use preferredDuringSchedulingIgnoredDuringExecution here since you have more pods than nodes.

For more detailed information, you can refer to the Inter-pod affinity and anti-affinity (beta feature) part of the following link: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/

2
votes

Use Pod Topology Spread Constraints

As of 2021 (Kubernetes v1.19 and up), Pod Topology Spread Constraints (topologySpreadConstraints) are available by default, and I find them more suitable than podAntiAffinity for this case.

The major difference is that anti-affinity can restrict only one pod per node, whereas Pod Topology Spread Constraints can restrict N pods per node.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-example-deployment
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx-example
  template:
    metadata:
      labels:
        app: nginx-example
    spec:
      containers:
      - name: nginx
        image: nginx:latest
      # This sets how evenly spread the pods
      # For example, if there are 3 nodes available,
      # 2 pods are scheduled for each node.
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx-example

For more details see KEP-895 and an official blog post.

0
votes

The scheduler should spread your pods if your containers specify resource requests for the amount of memory and CPU they need. See http://kubernetes.io/docs/user-guide/compute-resources/
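As a minimal sketch of what that looks like inside a Deployment's pod template (the request values are illustrative, not recommendations):

    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          requests:
            # Declaring what each pod needs lets the scheduler balance
            # pods onto nodes that still have free capacity.
            cpu: "100m"
            memory: "128Mi"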

0
votes

We can use taints and tolerations to control whether pods are, or are not, scheduled onto a node.


Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints.

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.
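For example, the taint that matches the toleration in the deployment below could be applied to a node like this (the node name is a placeholder):

kubectl taint nodes node1 key1=value1:NoSchedule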

A sample deployment YAML would look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: nginx-service
  name: nginx-service
spec:
  replicas: 3
  selector:
    matchLabels:
      run: nginx-service
  template:
    metadata:
      labels:
        run: nginx-service
        service-type: nginx
    spec:
      containers:
      - name: nginx-service
        image: nginx:latest
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "value1"
        effect: "NoSchedule"

You can find more information at https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#:~:text=Node%20affinity%2C%20is%20a%20property,onto%20nodes%20with%20matching%20taints.