
I have a cluster on EKS with the cluster autoscaler (CA) enabled. Let's assume there are 3 nodes: node-1, node-2 and node-3, and each node can hold a maximum of 10 pods. When the 31st pod comes into the picture, the CA launches a new node and schedules the pod on it. Now suppose 4 pods on node-2 are no longer required and they go down. At this point, when a new pod is launched, the scheduler places it on the 4th node (the one launched by the CA) and not on node-2. What I want is that, going forward, if pods are removed from the existing nodes, new pods should be placed on those already existing nodes and not on a new node brought up by the CA. I tried updating the EKS default scheduler config file using a scheduler plugin but was unable to do so.

I think we can create a second scheduler, but I am not familiar with the process. Any workaround or suggestion would help a lot.

This is the command: kube-scheduler --config custom.config, and this is the error: "attempting to acquire leader lease kube-system/kube-scheduler..."

This is my custom.config file:

apiVersion: kubescheduler.config.k8s.io/v1beta1
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
kind: KubeSchedulerConfiguration
percentageOfNodesToScore: 100
profiles:
- schedulerName: kube-scheduler-new
  plugins:
    score:
      disabled:
      - name: '*'
      enabled:
      - name: NodeResourcesMostAllocated
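
For completeness, the pods that are supposed to use this profile would reference it via schedulerName, something like this (the pod name and image here are just placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  schedulerName: kube-scheduler-new   # must match the profile name in custom.config
  containers:
  - name: app
    image: nginx
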
What behavior do you want that's different from the default scheduler behavior? Do you want to try to pack pods onto the fewest number of nodes, instead of balancing pods across the set of nodes that currently exists? – David Maze
Yes, if the nodes are already there, then new pods should run as much as possible on the existing nodes. – Kaustubh
So if nodes 1 and 2 are at 100% capacity, 3 is at 60% capacity, and 4 is at 20% capacity, and the minimum cluster size is 3 nodes, you want to force a new pod on to node 3 instead of 4? (The cluster autoscaler will terminate a node if its utilization is below 50% and all of its pods can be moved elsewhere; so in my example node 4 could be terminated even if a new pod got placed there, but if all four nodes had 70% utilization it wouldn't be.) – David Maze
Okay, this looks good; I think I missed this. Thanks for the link. Anyway, if we do need to customise the behaviour of the default scheduler on EKS, can we do it? Or can we add a second scheduler in EKS? I have been researching this a lot lately but was not able to find any example or solution. It would be quite helpful if you could provide some insight on this as well. – Kaustubh
I agree with @DavidMaze. Instead of replacing the Kubernetes scheduler, I recommend another approach. Ask: do I want to schedule different replicas onto separate nodes or different availability zones, or do I want them on the same node / AZ? Consider adding a node affinity or anti-affinity configuration to your deployments. Depending on your needs, there may be other built-in configuration options for Kubernetes primitives that will do what you want. Hope this helps! – Joel Van Hollebeke

1 Answer


How to manage pod scheduling?

A custom scheduler is, of course, one way to go if you have a specific use case, but if you just want to influence which node a particular pod is scheduled onto, Kubernetes provides built-in options to do so.

The scheduling algorithm can be broken into two parts:

  • Filtering the list of all nodes to obtain a list of acceptable nodes the pod can be scheduled to.
  • Prioritizing the acceptable nodes and choosing the best one. If multiple nodes have the highest score, round-robin is used to ensure pods are deployed across all of them evenly.

Kubernetes works great if you let the scheduler decide which node a pod should go to, and it comes with tools that let you give the scheduler hints:

  • Taints and tolerations can be used to repel certain pods from a node. This is very useful if you want to partition your cluster and allow only certain people to schedule onto specific nodes. They can also be used when you have nodes with special hardware that only some pods require (as in your question, where you want a pod to be scheduled on node-2); see the taint/toleration sketch after this list. Taints come with 3 effects:
  1. NoSchedule, which means no new pods will be scheduled onto the node
  2. PreferNoSchedule, which means the scheduler will try to avoid scheduling onto the node
  3. NoExecute, which affects scheduling and also affects pods already running on the node. If you add this taint to a node, pods that are running on it and don't tolerate the taint will be evicted.
  • Node affinity, on the other hand, can be used to attract certain pods to specific nodes. Similar to taints, node affinity gives you some options for fine-tuning your scheduling preferences (a node-affinity sketch follows after this list):

    1. requiredDuringSchedulingIgnoredDuringExecution, which acts as a hard requirement and tells the scheduler that the rules must be met for the pod to be scheduled onto the node.
    2. preferredDuringSchedulingIgnoredDuringExecution, which acts as a soft requirement and tells the scheduler to try to enforce the rules, but without any guarantee.
  • PodAffinity can be used if you, for example, want your front-end pods to run on the same node as your database pod. It can similarly be expressed as a hard or a soft requirement (see the combined pod affinity / anti-affinity sketch after this list).

  • podAntiAffinity can be used if you do not want certain pods to run together on the same node (also covered in the sketch after this list).
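
Here is a rough sketch of the taints and tolerations mentioned above; the node name, taint key/value and image are just placeholder assumptions, not taken from your cluster:

kubectl taint nodes node-2 dedicated=special-workload:NoSchedule

apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  # Only pods carrying this toleration may be scheduled onto the tainted node
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "special-workload"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx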
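
And a minimal node-affinity sketch, assuming the target node has the hostname node-2 and a made-up node-type=existing label (pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-demo
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: the pod may only run on node-2
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - node-2
      # Soft preference: try nodes labelled node-type=existing first
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: node-type
            operator: In
            values:
            - existing
  containers:
  - name: app
    image: nginx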
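
Finally, a combined pod affinity / anti-affinity sketch (the app labels are made-up examples): the frontend pod below asks to be co-located with a pod labelled app=database, while spreading away from other frontend replicas:

apiVersion: v1
kind: Pod
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  affinity:
    podAffinity:
      # Hard requirement: run on the same node as a pod labelled app=database
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: database
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      # Soft preference: avoid nodes that already run another frontend pod
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: frontend
          topologyKey: kubernetes.io/hostname
  containers:
  - name: app
    image: nginx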