k8s - how scheduler assigns the nodes

Question

I am just curious to know how the k8s master/scheduler will handle this.

Let's consider a k8s master with 2 nodes. Assume that each node has 8GB RAM and each node is running a pod which consumes 3GB RAM.

node A - 8GB
   - pod A - 3GB
node B - 8GB
   - pod B - 3GB

Now I would like to schedule another pod, say pod C, which requires 6GB RAM.

Question:

Will the k8s master shift pod A or B to the other node to accommodate pod C in the cluster, or will pod C stay in Pending status?
If pod C is going to stay in Pending status, how can I use the resources efficiently with k8s?

Unfortunately I could not try this with my minikube. If you know how the k8s scheduler assigns nodes, please clarify.

Diego Mendes · Accepted Answer · 2019-02-22T10:35:57

Most Kubernetes components are split by responsibility, and workload assignment is no different. We can split the workload assignment process into two steps: Scheduling and Execution.

The Scheduler, as the name suggests, is responsible for the Scheduling step. The process can be briefly described as: "get a list of pods; if a pod is not scheduled to a node, assign it to a node with capacity to run it". There is a nice blog post from Julia Evans here explaining schedulers.
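The loop described above can be sketched as a toy first-fit scheduler. This is a simplified illustration, not the real kube-scheduler: the `Node`/`Pod` classes, `schedule_once`, and the memory-only fit check (in GB, compared against pod requests) are assumptions made for the example. The real scheduler filters nodes on several resources and constraints and then scores the feasible ones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    allocatable_gb: float
    # Sum of the memory requests of pods already bound to this node.
    requested_gb: float = 0.0

    def free_gb(self) -> float:
        return self.allocatable_gb - self.requested_gb


@dataclass
class Pod:
    name: str
    request_gb: float
    node: Optional[str] = None  # None means "not scheduled yet"


def schedule_once(pods: List[Pod], nodes: List[Node]) -> List[str]:
    """One pass of a toy scheduler: bind every unscheduled pod to the first
    node with enough free memory. Returns the names of pods left pending."""
    for pod in pods:
        if pod.node is not None:
            continue  # already bound; this toy scheduler never moves it
        for node in nodes:
            if node.free_gb() >= pod.request_gb:
                pod.node = node.name
                node.requested_gb += pod.request_gb
                break
    return [p.name for p in pods if p.node is None]


if __name__ == "__main__":
    # The scenario from the question: pods A and B are already running.
    nodes = [Node("node-a", 8, requested_gb=3), Node("node-b", 8, requested_gb=3)]
    pods = [Pod("pod-a", 3, node="node-a"), Pod("pod-b", 3, node="node-b"), Pod("pod-c", 6)]
    print(schedule_once(pods, nodes))  # ['pod-c']
```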

The Kubelet is responsible for the Execution of pods scheduled to its node. It gets the list of pod definitions allocated to its node and makes sure they are running with the right configuration; if they are not running, it starts them.
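The Kubelet's side can be pictured as a reconcile step that compares the pods bound to its node with the pods actually running there. The sketch below is hypothetical: `reconcile` and its plain dict/set inputs are invented for illustration, and it skips the "right configuration" check entirely.

```python
from typing import Dict, List, Set, Tuple


def reconcile(
    node_name: str,
    bindings: Dict[str, str],
    running: Set[str],
) -> Tuple[List[str], List[str]]:
    """Toy kubelet sync step.

    bindings: pod name -> node the scheduler assigned it to
    running:  pods currently running on this node
    Returns (pods to start, pods to stop).
    """
    desired = {pod for pod, node in bindings.items() if node == node_name}
    to_start = sorted(desired - running)  # assigned here but not running yet
    to_stop = sorted(running - desired)   # running here but no longer assigned
    return to_start, to_stop


if __name__ == "__main__":
    bindings = {"pod-a": "node-a", "pod-b": "node-b", "pod-x": "node-a"}
    print(reconcile("node-a", bindings, {"pod-a", "pod-old"}))  # (['pod-x'], ['pod-old'])
```

Note that the Kubelet never chooses a node; it only acts on what the scheduler has already bound to it.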

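With that in mind, the scenario you described will behave as expected: pod C will not be scheduled and will stay in Pending, because no node has enough capacity available for it. Each node has at most 5GB free (8GB minus the 3GB taken by its pod), which is less than the 6GB pod C needs. Strictly speaking, the scheduler compares the pods' memory requests, not their actual consumption.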

Resource balancing is mainly decided at the scheduling level. A nice way to see this is to add a new node to the cluster: if there are no pods pending allocation, the new node will not receive any pods. A brief summary of the logic used for resource balancing can be seen in this PR.
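Balancing only happens at the moment a node is chosen for a pending pod. A minimal sketch, loosely modeled on a "least requested" style of scoring and limited to memory: nodes that cannot fit the pod are filtered out, and the node with the largest remaining share wins. `least_requested_score`, `pick_node`, and the 0-100 score range are assumptions for this example; the real scheduler combines several scoring plugins and resources.

```python
from typing import Dict, Optional, Tuple

MAX_SCORE = 100  # assumed score range 0..100 for this sketch


def least_requested_score(
    allocatable_gb: float, requested_gb: float, pod_request_gb: float
) -> Optional[int]:
    """Score a node by the share of memory left after placing the pod.
    Returns None when the pod does not fit (the node is filtered out)."""
    remaining = allocatable_gb - requested_gb - pod_request_gb
    if remaining < 0:
        return None
    return int(remaining * MAX_SCORE / allocatable_gb)


def pick_node(
    nodes: Dict[str, Tuple[float, float]], pod_request_gb: float
) -> Optional[str]:
    """nodes: name -> (allocatable GB, already requested GB).
    Returns the highest-scoring feasible node, or None if the pod stays Pending."""
    scores = {}
    for name, (allocatable, requested) in nodes.items():
        score = least_requested_score(allocatable, requested, pod_request_gb)
        if score is not None:
            scores[name] = score
    if not scores:
        return None
    return max(scores, key=scores.get)


if __name__ == "__main__":
    cluster = {"node-a": (8, 3), "node-b": (8, 3)}
    print(pick_node(cluster, 6))  # None: pod C stays Pending
    cluster["node-c"] = (8, 0)    # a new, empty node joins the cluster
    print(pick_node(cluster, 6))  # node-c
    print(pick_node(cluster, 2))  # node-c
```

The empty node only starts receiving work when `pick_node` is called for a pending pod; nothing in this loop moves pods that are already running.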

Possible solutions:

Kubernetes ships with a default scheduler. If the default scheduler does not suit your needs, you can implement your own scheduler as described here. The idea would be to implement an extension for the scheduler that reschedules pods that are already running when the cluster has enough total capacity, but that capacity is not distributed well enough to allocate the new load.
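One way to picture such an extension: when a pending pod fits on no node, search for a single running pod whose move to another node would free enough room. `plan_single_move` is a hypothetical illustration, not a Kubernetes API; it only plans the move and does not evict, bind, or recreate anything.

```python
from typing import Dict, List, Optional, Tuple

Move = Tuple[str, str, str]  # (pod to move, from node, to node)


def plan_single_move(
    nodes: Dict[str, float],
    pods: Dict[str, Tuple[str, float]],
    new_request: float,
) -> Optional[List[Move]]:
    """Return [] if the new pod already fits somewhere, [move] if moving one
    running pod makes room for it, or None if no single move helps.

    nodes: node name -> allocatable memory (GB)
    pods:  pod name -> (node it runs on, memory request in GB)
    """
    used = {name: 0.0 for name in nodes}
    for node, request in pods.values():
        used[node] += request
    free = {name: nodes[name] - used[name] for name in nodes}

    if any(f >= new_request for f in free.values()):
        return []  # no move needed; the normal scheduler can place it

    for pod, (src, request) in pods.items():
        if free[src] + request < new_request:
            continue  # removing this pod would still leave too little room
        for dst in nodes:
            if dst != src and free[dst] >= request:
                return [(pod, src, dst)]
    return None


if __name__ == "__main__":
    nodes = {"node-a": 8.0, "node-b": 8.0}
    pods = {"pod-a": ("node-a", 3.0), "pod-b": ("node-b", 3.0)}
    print(plan_single_move(nodes, pods, 6.0))  # [('pod-a', 'node-a', 'node-b')]
```

For the question's scenario, the plan moves pod A to node B (6GB used there), which leaves node A with 8GB free for pod C.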

Another option is to use tools created for scenarios like this. The Descheduler is one: it monitors the cluster and evicts pods from nodes so that the scheduler re-allocates them with a better balance. There is a nice blog post here describing these scenarios.
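The Descheduler's rebalancing can be approximated roughly in the spirit of its LowNodeUtilization strategy: nodes below a low threshold count as underutilized, nodes above a high threshold count as overutilized, and, in this sketch, pods on overutilized nodes become eviction candidates only when some underutilized node exists to receive them. The threshold values, `classify_nodes`, and `eviction_candidates` are assumptions for illustration, not the Descheduler's actual code or configuration; the real tool applies more rules, for example about which pods may be evicted at all.

```python
from typing import Dict, List, Tuple


def classify_nodes(
    usage_pct: Dict[str, float], low: float = 20, high: float = 50
) -> Tuple[List[str], List[str]]:
    """Split nodes into (underutilized, overutilized) by usage percentage."""
    under = sorted(n for n, u in usage_pct.items() if u < low)
    over = sorted(n for n, u in usage_pct.items() if u > high)
    return under, over


def eviction_candidates(
    usage_pct: Dict[str, float],
    pods_by_node: Dict[str, List[str]],
    low: float = 20,
    high: float = 50,
) -> List[str]:
    """Pods on overutilized nodes, but only if an underutilized node exists."""
    under, over = classify_nodes(usage_pct, low, high)
    if not under:
        return []  # nowhere better for evicted pods to land, so evict nothing
    return [pod for node in over for pod in pods_by_node.get(node, [])]


if __name__ == "__main__":
    usage = {"node-a": 75, "node-b": 10, "node-c": 40}
    pods = {"node-a": ["pod-a", "pod-x"], "node-b": ["pod-b"]}
    print(eviction_candidates(usage, pods))  # ['pod-a', 'pod-x']
```

After eviction, the pods' controllers recreate them and the regular scheduler places the replacements, ideally on the underutilized nodes.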

PS: Keep in mind that not all of a node's total memory is allocatable. Depending on which provider you use, the allocatable capacity can be much lower than the total; take a look at this SO question: Cannot create a deployment that requests more than 2Gi memory
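The Kubernetes documentation on reserving compute resources describes node allocatable as capacity minus kube-reserved, system-reserved, and the hard eviction threshold. A small worked example with made-up reservation values (in MiB) shows why a request close to the node's total can still fail to fit; `allocatable_mib` is a hypothetical helper, and real values depend on your provider and kubelet flags.

```python
def allocatable_mib(capacity: int, kube_reserved: int, system_reserved: int, hard_eviction: int) -> int:
    """Allocatable = capacity - kube-reserved - system-reserved - hard eviction threshold."""
    return capacity - kube_reserved - system_reserved - hard_eviction


if __name__ == "__main__":
    # Hypothetical 8 GiB node; the reservation values are invented for illustration.
    alloc = allocatable_mib(8192, kube_reserved=512, system_reserved=512, hard_eviction=100)
    print(alloc)              # 7068
    print(7 * 1024 <= alloc)  # False: a 7Gi request would not fit on this node
```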
