It is clear from the documentation that whenever pods are in the Pending state because no node has enough free resources to satisfy the pods' resource requests, the cluster autoscaler will create another node within roughly 30 seconds of pod creation (for reasonably sized clusters).
However, consider the case where a node is pretty packed. Let's say the node has 2 CPU cores and runs 4 pods, each with a 0.5 CPU request and a 1.0 CPU limit. Suddenly there is load, and all 4 pods try to use an additional 0.5 CPU each, which the node cannot provide since all of its CPU is already taken by the 4 running pods.
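For concreteness, a sketch of the kind of pod spec I'm describing (the names and image are just placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod        # illustrative name
spec:
  containers:
  - name: app
    image: my-app:latest   # placeholder image
    resources:
      requests:
        cpu: "500m"        # 0.5 CPU: what the scheduler reserves on the node
      limits:
        cpu: "1000m"       # 1.0 CPU: the pod may burst up to this if spare CPU exists
```

With 4 such pods on a 2-core node, the requests (4 × 0.5 = 2.0 CPU) fill the node exactly, while the limits (4 × 1.0 = 4.0 CPU) overcommit it 2:1.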
In this situation, I'd expect Kubernetes to 'understand' that running pods have resource demands that cannot be served and to 'move' (destroy and recreate) those pods onto another node that can satisfy their requests (plus the resources they are currently using). If no such node exists, I'd expect Kubernetes to create an additional node and move the pods there.
However, I don't see this happening. The pods keep running on the same node (I guess that node can be called overcommitted) regardless of the CPU demand that cannot be satisfied, and performance suffers as a result.
My question is whether this behaviour is avoidable by any means apart from setting the ratio between pod resource requests and limits to 1:1 (so that a pod cannot use more resources than were initially allocated). Obviously I would like to avoid setting requests and limits to be the same, so as not to over-provision and pay for more than I need.
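For reference, the 1:1 configuration I'm trying to avoid would look roughly like this (illustrative values):

```yaml
    resources:
      requests:
        cpu: "500m"
      limits:
        cpu: "500m"   # limit equal to request: no bursting allowed;
                      # if memory is also set 1:1, the pod gets the Guaranteed QoS class
```

This guarantees that scheduled pods always have the CPU they might use, but it also means paying for the peak at all times.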