Imagine this hypothetical situation (that just bit me in practice):
- All worker instances in a Kubernetes cluster die (say, due to spot price fluctuations), and a new one comes back automatically.
- The scheduler then attempts to schedule pods onto the new node in some arbitrary order, but they can't all fit because there are fewer nodes than before.
- All default namespace pods make it on, but the kube-system namespace DNS pod doesn't.
- Now almost everything trying to run on the cluster is hung because there's no DNS on the cluster.
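
To make the failure mode concrete, here is a toy sketch of the steps above. It is only a simplified model, not how kube-scheduler actually works: the `schedule` function, the `PODS` list, the pod names, and the single-number CPU capacity are all hypothetical and made up for illustration. It shows that first-come placement can leave the DNS pod pending, while ordering kube-system first would not.

```python
# Toy model only: NOT the real kube-scheduler algorithm.
# All names and numbers here are hypothetical, for illustration.

def schedule(pods, node_capacity, order_key=None):
    """Place pods on a single node until capacity runs out.

    Pods are tried in list order unless order_key is given, in which
    case they are stably sorted by that key first.
    Returns (placed_names, pending_names).
    """
    queue = list(pods)
    if order_key is not None:
        queue.sort(key=order_key)  # stable sort keeps ties in list order
    placed, pending = [], []
    free = node_capacity
    for pod in queue:
        if pod["cpu"] <= free:
            placed.append(pod["name"])
            free -= pod["cpu"]
        else:
            pending.append(pod["name"])
    return placed, pending


# Hypothetical workload: four app pods and one DNS pod, one CPU unit each.
PODS = [
    {"name": "app-1", "namespace": "default", "cpu": 1},
    {"name": "app-2", "namespace": "default", "cpu": 1},
    {"name": "app-3", "namespace": "default", "cpu": 1},
    {"name": "app-4", "namespace": "default", "cpu": 1},
    {"name": "kube-dns", "namespace": "kube-system", "cpu": 1},
]

# The replacement node only has room for four of the five pods.
arbitrary = schedule(PODS, node_capacity=4)

# What I would like instead: kube-system pods are considered first.
system_first = schedule(
    PODS,
    node_capacity=4,
    order_key=lambda pod: pod["namespace"] != "kube-system",
)

print("arbitrary order:", arbitrary)
print("kube-system first:", system_first)
```

In the first call the DNS pod is the one left pending; in the second, one of the app pods is left pending instead, which is the outcome I am after.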
Is there any way to use the QoS tiers in Kubernetes to get the scheduler to prioritize scheduling the kube-system pods ahead of pods in other namespaces? Or is there some other way I should be fixing this problem?