
I have some stateless applications where I want one pod to be scheduled on each node (limited by a node selector). If I have 3 nodes in the cluster and one goes down then I should still have 2 pods (one on each node).

This is exactly what DaemonSets do, but DaemonSets have a couple of caveats (for example, they don't support node draining, and tools such as Telepresence don't support them). So I would like to emulate the behaviour of DaemonSets using Deployments.

My first idea was to use a Horizontal Pod Autoscaler with custom metrics, so that the desired replica count would equal the number of nodes. But even after implementing this, it still wouldn't guarantee that exactly one pod is scheduled per node (I think?).
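For concreteness, the HPA part could be sketched roughly like this (assuming an external metric named `node_count` is already exposed by a custom metrics adapter; the metric and resource names are placeholders):

```yaml
# Sketch: scale a Deployment to the number of nodes via an external metric.
# Assumes a metrics adapter reports a "node_count" metric (placeholder name).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: node_count
      target:
        type: AverageValue
        averageValue: "1"  # desired replicas = node_count / 1
```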

Any ideas on how to implement this?

the only way to really guarantee this is a custom scheduler that does exactly that. I don't think it's hackable with affinities or HPA. BTW, how do DaemonSets not support node draining? I thought that was only a flag in kubectl (--ignore-daemonsets). EDIT: I guess you're referring to graceful shutdown. - Thomas Jungblut
@ThomasJungblut Could you expand on why HPA + affinities wouldn't work? - bcoughlan
they simply do not guarantee this behaviour. They work, but mind your actual requirements here. - Thomas Jungblut
Just wanted to add this as a possible solution to the problem github.com/kubernetes-incubator/cluster-proportional-autoscaler - bcoughlan

2 Answers


Design for Availability

If I have 3 nodes in the cluster and one goes down then I should still have 2 pods (one on each node).

I understand this as meaning you want to design your cluster for availability. The most important thing is that your replicas (pods) are spread across different nodes, to reduce the impact if a node goes down.

Schedule pods on different nodes

Use PodAntiAffinity and topologyKey for this.

Deploy the Redis cluster so that no two instances are located on the same host.

See Kubernetes documentation: Never co-located in the same node and the ZooKeeper High Availability example
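A minimal sketch of such a Deployment (the `my-app` names and image are placeholders; `kubernetes.io/hostname` is the standard per-node label used as the topology key):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # "required" makes this a hard rule: the scheduler will never
          # place two pods with the label app=my-app on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: my-app
            topologyKey: kubernetes.io/hostname
      containers:
      - name: my-app
        image: my-app:latest  # placeholder image
```

Note that with the hard (`required...`) variant, extra replicas beyond the number of eligible nodes will stay Pending rather than co-locate.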


You can consider the combination below:

  1. HPA to update the replica count to the number of nodes, based on custom metrics. I think you have already done this.
  2. Node affinity (or a node selector) to target the right nodes, plus pod anti-affinity to run at most one pod on each node.
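The second point could be sketched as a pod-template fragment like this (the `role: my-workload` node label and `app: my-app` pod label are placeholders; the node selector matches the one mentioned in the question):

```yaml
# Fragment of a Deployment's pod template: restrict pods to labelled nodes
# and forbid two pods of the same app on one node.
spec:
  template:
    spec:
      nodeSelector:
        role: my-workload  # placeholder node label
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: my-app
            topologyKey: kubernetes.io/hostname
```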