I've tried a few approaches to this. The docs suggest that there is a way of getting autoscaling to deal with queues without an external solution (https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/), but they don't explain how.
I created a deployment whose pods pull tasks from a Redis queue (there is a Redis service in the cluster). I want a system where pods are scaled horizontally to pull tasks from the queue and execute them; executing a task can take an unpredictable and variable amount of time.
If pod A pulls a task from the queue and is busy, I want pod B to spin up to pull the next task. At the moment I am using polling, so if the queue is empty the pod in question just keeps trying to pull from it.
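For reference, the worker loop is roughly this shape (a minimal sketch; the `pop` function is injected so the loop itself is queue-agnostic, and the redis-py call, queue name, and service hostname in the comment are my assumptions, not something from my actual setup):

```python
import time
from typing import Callable, Optional

def run_worker(pop: Callable[[], Optional[str]],
               handle: Callable[[str], None],
               idle_sleep: float = 1.0,
               max_iterations: Optional[int] = None) -> int:
    """Poll the queue, execute each task, sleep briefly when empty.

    `pop` returns the next task or None if the queue is empty.
    Returns the number of tasks handled (handy for testing).
    """
    handled = 0
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        task = pop()
        if task is None:
            time.sleep(idle_sleep)  # queue empty: keep polling
            continue
        handle(task)  # may take an unpredictable, variable amount of time
        handled += 1
    return handled

# In the real pod, `pop` would wrap redis-py, e.g. (names assumed):
#   r = redis.Redis(host="redis")
#   pop = lambda: (r.blpop("tasks", timeout=1) or (None, None))[1]
```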
I've used horizontal pod autoscaling, which at least scales out when pod A is working, but because pod B doesn't decrease the average utilization once it's running, the autoscaler just keeps spinning up new pods up to the maximum. For my use case this is semi-fine: if the queue is empty, any pods finding it empty drag the average utilization down, so in theory the excess pods all spin down once the queue drains. But it doesn't feel very efficient, and the real problem is that the autoscaler will scale down pods that are in the middle of running jobs.
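The behaviour I'm seeing seems to follow directly from the scaling rule in the HPA docs, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A sketch of the arithmetic (the utilization numbers are illustrative, not measured):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float) -> int:
    """The HPA scaling rule from the Kubernetes docs:
    desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# One busy pod at 90% CPU against a 50% target asks for a second pod:
#   hpa_desired_replicas(1, 90, 50) -> 2
# The new idle pod halves the average to (90 + 0) / 2 = 45, so:
#   hpa_desired_replicas(2, 45, 50) -> 2   (ceil(1.8), borderline)
# Once the average drops further, the HPA scales back in, with no
# regard for which pod is mid-job:
#   hpa_desired_replicas(2, 20, 50) -> 1
```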
I've looked at using the newer metrics API, but it seems I'll need to deploy a custom metrics API server to implement this, which seems extreme for such a simple use case.
I've also looked at using Jobs, but they don't seem to accommodate autoscaling at all?
What I really want is to scale down based on the CPU utilization of the specific pod that's about to be removed, rather than on the average across all the pods.