HPA auto-scaling at deployment based on HTTP requests count

Question

I have an issue with an HPA configuration that scales on the HTTP request rate. The metric comes from Prometheus as sum(rate(http_server_requests_seconds_count[5m])), but at start-up the HPA scales to the maximum number of pods even though no HTTP requests are being received. The extract below from kubectl describe hpa shows that it is scaling on this metric, and this happens within seconds of the deployment.

Normal  SuccessfulRescale  23m (x4 over 128m)   horizontal-pod-autoscaler  New size: 2; reason: pods metric rate_5m_http_server_requests_seconds_count above target
Normal  SuccessfulRescale  23m (x4 over 128m)   horizontal-pod-autoscaler  New size: 3; reason: pods metric rate_5m_http_server_requests_seconds_count above target

Is it possible to tell Kubernetes not to scale for the first N seconds/minutes or is there another way around this problem?

Maybe, tweaking the value of kube-controller-manager --horizontal-pod-autoscaler-tolerance flag can help you. — Eduardo Baitello
Thanks @EduardoBaitello but I don't believe that would help. Initially I accidentally set the expected value to 5000 thinking it was the total requests over 5 minutes, not the rate over the last 5 minutes. It even auto-scaled with this value so I am not sure it is even checking the value when doing the auto-scaling. — James Hargreaves
Note, I found a few bugs which seem to be related: github.com/kubernetes/kubernetes/issues/72775 github.com/kubernetes/kubernetes/issues/84142 — James Hargreaves
It certainly looks like a bug to me. It looks to be the same bug as reported above though based on different metrics. — James Hargreaves

Unknown Unknown · Accepted Answer · 2019-11-12T09:12:07

As @James mentioned in the comments, this is a bug. It is being tracked in the two GitHub issues linked in his comment above.

I am posting this as a community wiki for better visibility.
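
For context on why this looks like a bug rather than a tuning problem: the Kubernetes documentation describes the scaling rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with scaling skipped when the ratio is within the tolerance of 1.0 (0.1 by default, set by --horizontal-pod-autoscaler-tolerance). Below is a minimal Python sketch of that rule only. The function name and the min/max defaults are hypothetical, it is not the controller's actual code, and it ignores missing metrics, not-ready pods and stabilization. Under those assumptions, a rate of 0 against a target of 5000 should never produce a scale-up, which matches the observation that changing the target or the tolerance made no difference.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified version of the documented HPA scaling rule (illustration only)."""
    # Ratio of the observed per-pod metric to the configured target.
    ratio = current_value / target_value

    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Otherwise scale proportionally, then clamp to the configured bounds.
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# No traffic at start-up: the rule asks for the minimum, not the maximum.
print(desired_replicas(1, 0.0, 5000))
# Twice the target rate: the replica count doubles.
print(desired_replicas(2, 10000, 5000))
```

Since the rule depends on the metric value, raising the target or widening the tolerance would be expected to matter if the reported value were really 0. The fact that it did not is consistent with the linked bug reports.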
