I have an issue with an HPA configuration that scales on the HTTP request rate. The metric is derived from the Prometheus query sum(rate(http_server_requests_seconds_count[5m])), but at start-up the HPA scales to the maximum number of pods even though no HTTP requests are being received. The extract below from kubectl describe hpa shows that it is scaling on this metric, and that this happens within seconds of the deployment.
Normal SuccessfulRescale 23m (x4 over 128m) horizontal-pod-autoscaler New size: 2; reason: pods metric rate_5m_http_server_requests_seconds_count above target
Normal SuccessfulRescale 23m (x4 over 128m) horizontal-pod-autoscaler New size: 3; reason: pods metric rate_5m_http_server_requests_seconds_count above target
Is it possible to tell Kubernetes not to scale for the first N seconds/minutes, or is there another way around this problem?
The --horizontal-pod-autoscaler-tolerance flag can help you. – Eduardo Baitello5000
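For context on what the tolerance setting does, here is a minimal Python sketch of the replica calculation described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when the ratio is within the tolerance (0.1 by default) of 1.0. The function name, parameter names, and the sample numbers are illustrative assumptions, not taken from the question, and the real controller also handles missing metrics, unready pods, and stabilization, which are left out here.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=3, tolerance=0.1):
    """Simplified sketch of the documented HPA formula (hypothetical helper)."""
    ratio = current_value / target_value
    # Within tolerance of the target: keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured minReplicas / maxReplicas bounds.
    return max(min_replicas, min(max_replicas, wanted))


# Illustrative values only: a target of 10 per pod.
print(desired_replicas(2, 10.5, 10.0))  # ratio 1.05, inside tolerance
print(desired_replicas(1, 25.0, 10.0))  # ratio 2.5, well above target
print(desired_replicas(1, 0.0, 10.0))   # no traffic, clamped to min_replicas
```

If this sketch is a fair reading of the algorithm, tolerance only suppresses small deviations around the target. A reported value far above the target would still trigger a scale-up, so it is worth checking what value the metrics adapter actually returns at start-up.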
thinking it was the total requests over 5 minutes, not the rate over the last 5 minutes. It even auto-scaled with this value, so I am not sure it is even checking the value when auto-scaling. – James Hargreaves
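The distinction between a total and a rate over the same window can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses only the first and last counter samples in the window and ignores the counter-reset handling and extrapolation that Prometheus applies in rate() and increase(). The helper names and sample values are hypothetical.

```python
def total_increase(first_count, last_count):
    """Total requests added to the counter across the window."""
    return last_count - first_count


def per_second_rate(first_count, last_count, window_seconds):
    """Average per-second rate across the window (simplified rate())."""
    return total_increase(first_count, last_count) / window_seconds


# Hypothetical counter samples taken 5 minutes (300 s) apart.
first, last = 1000, 1600
print(total_increase(first, last))        # requests over the window
print(per_second_rate(first, last, 300))  # requests per second
```

With these sample values the total is 600 requests while the rate is 2 requests per second, so a target chosen with totals in mind would be roughly 300 times too high for a rate-based metric over a 5-minute window.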