Kubernetes HPA - Scale up cooldown

Question

I am running a Kubernetes cluster v1.16 (currently the newest version on GKE) with an HPA that scales the deployments based on custom metrics (specifically, the RabbitMQ message count fetched from Google Cloud Monitoring).

The Problem

The deployments scale up very fast to maximum pod count when the message count is temporarily high.

Information

The HPA --horizontal-pod-autoscaler-sync-period is set to 15 seconds on GKE and can't be changed as far as I know.

My custom metrics are updated every 30 seconds.

I believe this behavior is caused by the HPA triggering a scale-up every 15 seconds while the message count in the queues is high, so after a few cycles it reaches the maximum pod count.

In Kubernetes API v1.18 you can control the scale-up stabilization time, but I can't find a similar feature in v1.16.

My Question

How can I make the HPA scale up more gradually?

Edit 1

Sample HPA of one of my deployments:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: my-deployment-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 6
  maxReplicas: 100
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbit_mq|v1-compare|messages_count"
      metricSelector:
        matchLabels:
          metric.labels.name: production
      targetValue: 500

Tom Klino · Accepted Answer · 2020-10-25T13:51:53

First, it is worth knowing that Kubernetes autoscalers have a built-in cooldown. Quoting from Kubernetes in Action:

Currently, a scale-up will occur only if no rescaling event occurred in the last three minutes. A scale-down event is performed even less frequently—every five minutes. Keep this in mind so you don’t wonder why the autoscaler refuses to perform a rescale operation even if the metrics clearly show that it should.

This statement might be outdated, but unless it has changed, the cooldown is hardcoded, and each scale up/down event should not change the pod count by more than 100% of the existing pods.
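For comparison, on a cluster running v1.18 or later (so not the v1.16 cluster in the question), the scale-up stabilization mentioned in the question is exposed through the `behavior` field of `autoscaling/v2beta2`. The manifest below is only a sketch of how the HPA from the question might look there. The 120-second window and the limit of 4 pods per minute are illustrative assumptions, not recommended values.

```yaml
# Sketch only: requires Kubernetes v1.18+ and autoscaling/v2beta2.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-deployment-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 6
  maxReplicas: 100
  metrics:
  - type: External
    external:
      metric:
        name: "custom.googleapis.com|rabbit_mq|v1-compare|messages_count"
        selector:
          matchLabels:
            metric.labels.name: production
      target:
        type: Value
        value: 500
  behavior:
    scaleUp:
      # Assumed value: act on the lowest recommendation seen in the last 2 minutes,
      # so a short spike does not immediately raise the replica count.
      stabilizationWindowSeconds: 120
      policies:
      # Assumed value: add at most 4 pods per 60-second period.
      - type: Pods
        value: 4
        periodSeconds: 60
```

Note that the metric fields are laid out differently from `v2beta1`: `metricName` and `metricSelector` become `metric.name` and `metric.selector`, and `targetValue` becomes `target.value`.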

That said, you're not out of options either way. Here are some approaches you can take:

Pass your custom metric for scaling through a time-average function. The last time I did this I was using Prometheus, and PromQL might differ from what you are using, but if you share more configuration in your question, I'm sure I could help find the syntax.
You can try using KEDA. It has a cooldownPeriod field that you can set in the ScaledObject custom resource it comes with.
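As a rough illustration of how the two approaches above could be combined, here is a sketch of a KEDA `ScaledObject` with a Prometheus trigger whose query averages the queue length over time. It assumes KEDA 2.x (`keda.sh/v1alpha1`) and that the RabbitMQ metrics are also available in a Prometheus server. The server address, the metric name `rabbitmq_queue_messages`, its `queue` label, the 2-minute averaging window, and the resource name `my-deployment-scaler` are all hypothetical placeholders, not values from the question.

```yaml
# Sketch only: assumes KEDA 2.x and a reachable Prometheus server.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-deployment-scaler        # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    name: my-deployment             # the Deployment from the question
  pollingInterval: 30               # seconds between metric checks
  cooldownPeriod: 300               # seconds; illustrative value
  minReplicaCount: 6
  maxReplicaCount: 100
  triggers:
  - type: prometheus
    metadata:
      # Hypothetical address of the Prometheus server.
      serverAddress: http://prometheus.monitoring.svc:9090
      # Hypothetical metric and label; avg_over_time smooths short spikes
      # by averaging the queue length over the last 2 minutes.
      query: avg_over_time(rabbitmq_queue_messages{queue="production"}[2m])
      threshold: "500"
```

One caveat worth checking against the KEDA documentation: as far as I know, `cooldownPeriod` controls how long KEDA waits after the last active trigger before scaling back to zero, so the smoothing in the query is what would actually dampen scale-up here.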
