
The docs say:

For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly. The controller then takes the mean of the utilization or the raw value (depending on the type of target specified) across all targeted Pods, and produces a ratio used to scale the number of desired replicas.

Assume I have a Pod with:

        resources:
          limits:
            cpu: "0.3"
            memory: 500M
          requests:
            cpu: "0.01"
            memory: 40M

and now I have an autoscaling definition as:

type: Resource
resource:
  name: cpu
  target:
    type: Utilization
    averageUtilization: 60

Which according to the docs:

With this metric the HPA controller will keep the average utilization of the pods in the scaling target at 60%. Utilization is the ratio between the current usage of resource to the requested resources of the pod

So, I'm not understanding something here. If request is the minimum resources required to run the app, how would scaling be based on this value? 60% of 0.01 is nothing, and the service would be constantly scaling.
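
To make my confusion concrete, here is the math as I understand it (a rough sketch in Python; the 0.01 comes from requests.cpu above):

    cpu_request = 0.01         # cores, from requests.cpu in the Pod spec
    target_utilization = 0.60  # averageUtilization: 60

    # utilization = usage / request, so the target average usage would be:
    target_usage = cpu_request * target_utilization
    print(f"{target_usage * 1000:.0f}m CPU")  # 6m CPU per pod on average

Almost any real workload uses more than 6m CPU right away, which is why I'd expect the service to be constantly scaling.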

1 Answer


Your misunderstanding might be that the value of request is not necessarily the minimum your app needs to run.

It is what you (the developer, admin, DevOps) request from the Kubernetes cluster for a pod in your application to run, and it helps the scheduler pick the right node for your workload (say, one that has sufficient resources available). So don't set this value too low or too high.

Apart from that, autoscaling works as you described it. In this case, the cluster calculates how much of your requested CPU is used and will scale out when more than 60% is in use. Keep in mind that Kubernetes does not look at every single pod but at the average across all pods in that group.
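
As a rough sketch (the usage numbers here are invented for illustration), the per-pod calculation is simply:

    def utilization(cpu_usage_cores, cpu_request_cores):
        # Utilization as the HPA sees it: current usage / requested amount.
        return cpu_usage_cores / cpu_request_cores

    # With requests.cpu: "0.01" from the Pod spec in the question:
    print(utilization(0.008, 0.01))  # 0.8 -> 80%, above the 60% target
    print(utilization(0.004, 0.01))  # 0.4 -> 40%, below the 60% target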

For example, given two pods running, one pod could run at 100% of its requests and the other one at (almost) 0%. The average would be around 50%, so no autoscaling happens in the case of the Horizontal Pod Autoscaler.
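
Expressed as code, that averaging, combined with the replica formula the HPA documentation gives (desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)), looks roughly like this sketch; the two utilization values mirror the example above:

    import math

    pod_utilizations = [1.00, 0.00]  # one pod at 100% of requests, one idle
    target = 0.60                    # averageUtilization: 60

    average = sum(pod_utilizations) / len(pod_utilizations)  # 0.5

    # desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
    desired = math.ceil(len(pod_utilizations) * average / target)

    print(average, desired)  # 0.5 2 -> stays at 2 replicas, no scale-out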

In production, I personally try to make an educated guess at the right values, then look at the metrics and adjust them to match my real-world workload. Prometheus is your friend, or at least the metrics server:

https://github.com/prometheus-operator/kube-prometheus
https://github.com/kubernetes-sigs/metrics-server