Kubernetes HPA and Scaling Down

Question

I have a Kubernetes HPA set up in my cluster, and it works as expected, scaling the number of pods up and down as CPU/memory usage increases and decreases.

The only problem is that my pods handle web requests, so the HPA occasionally scales down a pod that is in the middle of handling a request. The web server never gets a response back from the terminated pod, so the caller of the web API gets an error.

This all makes sense in theory. My question is: does anyone know of a best-practice way to handle this? Is there some way to wait until all requests are processed before scaling down, or some other way to ensure that requests complete before the HPA scales down the pod?

I can think of a few solutions, none of which I like:

Add retry mechanism to the caller and just leave the cluster as is.
Don't use HPA for web request pods (seems like it defeats the purpose).
Try to create some sort of custom metric and see if I can get that metric into Kubernetes (e.g. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-custom-metrics).

Any suggestions would be appreciated. Thanks in advance!

Jonas · Accepted Answer · 2019-11-11T15:32:59

Graceful shutdown of pods

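You must design your apps to support graceful shutdown. When a pod is terminated (which is what happens when the HPA scales down), its containers first receive a SIGTERM signal. If they are still running after the grace period (30 seconds by default, configurable with terminationGracePeriodSeconds), they receive a SIGKILL signal and the pod is removed. See Termination of pods.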

SIGTERM: When your app receives the termination signal, your pod will no longer receive new requests, but the app should finish responding to the requests it has already received before it exits.
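A minimal sketch of graceful shutdown in Python, using only the standard library's http.server. SlowHandler, DrainingServer and serve_until_sigterm are hypothetical names for illustration, not part of any Kubernetes API. On SIGTERM the server stops accepting new connections, then server_close() waits for the in-flight request threads to finish. The whole drain still has to fit inside the pod's grace period, otherwise the container is killed with SIGKILL anyway.

```python
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that simulates a request taking a while to process."""

    # Default protocol is HTTP/1.0, so each connection closes after one
    # request; with HTTP/1.1 keep-alive, idle connections would delay the drain.

    def do_GET(self):
        time.sleep(0.5)  # pretend to do some work
        body = b"done\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example's output quiet


class DrainingServer(ThreadingHTTPServer):
    # ThreadingHTTPServer uses daemon threads by default, which would be
    # abandoned on exit. Non-daemon threads plus block_on_close make
    # server_close() join (wait for) every in-flight request thread.
    daemon_threads = False
    block_on_close = True


def serve_until_sigterm(server):
    """Serve until SIGTERM, then stop accepting and drain in-flight requests.

    Must be called from the main thread, because signal.signal() only
    works there.
    """

    def handle_sigterm(signum, frame):
        # shutdown() blocks until serve_forever() returns, and
        # serve_forever() runs in this same (main) thread, so calling it
        # directly here would deadlock. Run it from a helper thread instead.
        threading.Thread(target=server.shutdown).start()

    signal.signal(signal.SIGTERM, handle_sigterm)
    server.serve_forever()  # returns once shutdown() has been requested
    server.server_close()   # close the listening socket, wait for request threads
```

In a container you would call something like serve_until_sigterm(DrainingServer(("0.0.0.0", 8080), SlowHandler)) as the process entry point, and keep the slowest expected request well under terminationGracePeriodSeconds.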

Design for idempotency

Your apps should also be designed for idempotency so you can safely retry failed requests.
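Idempotency is what makes retries safe when a pod is killed after it has done the work but before it replied: the caller's retry must not repeat the side effect. One common way to get this is an idempotency key sent with each logical request. Below is a minimal in-memory sketch in Python; PaymentService, FlakyTransport and call_with_retries are hypothetical names, and the dict-based store is an assumption for illustration only. With several replicas the store would have to be shared (for example a database), because the retry may land on a different pod.

```python
import time
import uuid


class PaymentService:
    """Hypothetical server-side handler that deduplicates requests by key.

    Results live in an in-memory dict purely for illustration; multiple
    replicas would need a shared store instead.
    """

    def __init__(self):
        self._results = {}
        self.charges = []  # side effects actually performed

    def charge(self, idempotency_key, amount):
        if idempotency_key in self._results:
            # Replayed request: return the stored result, no new side effect.
            return self._results[idempotency_key]
        self.charges.append(amount)
        result = {"charged": amount, "charge_id": len(self.charges)}
        self._results[idempotency_key] = result
        return result


class FlakyTransport:
    """Simulates a pod that processes the request but dies before replying, once."""

    def __init__(self, service):
        self.service = service
        self.failed_once = False

    def send(self, idempotency_key, amount):
        result = self.service.charge(idempotency_key, amount)
        if not self.failed_once:
            self.failed_once = True
            raise ConnectionError("pod terminated before responding")
        return result


def call_with_retries(func, *args, attempts=3, base_delay=0.1):
    """Call func(*args), retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func(*args)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    service = PaymentService()
    transport = FlakyTransport(service)
    key = str(uuid.uuid4())  # one key per logical operation, reused on every retry
    print(call_with_retries(transport.send, key, 100))
    print(service.charges)  # the charge was performed only once: [100]
```

The key point of the design is that the client generates the key once and reuses it on every retry, so the server can recognize a duplicate even though the first attempt looked like a failure to the caller.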
