I'm currently using AWS EBS but switching over to Kubernetes for managing our Celery worker pool. According to the Celery documentation, there are two signals that can be sent to stop a worker: TERM and KILL. With TERM, the worker finishes what it is currently executing before stopping. With KILL, the worker stops immediately, and this can cause the loss of tasks. My question is: with Kubernetes autoscaling, when the worker pool scales down, how do I ensure TERM is sent to the workers rather than KILL? I was seeing the same scale-down problem on AWS EBS, where Celery Flower showed tasks being lost during scale-down.
The chain of events here is:
- The HorizontalPodAutoscaler decreases the `replicas:` on the Deployment it controls.
- The Deployment decreases the `replicas:` on its corresponding ReplicaSet.
- The ReplicaSet deletes Pods as required.
- Kubernetes sends each container SIGTERM and waits for it to shut down.
- If the container hasn't already exited, it sends each container SIGKILL.
The shutdown sequence in particular is described in Termination of Pods. Your process will get SIGTERM, and then SIGKILL 30 seconds later (the default `terminationGracePeriodSeconds`) if it hasn't already exited. Nothing is special about the HPA deleting a Pod as opposed to any other deletion path.
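So the Celery worker already receives TERM; the only thing you may need to change is the grace period, if your tasks can run longer than the 30-second default. A sketch of the relevant Pod spec field, in a hypothetical Deployment (the names, image, and Celery app module here are placeholders, not anything from your setup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-worker            # placeholder name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: celery-worker
  template:
    metadata:
      labels:
        app: celery-worker
    spec:
      # Allow up to 5 minutes between SIGTERM and SIGKILL,
      # so long-running tasks can drain. Default is 30.
      terminationGracePeriodSeconds: 300
      containers:
        - name: worker
          image: registry.example.com/celery-worker:latest  # placeholder
          command: ["celery", "-A", "myapp", "worker"]       # myapp is a placeholder
```

As long as the grace period exceeds your longest task, the worker finishes its current task on TERM and SIGKILL is never sent.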