I'm currently using AWS EBS but switching over to Kubernetes for managing our Celery worker pool. According to the Celery documentation, there are two signals that can be sent to stop a worker: TERM and KILL. With TERM, the worker finishes what it is currently executing before stopping. With KILL, the worker stops immediately, and this can cause the loss of tasks. My question is: with Kubernetes autoscaling, when the worker pool scales down, how do I ensure TERM is sent to the workers rather than KILL? I was seeing the same scale-down problem on AWS EBS, where Celery Flower showed tasks being lost during scale-down.
The chain of events here is:
- The HorizontalPodAutoscaler decreases the `replicas:` on the Deployment it controls.
- The Deployment decreases the `replicas:` on its corresponding ReplicaSet.
- The ReplicaSet deletes Pods as required.
- Kubernetes sends each container SIGTERM and waits for it to shut down.
- If the container hasn't already exited, it sends each container SIGKILL.
The shutdown sequence in particular is described in Termination of Pods. Your process will get SIGTERM, and then SIGKILL 30 seconds later (the default `terminationGracePeriodSeconds`) if it hasn't already exited. Nothing is special about the HPA deleting a Pod as opposed to any other deletion path.
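So the Celery worker already receives TERM; the only thing you may need to change is the grace period, if your tasks can run longer than the 30-second default. A sketch of the relevant Pod spec field, in a hypothetical Deployment (the names, image, and Celery app module here are placeholders, not anything from your setup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-worker            # placeholder name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: celery-worker
  template:
    metadata:
      labels:
        app: celery-worker
    spec:
      # Allow up to 5 minutes between SIGTERM and SIGKILL,
      # so long-running tasks can drain. Default is 30.
      terminationGracePeriodSeconds: 300
      containers:
        - name: worker
          image: registry.example.com/celery-worker:latest  # placeholder
          command: ["celery", "-A", "myapp", "worker"]       # myapp is a placeholder
```

As long as the grace period exceeds your longest task, the worker finishes its current task on TERM and SIGKILL is never sent.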