I have implemented the Job Observer Pattern using SQS and ECS. Job descriptions are pushed to the SQS queue for processing. The job processing run on an ECS Cluster within an Auto-Scaling Group running ECS Docker Tasks.
Each ECS Task does:
- Read message from SQS queue
- Execute job on data (~1 hour)
- Delete message
- Loop while there are more messages
I would like to scale down the cluster when there is no more work for each Instance, eventually to zero instances.
Looking at this similar post, the answers suggest scale-in would need to be handled outside of ASG in some way. Instances would self-scale-in, either by explicitly self-terminating or by toggling ASG Instance Protection off when there are no more messages.
This also doesn't handle the case of running multiple ECS Tasks on a single instance, as an individual task shouldn't terminate if other Tasks are running in parallel.
Am I limited to self scale-in and only one Task per Instance? Any way to only terminate once all ECS Tasks on an instance have exited? Any other scale-in alternatives?