10
votes

I have implemented the Job Observer Pattern using SQS and ECS. Job descriptions are pushed to an SQS queue for processing. Job processing runs on an ECS Cluster within an Auto Scaling Group, as ECS Docker Tasks.

Each ECS Task does:

  1. Read message from SQS queue
  2. Execute job on data (~1 hour)
  3. Delete message
  4. Loop while there are more messages
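
For reference, a rough sketch of that loop in Python with boto3 (the queue URL and the process_job function are placeholders for my actual setup):

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # placeholder

    def process_job(body):
        """Placeholder for the ~1 hour job."""
        ...

    def run_worker():
        while True:
            # Long-poll for a single job description.
            resp = sqs.receive_message(
                QueueUrl=QUEUE_URL,
                MaxNumberOfMessages=1,
                WaitTimeSeconds=20,
            )
            messages = resp.get("Messages", [])
            if not messages:
                break  # queue drained; let the Task exit

            msg = messages[0]
            # The queue's visibility timeout must exceed the job duration
            # (or be extended periodically) so the message isn't redelivered mid-job.
            process_job(msg["Body"])
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])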

I would like to scale the cluster in as each Instance runs out of work, eventually down to zero instances.

Looking at this similar post, the answers suggest that scale-in would need to be handled outside of the ASG in some way: Instances would self-scale-in, either by explicitly self-terminating or by toggling ASG Instance Protection off when there are no more messages.

This also doesn't handle the case of running multiple ECS Tasks on a single instance, since an individual Task shouldn't bring the instance down while other Tasks are still running on it.

Am I limited to self scale-in and only one Task per Instance? Any way to only terminate once all ECS Tasks on an instance have exited? Any other scale-in alternatives?

3
Can you check if the instance is executing a job with a simple application installed on your instances? For example by getting the CPU/memory utilization? – Mahdi

3 Answers

3
votes

You could use CloudWatch Alarms with Actions:

detect and terminate worker instances that have been idle for a certain period of time
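
A rough sketch of how that could look with boto3, assuming one job per instance and that low CPU utilization means idle (the instance ID, region, and thresholds are placeholders):

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    def add_idle_terminate_alarm(instance_id):
        # Terminate the instance if average CPU stays below 5% for 15 minutes.
        cloudwatch.put_metric_alarm(
            AlarmName=f"terminate-idle-{instance_id}",
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            Statistic="Average",
            Period=300,
            EvaluationPeriods=3,
            Threshold=5.0,
            ComparisonOperator="LessThanThreshold",
            # Built-in EC2 alarm action that terminates the instance.
            AlarmActions=["arn:aws:automate:us-east-1:ec2:terminate"],
        )

Note that if the instance belongs to an Auto Scaling Group, the ASG will usually launch a replacement after such a termination unless the desired capacity is reduced as well.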

3
votes

I ended up using:

  • A Scale Out Policy that adds the same number of instances as there are pending SQS queue messages
  • A Scale In Policy that sets the instance count to zero once the SQS queue is empty
  • Enabling ASG Instance Protection at the start of each batch job and disabling it at the end

This restricts me to one batch job per instance, but it worked well for my scenario.
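
A rough sketch of the Instance Protection part with boto3, run from inside the batch job itself (the ASG name is a placeholder, and the instance ID is read from instance metadata; IMDSv2 would additionally require a session token):

    import boto3
    import urllib.request

    asg = boto3.client("autoscaling")
    ASG_NAME = "batch-worker-asg"  # placeholder

    def instance_id():
        # Query the EC2 instance metadata endpoint for this instance's ID.
        with urllib.request.urlopen(
            "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
        ) as resp:
            return resp.read().decode()

    def set_scale_in_protection(protected):
        asg.set_instance_protection(
            AutoScalingGroupName=ASG_NAME,
            InstanceIds=[instance_id()],
            ProtectedFromScaleIn=protected,
        )

    # At the start of the batch job:
    #     set_scale_in_protection(True)
    # After the last message has been processed:
    #     set_scale_in_protection(False)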

3
votes

Another solution to this problem is the AWS Batch service, announced at the end of 2016.
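
AWS Batch manages the job queue, scheduling, and scaling of the compute environment for you, including scaling down to zero instances when there are no jobs. Instead of pushing messages to SQS, you submit jobs against a job queue and job definition. A rough sketch with boto3 (the queue and definition names are placeholders and are assumed to already exist):

    import boto3

    batch = boto3.client("batch")

    def submit_job(job_name, payload):
        # Submit one job; AWS Batch schedules it and scales the compute environment.
        return batch.submit_job(
            jobName=job_name,
            jobQueue="worker-job-queue",            # placeholder
            jobDefinition="worker-job-definition",  # placeholder
            containerOverrides={
                "environment": [{"name": "JOB_PAYLOAD", "value": payload}],
            },
        )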