
I have enabled a capacity provider for my ECS cluster and set the min, max and desired counts on the EC2 Auto Scaling group to 1, 3 and 2 respectively. I have also enabled auto scaling for the ECS service, with min, max and desired task counts of 2, 6 and 2 respectively.
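
The service-level scaling is registered roughly like this (simplified sketch; the service name and resource label below are placeholders, and the scaling policy itself is omitted):

```hcl
# Simplified sketch of the ECS service auto scaling registration.
# "my-service" and the resource label are placeholders, not the real names.
resource "aws_appautoscaling_target" "ecs_service" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.cluster.name}/my-service"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 6
}
```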

When I deployed the whole setup with Terraform, the two tasks were launched on two separate instances. When the load test ran, the ECS tasks and EC2 instances scaled out to 6 and 3 as expected. But after the load test completed, the ECS tasks scaled back in while the third EC2 instance was never removed.

Also, does target_capacity in managed_scaling indicate the threshold used for the auto scaling policy that the capacity provider creates on the Auto Scaling group?

 resource "aws_autoscaling_group" "asg" {
  ...
  min_size             = var.asg_min_size
  max_size             = var.asg_max_size
  desired_capacity     = var.asg_desired_capacity
  protect_from_scale_in = true
  tags = [
    {
      "key"                 = "Name"
      "value"               = local.name
      "propagate_at_launch" = true
    },
    {
      "key"                 = "AmazonECSManaged"
      "value"               = ""
      "propagate_at_launch" = true 
    }
  ]
}

resource "aws_ecs_capacity_provider" "capacity_provider" {
   name = local.name

   auto_scaling_group_provider {
      auto_scaling_group_arn         = aws_autoscaling_group.asg.arn
      managed_termination_protection = "ENABLED"

      managed_scaling {
           maximum_scaling_step_size = 4
           minimum_scaling_step_size = 1
           status                    = "ENABLED"
           target_capacity           = 70
      }
   }

  
   provisioner "local-exec" {
      when = destroy

      command = "aws ecs put-cluster-capacity-providers --cluster ${self.name} --capacity-providers [] --default-capacity-provider-strategy []"
   }
}

resource "aws_ecs_cluster" "cluster" {
  name      = local.name
  capacity_providers = [
    aws_ecs_capacity_provider.capacity_provider.name,
  ]
  tags = merge(
    {
      "Name"        = local.name,
      "Environment" = var.environment,
      "Description" = var.description,
      "Service"     = var.service,
    },
    var.tags
  )
}

1 Answer


You've set a target capacity of 70 on your capacity provider, so it will scale the ASG to keep utilisation at (or below) 70%.

When you have 2 tasks running on 2 different instances you are at 100% utilisation, because capacity is calculated from how many instances are running non-daemon tasks relative to the total number of instances. So with a target capacity of anything less than 100, the capacity provider won't scale in to the point where only non-empty instances remain: it wants to keep a spare instance around.

If you had a target capacity of 60 and the ASG were allowed to scale out to a maximum of 4 instances, it would still scale out to 4 and leave 2 empty instances, because leaving only 1 spare instance (3 in total) would put utilisation at about 66.7%, which is above that lower target.
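
A rough sketch of the arithmetic, using the CapacityProviderReservation formula (M = instances the current tasks need, N = instances running); the numbers match your scenario, the local names are just for illustration:

```hcl
# Illustrative only: CapacityProviderReservation = 100 * M / N
locals {
  instances_needed = 2                                 # 2 non-daemon tasks, one per instance
  reservation_at_2 = 100 * local.instances_needed / 2  # 100%   -> above a target of 70, so scale out
  reservation_at_3 = 100 * local.instances_needed / 3  # ~66.7% -> at or below 70, so the 3rd instance stays
  reservation_at_4 = 100 * local.instances_needed / 4  # 50%    -> what a target of 60 scales out to
}
```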

This AWS deep dive blog post on ECS capacity providers is a good read if you are just starting to use them. I must have read it about a dozen times when I started, and I still find the scaling mechanism slightly unusual. It does solve some key issues for us around naive ASG scaling based on ECS memory/CPU reservation, though. It also allows scaling to zero if you don't want any spare capacity (e.g. set a target capacity of 100) and don't mind waiting for instances to scale out.
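
If you do want the provider to release all spare capacity when the cluster is idle, something along these lines works (a minimal sketch; the resource label is illustrative and the other arguments are taken from your config):

```hcl
# Sketch: keep no spare capacity; combined with min_size = 0 on the ASG this
# lets the cluster scale all the way in when nothing is running.
resource "aws_ecs_capacity_provider" "no_spare" { # illustrative label
  name = local.name

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.asg.arn
    managed_termination_protection = "ENABLED"

    managed_scaling {
      status          = "ENABLED"
      target_capacity = 100 # aim for 100% utilisation, i.e. no empty instances kept warm
    }
  }
}
```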