
Terraform destroy does not destroy the spot instances it created when the same tf file runs with different workspaces

I have a requirement for my CI infrastructure to create and destroy an ECS cluster with spot EC2 instances to build and test my code. For this, I am using Terraform to create an aws_spot_fleet_request with a set of launch specifications. Since the repo will have multiple branches, I run this Terraform configuration with a unique workspace name for each branch. When two branches apply my Terraform, each creates its own instances based on the workspace, as expected. However, when one of them reaches the terraform destroy stage, it just waits on the destroy and ends with the error below:

aws_spot_fleet_request.arm_ct_spot_resource: Still destroying... (ID: , 3m30s elapsed)
aws_spot_fleet_request.arm_ct_spot_resource: Still destroying... (ID: , 3m40s elapsed)
aws_spot_fleet_request.arm_ct_spot_resource: Still destroying... (ID: , 3m50s elapsed)
aws_spot_fleet_request.arm_ct_spot_resource: Still destroying... (ID: , 4m0s elapsed)
aws_spot_fleet_request.arm_ct_spot_resource: Still destroying... (ID: , 4m10s elapsed)
aws_spot_fleet_request.arm_ct_spot_resource: Still destroying... (ID: , 4m20s elapsed)
aws_spot_fleet_request.arm_ct_spot_resource: Still destroying... (ID: , 4m30s elapsed)
aws_spot_fleet_request.arm_ct_spot_resource: Still destroying... (ID: , 4m40s elapsed)
aws_spot_fleet_request.arm_ct_spot_resource: Still destroying... (ID: , 4m50s elapsed)
aws_spot_fleet_request.arm_ct_spot_resource: Still destroying... (ID: , 5m0s elapsed)
Releasing state lock. This may take a few moments...

Error: Error applying plan:

1 error(s) occurred:

  • aws_spot_fleet_request.arm_ct_spot_resource (destroy): 1 error(s) occurred:

  • aws_spot_fleet_request.arm_ct_spot_resource: error deleting spot request (): fleet still has (1) running instances

Terraform does not automatically rollback in the face of errors. Instead, your Terraform state file has been partially updated with any resources that successfully completed. Please address the error above and apply again to incrementally change your infrastructure.

resource "aws_spot_fleet_request" "arm_ct_spot_resource" {
  iam_fleet_role                      = "${aws_iam_role.fleet.arn}"
  target_capacity                     = "${var.instance_count}"
  terminate_instances_with_expiration = true
  allocation_strategy                 = "lowestPrice"
  wait_for_fulfillment                = true

  launch_specification {
    instance_type               = "t3.2xlarge"
    ami                         = "${data.aws_ami.ecs_agent_image.id}"
    vpc_security_group_ids      = ["${aws_security_group.security_group_sg.id}"]
    subnet_id                   = "${element(data.terraform_remote_state.environment_state.vpc_service_subnet_ids_2, 0)}"
    iam_instance_profile        = "${aws_iam_instance_profile.arm_iam_profile.name}"
    associate_public_ip_address = true
    key_name                    = "${var.key_name}"
    weighted_capacity           = 1

    # Tags defined in locals only.
    tags = "${merge(
      local.common_tags,
      map(
        "Name", "aws_instance for ${var.environment_id}"
      )
    )}"

    root_block_device {
      volume_size = "${var.disk_size}"
    }

    user_data = <<EOF
#!/bin/bash
echo ECS_CLUSTER="${aws_ecs_cluster.arm_cluster.name}" >> /etc/ecs/ecs.config
EOF
  }

  launch_specification {
    instance_type               = "c5.9xlarge"
    ami                         = "${data.aws_ami.ecs_agent_image.id}"
    vpc_security_group_ids      = ["${aws_security_group.security_group_sg.id}"]
    subnet_id                   = "${element(data.terraform_remote_state.environment_state.vpc_service_subnet_ids_2, 0)}"
    iam_instance_profile        = "${aws_iam_instance_profile.arm_iam_profile.name}"
    associate_public_ip_address = true
    key_name                    = "${var.key_name}"
    weighted_capacity           = 4

    # Tags defined in locals only.
    tags = "${merge(
      local.common_tags,
      map(
        "Name", "aws_instance for ${var.environment_id}"
      )
    )}"

    root_block_device {
      volume_size = "${var.disk_size}"
    }

    user_data = <<EOF
#!/bin/bash
echo ECS_CLUSTER="${aws_ecs_cluster.arm_cluster.name}" >> /etc/ecs/ecs.config
EOF
  }
}

1 Answer


It sounds like you are deploying multiple Terraform workspaces against the same shared component. If you destroy one workspace's resources while other resources that use that shared component are still deployed, the destroy won't complete as expected. This is typical with EC2 resources that use a shared security group: if you try to destroy a security group that is still attached to an EC2 instance, the destroy will fail.
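
As a minimal sketch of one way around that, assuming the security group and ECS cluster are defined in the same configuration as the spot fleet, you can interpolate terraform.workspace into their names so each branch's workspace owns its own copies and never blocks another workspace's destroy (var.vpc_id below is a hypothetical variable, and the name prefixes are illustrative):

# Hypothetical example: one security group per workspace, so destroying one
# branch's stack never tries to delete a group another branch's instances
# are still attached to.
resource "aws_security_group" "security_group_sg" {
  name   = "arm-ct-${terraform.workspace}-sg"
  vpc_id = "${var.vpc_id}" # hypothetical variable for your VPC
}

# Likewise, one ECS cluster per workspace.
resource "aws_ecs_cluster" "arm_cluster" {
  name = "arm-ct-${terraform.workspace}"
}

With workspace-scoped names like these, each branch's apply and destroy only ever touches its own components.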