1
votes

I am using Terraform to set up a small Fargate cluster of three Apache server tasks. The tasks hang in the pending state, then the cluster stops them and creates new pending tasks, and the cycle repeats.

The AWS docs say it could be because:

  • The Docker daemon is unresponsive

The docs say to set up CloudWatch to see CPU usage and to increase the container size if needed. I increased the CPU/memory to 1024/2048, which didn't fix the problem.

  • The Docker image is large

Unlikely? The image is nothing but httpd:2.4

  • The ECS container agent lost connectivity with the Amazon ECS service in the middle of a task launch

The docs provide some commands to run in the container instance. To do this it looks like I have to either set up AWS Systems Manager or SSH in directly. I will take this route if I can't find any problems with my Terraform config.

  • The ECS container agent takes a long time to stop an existing task

Unlikely, because I am launching a completely new ECS cluster.


Below are the ECS and IAM sections of my Terraform file. Why might my Fargate tasks be stuck on pending?

#
# ECS
#
resource "aws_ecs_cluster" "main" {
  name = "main-ecs-cluster"
}

resource "aws_ecs_task_definition" "app" {
  family                   = "app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.task_execution.arn
  task_role_arn            = aws_iam_role.task_execution.arn
  container_definitions = <<DEFINITION
  [
    {
      "image": "httpd:2.4",
      "cpu": 256,
      "memory": 512,
      "name": "app",
      "networkMode": "awsvpc",
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 80,
          "protocol": "tcp"
        }
      ]
    }
  ]
  DEFINITION
}

resource "aws_ecs_service" "main" {
  name            = "tf-ecs-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    security_groups = [aws_security_group.main.id]
    subnets         = [
      aws_subnet.public1.id,
      aws_subnet.public2.id,
    ]
  }
}

#
# IAM
#
resource "aws_iam_role" "task_execution" {
  name               = "my-first-service-task-execution-role"
  assume_role_policy = data.aws_iam_policy_document.task_execution.json
}

data "aws_iam_policy_document" "task_execution" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role_policy_attachment" "task_execution" {
  role       = aws_iam_role.task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
2
Could be many reasons: wrong credentials, or no connection to the container registry to pull the image (e.g. ECR). If you go to the ECS console and open the task or service, there should be a message or some info on why it fails to launch. Have you checked the ECS console and tasks for any messages? - Marcin
In the ECS console I see "Stopped reason: Task failed to start". - brietsparks
But if you go to the details, as in this screenshot, there is usually more info. Is there nothing more in your case? - Marcin
Since you use awsvpc, check whether you enabled a public IP for the tasks (assuming you run your ECS service in a public subnet). Also, which container instances are you referring to? Fargate does not have any for you to log in to or run commands on. - Marcin
No, the tasks are in a private subnet. Looks like my options are a public subnet, a NAT gateway, or pulling an image that exists in ECR. (source) - brietsparks

2 Answers

3
votes

Based on the discussion in the comments it was determined that the issue is caused by the lack of internet access for the Fargate tasks.

This is because the tasks run in a private subnet, while the task definition uses the httpd image from Docker Hub. Pulling images from Docker Hub requires internet access.

Possible solutions are to use a NAT gateway or NAT instance, to run the tasks in a public subnet, or to host a custom image in ECR.
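As a minimal sketch of the public-subnet option, the service from the question can request a public IP for each task through the `assign_public_ip` argument of `network_configuration`. This assumes the two subnets really do route to an internet gateway and that the security group allows outbound HTTPS, neither of which the question shows.

```hcl
resource "aws_ecs_service" "main" {
  name            = "tf-ecs-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    security_groups = [aws_security_group.main.id]
    subnets = [
      aws_subnet.public1.id,
      aws_subnet.public2.id,
    ]

    # Defaults to false. Without a public IP (or a NAT gateway) the task
    # has no path to Docker Hub and cannot pull httpd:2.4.
    # Assumption: these subnets have a route to an internet gateway.
    assign_public_ip = true
  }
}
```

If the subnets are in fact private, this setting alone will not help; the tasks would still need a NAT gateway or an image source reachable from inside the VPC.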

2
votes

A public subnet with a public IP may not be the right solution, for security reasons.

Consider placing your tasks in private subnets instead. There are two ways to do that:

  1. You can pull images if you give the private subnets a route to the internet through a NAT gateway. (Diagram: pulling an image from ECR using routing through a NAT gateway.)

  2. The better solution: your Fargate tasks can pull images from ECR even when they are placed in a private subnet with no connection to the internet. See the AWS PrivateLink for ECR diagram: pulling an image from ECR using PrivateLink VPC endpoints.
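The PrivateLink option could be sketched in Terraform roughly as follows. The names `var.aws_region`, `aws_vpc.main`, `aws_subnet.private1`, `aws_subnet.private2`, `aws_route_table.private`, and `aws_security_group.vpc_endpoints` are hypothetical and not part of the question's config. The sketch also assumes the image has been pushed to an ECR repository, since these endpoints do not give access to Docker Hub.

```hcl
# Hypothetical input; set it to the region the VPC lives in.
variable "aws_region" {
  type = string
}

# Interface endpoints for the ECR API and the Docker registry API.
# Assumption: aws_vpc.main and the private subnets are defined elsewhere.
resource "aws_vpc_endpoint" "ecr" {
  for_each = toset(["ecr.api", "ecr.dkr"])

  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.aws_region}.${each.key}"
  vpc_endpoint_type = "Interface"
  subnet_ids        = [aws_subnet.private1.id, aws_subnet.private2.id]

  # Assumption: this security group allows inbound 443 from the tasks.
  security_group_ids = [aws_security_group.vpc_endpoints.id]

  # Needs DNS support and DNS hostnames enabled on the VPC.
  private_dns_enabled = true
}

# Gateway endpoint for S3, where ECR stores the image layers.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}
```

If the task definition uses the `awslogs` log driver, a similar interface endpoint for CloudWatch Logs would likely be needed as well.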