I am using Terraform to set up a small Fargate cluster running three Apache server tasks. The tasks hang in PENDING, then the cluster stops them and creates new PENDING tasks, and the cycle repeats.
The AWS docs say it could be because:
- The Docker daemon is unresponsive
The docs say to set up CloudWatch to see CPU usage and to increase the container size if needed. I have upped the CPU/memory to 1024/2048, which didn't fix the problem.
- The Docker image is large
Unlikely? The image is nothing but httpd:2.4
- The ECS container agent lost connectivity with the Amazon ECS service in the middle of a task launch
The docs provide some commands to run on the container instance. To do this, it looks like I would have to either set up AWS Systems Manager or SSH in directly. I will take this route if I can't find any problems with my Terraform config.
- The ECS container agent takes a long time to stop an existing task
Unlikely, because I am launching a completely new ECS cluster.
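Before setting up Systems Manager or SSH, the stop reason should also be visible from the AWS CLI (a sketch, assuming a configured CLI and the cluster name from my config below; the task ARN is a placeholder to fill in from the first command's output):

```shell
# List recently stopped tasks in the cluster
aws ecs list-tasks --cluster main-ecs-cluster --desired-status STOPPED

# Show why a given task was stopped (substitute a task ARN from the list above)
aws ecs describe-tasks \
  --cluster main-ecs-cluster \
  --tasks <task-arn> \
  --query 'tasks[].stoppedReason'
```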
Below are the ECS and IAM sections of my Terraform file. Why might my Fargate tasks be stuck on pending?
#
# ECS
#

resource "aws_ecs_cluster" "main" {
  name = "main-ecs-cluster"
}

resource "aws_ecs_task_definition" "app" {
  family                   = "app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.task_execution.arn
  task_role_arn            = aws_iam_role.task_execution.arn

  container_definitions = <<DEFINITION
[
  {
    "image": "httpd:2.4",
    "cpu": 256,
    "memory": 512,
    "name": "app",
    "networkMode": "awsvpc",
    "portMappings": [
      {
        "containerPort": 80,
        "hostPort": 80,
        "protocol": "tcp"
      }
    ]
  }
]
DEFINITION
}

resource "aws_ecs_service" "main" {
  name            = "tf-ecs-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    security_groups = [aws_security_group.main.id]
    subnets = [
      aws_subnet.public1.id,
      aws_subnet.public2.id,
    ]
  }
}
#
# IAM
#

resource "aws_iam_role" "task_execution" {
  name               = "my-first-service-task-execution-role"
  assume_role_policy = data.aws_iam_policy_document.task_execution.json
}

data "aws_iam_policy_document" "task_execution" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role_policy_attachment" "task_execution" {
  role       = aws_iam_role.task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
Stopped reason: Task failed to start – brietsparks
Check whether you enabled a public IP for the tasks (assuming you run your ECS service in a public subnet). Also, which container instances are you referring to? Fargate does not have container instances for you to log in to or execute commands on. – Marcin
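Following up on Marcin's comment: a Fargate task in a public subnet needs a public IP to reach Docker Hub and pull `httpd:2.4`; without one, image pull fails during provisioning and the task is stopped. A sketch of the change against the service above (attribute names are from the standard `aws_ecs_service` resource; `assign_public_ip` defaults to `false`):

```hcl
resource "aws_ecs_service" "main" {
  name            = "tf-ecs-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    security_groups = [aws_security_group.main.id]
    subnets = [
      aws_subnet.public1.id,
      aws_subnet.public2.id,
    ]
    # Without this, tasks in a public subnet cannot reach Docker Hub
    assign_public_ip = true
  }
}
```

For tasks in a private subnet, the equivalent fix would be a NAT gateway (or VPC endpoints for ECR if the image were hosted there) rather than a public IP.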