33
votes

I am using a CloudFormation template to build the infrastructure (an ECS Fargate cluster). The template ran successfully and the stack was created. However, the task failed with the following error:

Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:eu-central-1:890543041640:targetgroup/prc-service-devTargetGroup/97e3566c8b307abf)

I don't know what to look at, or where, to troubleshoot this. Because it is a Fargate cluster, I also don't know how to log in to the container and run health check queries to debug further.

Can someone please guide me further on this? Because of this error I cannot even access my web app, as the ALB won't route traffic to an unhealthy target.

What I did

After some googling, I found this post: https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-unhealthy-checks-ecs/

However, I believe that article is about the EC2 launch type, and in my case there are no EC2 instances.

If it helps, I can paste the entire template as well.

Please help.

7
Please suggest. I am stuck here. (user2315104)

7 Answers

17
votes

This is resolved. There were two problems:

  • The Docker container port mapping to the host port was incorrect.
  • The ALB health check interval was very short, so the ALB gave up almost immediately instead of waiting for the Docker container to be up and running.

After making these changes, it worked properly.
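
To put rough numbers on the second point, here is a small sketch. It assumes the target group marks a target unhealthy after a number of consecutive failed checks spaced one interval apart, and that an optional ECS service health check grace period is added on top. The function names and the example values are illustrative only, not taken from the template in the question.

```python
def seconds_until_unhealthy(interval_seconds: int, unhealthy_threshold: int) -> int:
    """Approximate time before the load balancer marks a target unhealthy.

    Assumption: checks run every `interval_seconds` and the target is marked
    unhealthy after `unhealthy_threshold` consecutive failures.
    """
    return interval_seconds * unhealthy_threshold


def survives_startup(startup_seconds: int,
                     interval_seconds: int,
                     unhealthy_threshold: int,
                     grace_period_seconds: int = 0) -> bool:
    """Return True if the container is likely up before the task is given up on.

    `grace_period_seconds` stands in for the service's health check grace
    period (0 if none is configured). This is an estimate, not an exact model.
    """
    budget = grace_period_seconds + seconds_until_unhealthy(
        interval_seconds, unhealthy_threshold
    )
    return startup_seconds < budget


# A container that needs about 40 s to start:
print(survives_startup(40, interval_seconds=5, unhealthy_threshold=2))   # short interval
print(survives_startup(40, interval_seconds=30, unhealthy_threshold=3))  # longer interval
```

If the first call returns False for your own numbers, either a longer interval, a higher threshold, or a grace period on the service is worth trying.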

9
votes

There are quite a few possible reasons for this issue, not only the open ports:

  • Improper IAM permissions for the ecsServiceRole IAM role
  • Container instance security group
  • Elastic Load Balancing load balancer not configured for all Availability Zones
  • Elastic Load Balancing load balancer health check misconfigured
  • Unable to update the service servicename: Load balancer container name or port changed in task definition

AWS therefore maintains a dedicated page covering the possible causes of this error:

https://docs.aws.amazon.com/en_en/AmazonECS/latest/developerguide/troubleshoot-service-load-balancers.html

Edit: in my case, my application's health check endpoint returned a status code other than the expected one. The default success code is 200, but you can also specify a range such as 200-499.
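
To illustrate how such a success-code setting behaves, here is a minimal sketch of a matcher that accepts a single code ("200"), a comma-separated list ("200,302") or a range ("200-499"). The function `status_matches` is a hypothetical helper written for this answer, not part of any AWS SDK.

```python
def status_matches(matcher: str, status: int) -> bool:
    """Return True if `status` satisfies a matcher such as "200", "200,302" or "200-499"."""
    for part in matcher.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas or spaces
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= status <= int(high):
                return True
        elif int(part) == status:
            return True
    return False


# With the default matcher a redirect counts as a failure:
print(status_matches("200", 302))
# With a wider range the same response counts as healthy:
print(status_matches("200-499", 302))
```

Keep in mind that a very wide range also hides real failures such as 404, so a dedicated health endpoint that returns 200 is usually the cleaner fix.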

3
votes

I got this error message because the security group between the ECS service and the load balancer target group was only allowing HTTP and HTTPS traffic.

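The health check apparently reaches the task on a different port and/or protocol (the target group's health check port defaults to the port the target receives traffic on, which need not be 80 or 443), because updating the security group to allow all traffic on all ports (as suggested at https://docs.aws.amazon.com/AmazonECS/latest/userguide/create-application-load-balancer.html) made the health check work. A narrower rule that allows only the container port from the load balancer's security group should also be enough.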

1
votes

I had this exact same problem. I was able to get around it as follows:

  1. Navigate to the EC2 service.
  2. Select Target Groups in the side panel.
  3. Select the target group for your load balancer.
  4. Select the health check tab.
  5. Make sure the health check path in the target group matches a health endpoint that your application actually serves. This is the endpoint the ELB requests when it performs its health check. In my case the health check path was /health.

0
votes

As mentioned by tschumann above, check the security group around the ECS cluster. If you are using Terraform, allow ingress on all Docker ephemeral ports with something like the following:

resource "aws_security_group" "ecs_sg" {
  name   = "ecs_security_group"
  vpc_id = data.aws_vpc.vpc.id
}

# Allow traffic from inside the VPC to the dynamically mapped host ports.
resource "aws_security_group_rule" "ingress_docker_ports" {
  type              = "ingress"
  from_port         = 32768
  to_port           = 61000
  # "tcp" so the port range applies; with protocol "-1" the rule covers all ports and protocols.
  protocol          = "tcp"
  cidr_blocks       = [data.aws_vpc.vpc.cidr_block]
  security_group_id = aws_security_group.ecs_sg.id
}

0
votes

Possibly helpful for someone: our target group health check path was set to /, which for our services served Swagger and worked well. After we switched to Springfox instead of manually generating swagger.json, / started returning a 302 redirect to /swagger-ui.html, which caused the health check to fail. Since this was a Spring Boot service, we simply pointed the health check path in the target group to /health instead (the out-of-the-box Spring status page).
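
If you want to see what a path returns before changing the target group, a probe that does not follow redirects is handy. The sketch below uses only the Python standard library. `probe` and `is_healthy` are hypothetical helper names, and the expected code of 200 is an assumption that matches the default mentioned elsewhere in this thread.

```python
import http.client


def probe(host: str, port: int, path: str, timeout: float = 5.0) -> int:
    """Send one GET request and return the raw status code (redirects are not followed)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def is_healthy(host: str, port: int, path: str, expected: int = 200) -> bool:
    """Return True only if the path answers with exactly the expected status code."""
    try:
        return probe(host, port, path) == expected
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, malformed response: all count as unhealthy.
        return False
```

For example, `is_healthy("10.0.1.23", 8080, "/")` would return False for a service that redirects / with a 302, while the same call with "/health" would return True if that endpoint answers 200. The IP address and port here are placeholders.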

0
votes

Let me share my experience.

In my case everything was correct except the host the server listened on. It was localhost, which makes the server unreachable from outside the container, so the health check could not succeed. The host should be 0.0.0.0 (or left empty in some libraries).
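
As a minimal illustration of the difference, here is a sketch using Python's standard library HTTP server. The `/health` path, port 8080 and the names `HealthHandler` and `make_server` are assumptions for the example; your framework will have its own way of setting the bind address.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200 and everything else with 404."""

    def do_GET(self):
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    """Create the server. "0.0.0.0" listens on all interfaces; "127.0.0.1"
    would only accept connections coming from inside the container itself."""
    return HTTPServer((host, port), HealthHandler)


# To run it: make_server().serve_forever()
```

With the host set to 127.0.0.1 the same server works when tested from inside the container, which is why this mistake is easy to miss.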