I have configured an AWS ECS cluster with 3 instances (m5.large), one instance in each availability zone (A, B, and C). The service is configured as follows:
- Service type: REPLICA
- Number of Tasks: 3
- Minimum Healthy Percent: 30
- Maximum Percent: 100
- Placement Templates: AZ Balanced Spread
- Service AutoScaling: No.
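For reference, the service definition (as it would come back from aws ecs describe-services) looks roughly like the sketch below. The launch type and the translation of the "AZ Balanced Spread" template into a placement strategy are my reading of the console settings, not copied output:

    {
        "serviceName": "App-V1-Service",
        "launchType": "EC2",
        "schedulingStrategy": "REPLICA",
        "desiredCount": 3,
        "deploymentConfiguration": {
            "maximumPercent": 100,
            "minimumHealthyPercent": 30
        },
        "placementStrategy": [
            { "type": "spread", "field": "attribute:ecs.availability-zone" }
        ]
    }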
In the Task Definition, I have used the following:
- Network Mode: awsvpc
- Task Memory: --
- Task CPU: --
At the container level, I have configured only Memory Soft Limit:
- Soft Limit: 2048 MB
- Hard Limit: --
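So the relevant part of the task definition looks roughly like the following (family, container name, image, log group, and region are placeholders): memoryReservation is set, but neither the task-level cpu/memory nor the container-level memory (hard limit) is:

    {
        "family": "App-V1",
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["EC2"],
        "containerDefinitions": [
            {
                "name": "app",
                "image": "<account>.dkr.ecr.<region>.amazonaws.com/app:latest",
                "memoryReservation": 2048,
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": "/ecs/app",
                        "awslogs-region": "us-east-1",
                        "awslogs-stream-prefix": "app"
                    }
                }
            }
        ]
    }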
I have used awslogs for logging. The above configuration works: when I start the service, there is one container running on each of the instances. 'docker stats' on one of the instances shows the following:
MEM USAGE / LIMIT
230MiB / 7.501GiB
And the container instance (ECS Console) shows the following:
Resources    Registered    Available
CPU          2048          2048
Memory       7680          5632
Ports        5 ports
The results above are the same across all 3 instances: 2 GB of memory has been reserved (the soft limit), and the upper memory limit is the instance memory of roughly 8 GB (no hard limit set). Everything works as expected so far.
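The same registered/remaining numbers can also be pulled per container instance from the CLI; the cluster name below is a placeholder:

    aws ecs list-container-instances --cluster my-cluster
    aws ecs describe-container-instances \
        --cluster my-cluster \
        --container-instances <arn-from-previous-call> \
        --query 'containerInstances[0].{registered: registeredResources, remaining: remainingResources}'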
But when I redeploy the code from Jenkins (using a forced deployment), I get the following error in the Jenkins log:
"message": "(service App-V1-Service) was unable to place a task because no container instance met all of its requirements. The closest matching (container-instance 90d4ba21-4b19-4e31-c42d-d7223b34f17b) has insufficient memory available. For more information, see the Troubleshooting section of the Amazon ECS Developer Guide.
In Jenkins, the job shows up as 'Success', but it is the old version of the code that is still running. There is sufficient memory available on all three instances. I have also changed the Minimum Healthy Percent to 30, hoping that ECS could stop a running container and then deploy the new one in its place. Any solution or pointers to debug this further would be of great help.
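In case it is useful, the placement error above is also written to the service's event log, which can be inspected with something like the following (cluster name is a placeholder):

    aws ecs describe-services \
        --cluster my-cluster \
        --services App-V1-Service \
        --query 'services[0].events[:10]'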