I am interested in listening for, and reacting to, the event where a service cannot start a task because of insufficient CPU or memory. This information can be viewed in the console if I choose the specific service and look in its "Events" tab. There, an event like the following is displayed:
"service X was unable to place a task because no container instance met all of its requirements. The closest matching container-instance Y has insufficient CPU units available. For more information, see the Troubleshooting section."
The container instances in the cluster are managed in an AutoScalingGroup, so the appropriate action would be to react to this event by scaling out with an additional instance, which would then allow the task to be scheduled. Now, my problem is: how do I react to this event?
I have a LogGroup that contains data from the following files from all the EC2 instances in the cluster:
- /var/log/dmesg
- /var/log/messages
- /var/log/docker
- /var/log/ecs/ecs-init.log.*
- /var/log/ecs/ecs-agent.log.*
(The EC2 instances are based on amazon-ecs-optimized images)
Initially, I thought that the "service X was unable to place a task..." message would appear in one of these log files (more specifically in ecs-agent.log or ecs-init.log), but that was not the case.
I then realized that "ECS Events" is a thing (see more at http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwe_events.html). Unfortunately, this specific event is not one that "ECS Events" supports. Only Container Instance State Change Events and Task State Change Events are supported, NOT "Service State Change Events". Even though one would think that the events from the "Events" tab of the service would be streamed as well, they are not. I came to realize the documentation even says that:
"You can use Amazon ECS event stream for CloudWatch Events to receive near real-time notifications regarding the current state of both the container instances within an Amazon ECS cluster, and the current state of all tasks running on those container instances."
And thereby, "Amazon ECS Event Stream for CloudWatch Events" is not streaming service events (and therefore not events for tasks that are prevented from running). I really hope that "Service State Change Events" will be included in the future. That way I could make a CloudWatch Event Rule that matches this event and triggers a Lambda function, which would then determine whether the event was of type "service X was unable to place a task...", and based on that, manipulate the AutoScalingGroup to add an instance to the cluster.
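To make the intended reaction concrete, here is a minimal sketch of what the Lambda function would do once it has the event message text, assuming Python with boto3. The names `is_placement_failure`, `next_desired_capacity`, `scale_out` and `handle_message` are my own placeholders; how the message would actually reach the function is left open, since no such event is streamed today.

```python
# Substring taken from the message shown in the service's "Events" tab.
PLACEMENT_FAILURE_MARKER = "was unable to place a task"


def is_placement_failure(message):
    """Return True if a service event message reports a failed task placement."""
    return PLACEMENT_FAILURE_MARKER in message


def next_desired_capacity(desired, max_size, step=1):
    """Return the capacity to request, never exceeding the group's MaxSize."""
    return min(desired + step, max_size)


def scale_out(asg_name, autoscaling=None):
    """Add one instance to the Auto Scaling group; return False if already at MaxSize."""
    if autoscaling is None:
        import boto3  # imported lazily so the helpers above work without AWS access
        autoscaling = boto3.client("autoscaling")

    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"]
    if not groups:
        raise ValueError(f"Auto Scaling group {asg_name!r} not found")

    group = groups[0]
    target = next_desired_capacity(group["DesiredCapacity"], group["MaxSize"])
    if target == group["DesiredCapacity"]:
        return False  # nothing to do, the group cannot grow any further

    autoscaling.set_desired_capacity(
        AutoScalingGroupName=asg_name,
        DesiredCapacity=target,
    )
    return True


def handle_message(message, asg_name, autoscaling=None):
    """Scale out only when the message is a placement failure (hypothetical entry point)."""
    if not is_placement_failure(message):
        return False
    return scale_out(asg_name, autoscaling=autoscaling)
```

Capping at MaxSize keeps the function from requesting a capacity the group would reject; whether one extra instance per event is the right step size is a separate question.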
But as stated, this is not supported at the moment. Is there any other way that I can "listen" for this event? I even thought about running a Lambda function every 2-3 minutes that uses the CLI to invoke "aws ecs describe-services --services X" to output the list of events, and then match on the "service X was unable to place a task..." event. But that just seems wrong...
Any help is much appreciated. Thanks!
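For reference, the polling workaround I have in mind would look roughly like the sketch below, assuming Python with boto3 instead of shelling out to the CLI. The function names and the 3-minute lookback are my own assumptions; the `events` list with `createdAt` and `message` fields is what `describe_services` returns for each service.

```python
from datetime import datetime, timedelta, timezone

# Substring taken from the message shown in the service's "Events" tab.
MARKER = "was unable to place a task"


def recent_placement_failures(events, since):
    """Keep only placement-failure events created after `since`."""
    return [
        event for event in events
        if event["createdAt"] > since and MARKER in event["message"]
    ]


def poll_service(cluster, service, lookback_minutes=3, ecs=None):
    """Fetch the service's events and return recent placement failures."""
    if ecs is None:
        import boto3  # imported lazily so the filter above works without AWS access
        ecs = boto3.client("ecs")

    response = ecs.describe_services(cluster=cluster, services=[service])
    services = response["services"]
    if not services:
        return []

    # boto3 returns timezone-aware datetimes, so compare against an aware "now".
    since = datetime.now(timezone.utc) - timedelta(minutes=lookback_minutes)
    return recent_placement_failures(services[0]["events"], since)
```

The lookback window has to be at least as long as the polling interval, otherwise events that occur between two runs are missed; remembering the `id` of events already handled would avoid reacting to the same event twice.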