4
votes

I've read the ECS Monitoring documentation, but I haven't found a way to alert on an ECS task hitting its memory limit with the help of CloudWatch Events or Metrics. I have a situation where an ECS container exceeds the default task hard limit of 512 MB and restarts. A CloudWatch Event fires on the ECS Task state change, e.g. from RUNNING to STOPPED, but in the event detail the "stoppedReason" only says "Task failed ELB health checks in ...", even though I know for sure the actual reason was the memory limit being exceeded and the container being killed by Docker. Here is the Event Rule Pattern:

{
  "source": [
    "aws.ecs"
  ],
  "detail-type": [
    "ECS Task State Change"
  ],
  "detail": {
    "lastStatus": [
      "STOPPED"
    ]
  }
}

The CloudWatch MemoryUtilization metric for the ServiceName dimension doesn't help much either, because the minimum period to trigger an alarm is 1 minute, while the container kill-restart cycle completes faster than that, so there isn't enough time to catch the spike. I assume the same applies to the ClusterName dimension (in other words, to the entire cluster).

How can I get a notification when a Task (Container, Container Instance) hits its hard memory limit?


3 Answers

1
votes

Alternatively, you can set up an alarm on the CloudWatch metrics that notifies you through SNS when memory utilization exceeds a threshold.
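
A minimal sketch of that setup with boto3, assuming the SNS topic already exists and that the cluster, service, alarm, and topic names below are hypothetical placeholders:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on the built-in service-level memory metric; all names/ARNs are placeholders.
cloudwatch.put_metric_alarm(
    AlarmName="my-service-memory-high",
    Namespace="AWS/ECS",
    MetricName="MemoryUtilization",
    Dimensions=[
        {"Name": "ClusterName", "Value": "my-cluster"},
        {"Name": "ServiceName", "Value": "my-service"},
    ],
    Statistic="Maximum",            # Maximum catches short spikes better than Average
    Period=60,                      # 1 minute is the finest period for this built-in metric
    EvaluationPeriods=1,
    Threshold=90.0,                 # percent of the task's memory limit
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ecs-memory-alerts"],
)

Note that this still runs into the 1-minute period limitation from the question, so a very short kill-restart cycle can slip through.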

0
votes

Assuming

  • your aim is to identify that the reason for the restart was memory usage and not something else, and
  • the built-in memory metric is not reported at a high enough frequency,

you could write your own custom high-resolution metric from inside your ECS task that reports memory usage every second or so.
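
A minimal sketch of such a publisher, assuming a cgroup v1 container, a task role that allows cloudwatch:PutMetricData, and a hypothetical namespace and metric name:

import time
import boto3

cloudwatch = boto3.client("cloudwatch")

# cgroup v1 path; adjust if your container runtime uses cgroup v2 (assumption).
CGROUP_USAGE = "/sys/fs/cgroup/memory/memory.usage_in_bytes"

def read_memory_mb():
    with open(CGROUP_USAGE) as f:
        return int(f.read()) / (1024 * 1024)

while True:
    # StorageResolution=1 publishes a high-resolution (1-second) metric,
    # so a spike shorter than a minute can still raise an alarm.
    cloudwatch.put_metric_data(
        Namespace="Custom/ECS",
        MetricData=[{
            "MetricName": "ContainerMemoryUsed",
            "Value": read_memory_mb(),
            "Unit": "Megabytes",
            "StorageResolution": 1,
        }],
    )
    time.sleep(1)

You can then create a high-resolution alarm (10-second period) on this custom metric and point it at an SNS topic.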

0
votes

When you create auto scaling for an ECS service and select memory utilization as the threshold, ECS will create a CloudWatch alarm for it. Then go to the CloudWatch Alarms dashboard and modify that alarm to add an SNS notification. When memory utilization goes high or low, you will get notified.

  • You can also create the same alarm manually in the CloudWatch dashboard.
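
A rough sketch of adding the SNS action through the API instead of the console, assuming the alarm created by auto scaling is a plain metric alarm and that the alarm name and topic ARN below are hypothetical placeholders:

import boto3

cloudwatch = boto3.client("cloudwatch")

alarm_name = "TargetTracking-service/my-cluster/my-service-AlarmHigh-1234"  # placeholder
topic_arn = "arn:aws:sns:us-east-1:123456789012:ecs-memory-alerts"          # placeholder

# Fetch the existing alarm definition so it can be re-created unchanged,
# with the SNS topic appended as an extra alarm action.
alarm = cloudwatch.describe_alarms(AlarmNames=[alarm_name])["MetricAlarms"][0]

cloudwatch.put_metric_alarm(
    AlarmName=alarm["AlarmName"],
    Namespace=alarm["Namespace"],
    MetricName=alarm["MetricName"],
    Dimensions=alarm["Dimensions"],
    Statistic=alarm["Statistic"],
    Period=alarm["Period"],
    EvaluationPeriods=alarm["EvaluationPeriods"],
    Threshold=alarm["Threshold"],
    ComparisonOperator=alarm["ComparisonOperator"],
    AlarmActions=alarm["AlarmActions"] + [topic_arn],
)

Keep in mind that Application Auto Scaling manages these alarms, so the change may be overwritten if the scaling policy is later updated.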