0
votes

I have set up a CloudWatch alarm that sends an SNS email whenever certain keywords are found in CloudWatch Logs (using a metric filter).

  1. When those keywords are detected, the alarm state changes from INSUFFICIENT_DATA to ALARM and the SNS topic is triggered.
  2. Moving back from ALARM to INSUFFICIENT_DATA then takes a seemingly random amount of time.

Is there a specific rule for how this works? I expected the alarm to return to INSUFFICIENT_DATA immediately after entering the ALARM state.

Any help would be appreciated. Thanks

2
What is the configuration of your alarm? Can you show a picture, or describe it? It would be some statistic (eg Average, Sum, Count) of some metric over some period of time. - John Rotenstein
Configuration: I have created a metric filter on a Lambda log group. Metric: the CloudWatch alarm is based on that metric filter, which detects specific keywords in CloudWatch Logs. - Abdul Salam
The alarm would have some statistic (eg Average, Sum, Count) of some metric over some period of time. What are they set to? - John Rotenstein
Statistic is Sum and time is 1 minute. - Abdul Salam
Well, that means that the alarm would be in the ALARM state if the Sum of the metric count for the past 1 minute exceeds the threshold you requested. If there is no metric sent for a minute, it would return to INSUFFICIENT_DATA. - John Rotenstein

2 Answers

1
votes

Suppose the alarm has a metric period of 60 seconds and 3 evaluation periods, which gives an evaluation window of 3 × 60 seconds = 3 minutes. The alarm is in the ALARM state if all of the last 3 datapoints, taken at 60-second intervals, are above the threshold. If any one of the last 3 datapoints is below the threshold, the alarm transitions to OK. However, if all 3 of the latest datapoints are missing (say your metric filter did not match, so no metric was published), the alarm waits longer than 3 periods before transitioning to INSUFFICIENT_DATA. This is by design, to accommodate network or processing delays.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
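To make the rules above concrete, here is a minimal Python sketch of the evaluation logic as this answer describes it. It is a simplified model, not CloudWatch's actual algorithm: the function name `evaluate` is hypothetical, `None` stands for a missing datapoint, and the extra waiting time before INSUFFICIENT_DATA is not modelled. The answer does not say what happens when only some datapoints are missing, so keeping the current state in that case is an assumption.

```python
from typing import Optional, Sequence


def evaluate(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    current_state: str,
) -> str:
    """Return the alarm state for the last N datapoints (None = missing)."""
    if not datapoints:
        raise ValueError("at least one datapoint slot is required")

    # Any present datapoint at or below the threshold moves the alarm to OK.
    if any(v is not None and v <= threshold for v in datapoints):
        return "OK"

    # All datapoints present (and, given the check above, all breaching).
    if all(v is not None for v in datapoints):
        return "ALARM"

    # Every datapoint is missing. In reality CloudWatch waits longer than
    # N periods before making this transition; that delay is not modelled.
    if all(v is None for v in datapoints):
        return "INSUFFICIENT_DATA"

    # Assumption: a mix of breaching and missing datapoints keeps the state.
    return current_state
```

For example, `evaluate([None, None, None], 0, "ALARM")` returns `"INSUFFICIENT_DATA"`, while `evaluate([2, None, None], 0, "ALARM")` stays at `"ALARM"` under the assumption stated above.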

0
votes

I came across the same situation, using a period of 1 minute and a condition of the form x > threshold.

The state changes to ALARM immediately whenever the metric exceeds the threshold, but changing back to OK or INSUFFICIENT_DATA takes 6 minutes. This happens only for missing data.

According to AWS Support, this is the expected behavior of CloudWatch alarms. A clear explanation can be found here: https://forums.aws.amazon.com/thread.jspa?threadID=284182
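Since the delay shows up only for missing data, one setting that may be worth experimenting with is the alarm's `TreatMissingData` option. The sketch below only builds the parameter dictionary for such an alarm and does not call AWS. The function name `build_alarm_params` and all argument values are hypothetical, and the threshold of 0 and single evaluation period are assumptions based on the question (Sum over 1 minute). The keys are those accepted by boto3's `put_metric_alarm`. Nothing in this thread confirms that changing this setting shortens the delay.

```python
from typing import Any, Dict

# Values accepted by the TreatMissingData alarm setting.
ALLOWED_MISSING_DATA = {"breaching", "notBreaching", "ignore", "missing"}


def build_alarm_params(
    alarm_name: str,
    namespace: str,
    metric_name: str,
    sns_topic_arn: str,
    treat_missing_data: str = "notBreaching",
) -> Dict[str, Any]:
    """Build keyword arguments for a keyword-match alarm (illustrative only)."""
    if treat_missing_data not in ALLOWED_MISSING_DATA:
        raise ValueError(f"unsupported TreatMissingData: {treat_missing_data!r}")

    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,          # namespace used by the metric filter
        "MetricName": metric_name,       # metric emitted by the metric filter
        "Statistic": "Sum",              # as stated in the question's comments
        "Period": 60,                    # 1 minute, as stated in the comments
        "EvaluationPeriods": 1,          # assumption: not given in the thread
        "Threshold": 0.0,                # assumption: alarm on any match
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": treat_missing_data,
        "AlarmActions": [sns_topic_arn],
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. With `notBreaching`, missing datapoints are treated as within the threshold, so the alarm would be expected to settle in OK rather than INSUFFICIENT_DATA.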