12
votes

From the docs:

No matter what value you set for how to treat missing data, when an alarm evaluates whether to change state, CloudWatch attempts to retrieve a higher number of data points than specified by Evaluation Periods. The exact number of data points it attempts to retrieve depends on the length of the alarm period and whether it is based on a metric with standard resolution or high resolution. The timeframe of the data points that it attempts to retrieve is the evaluation range.

The docs go on to give an example of an alarm with 'EvaluationPeriods' and 'DatapointsToAlarm' set to 3. They state that Cloudwatch chooses the 5 most recent datapoints. Part of my question is, Where are they getting 5? It's not clear from the docs.

The second part of my question is, why have this behavior at all (or at least, why have it by default)? If I set my evaluation period to 3, my Datapoints to Alarm to 3, and tell Cloudwatch to 'TreatMissingData' as 'breaching,' I'm going to expect 3 periods of missing data to trigger an alarm state. This doesn't necessarily happen, as illustrated by an example in the docs.

1

1 Answers

2
votes

I had the same questions. As best I can tell, the 5 can be explained if I am thinking about standard collection intervals vs standard resolution correctly. In other words, if we assume a standard collection interval of 5 minutes and a standard 1-minute resolution, then within the 5 minutes of the collection interval, 5 separate data points are collected. The example states you need 3 data points over 3 evaluation periods, which is less than the 5 data points CloudWatch has collected. CloudWatch would then have all the data points it needs within the 5-data-point evaluation range defined by a single collection. As an example, if 4 of the 5 expected data points are missing from the collection, you have one defined data point and thus need 2 more within the evaluation range to reach the three needed for alarm evaluation. These 2 (not the 4 that are actually missing from the collection) are considered the "missing" data points in the documentation - I find this confusing. The tables in the AWS documentation provide examples for how the different treatments of the "missing" 2 data points impact the alarm evaluations.

Regardless of whether this is the correct interpretation, this could be better explained in the documentation.