I have a Cloudwatch Alarm which receives data from a Canary. My canary attempts to visit a website, and if the website is up and responding, then the datapoint is 0, if the server returns some sort of error then the datapoint is 1. Pretty standard canary stuff I hope. This canary runs every 30 minutes.
My Cloudwatch alarm is configured as follows:
With the expected behaviour that if my canary cannot reach the website 3 times in a row, then the alarm should go off.
Unfortunately, this is not what's happening. My alarm was triggered with the following canary data:
- Feb 8 @ 7:51 PM (MST)
- Feb 8 @ 8:22 PM (MST)
- Feb 8 @ 9:52 PM (MST)
How is it possible that these three datapoints would trigger my alarm?
My actual email was received as follows:
You are receiving this email because your Amazon CloudWatch Alarm "...." in the US West (Oregon) region has entered the ALARM state, because "Threshold Crossed: 3 out of the last 3 datapoints [1.0 (09/02/21 04:23:00), 1.0 (09/02/21 02:53:00), 1.0 (09/02/21 02:23:00)] were greater than or equal to the threshold (1.0) (minimum 3 datapoints for OK -> ALARM transition)." at "Tuesday 09 February, 2021 04:53:30 UTC".
I am even more confused because the times on these datapoints do not align. If I convert these times to MST, we have:
- Feb 8 @ 7:23 PM
- Feb 8 @ 7:53 PM
- Feb 8 @ 9:23 PM
The time range on the reported datapoints is a two hour window, when I have clearly specified my evaluation period as 1.5 hours.
If I view the "metrics" chart in cloudwatch for my alarm it makes even less sense:
The points in this chart as shown as:
- Feb 9 @ 2:30 UTC
- Feb 9 @ 3:00 UTC
- Feb 9 @ 4:30 UTC
Which, again, appears to be a 2 hour evaluation period.
Help? I don't understand this.
How can I configure my alarm to fire if my canary cannot reach the website 3 times in a row (waiting 30 minutes in-between checks)?