We're using Grafana to monitor certain events and fire alarms. The data is stored in Prometheus (but we're not using the Prometheus Alertmanager).
Last night we had an issue with one of our metrics that we currently do not have an alarm on. I would like to add one, but I'm struggling to determine the best way to do so.
In this case, the values for this metric are pretty low, and overnight (02:00-07:00, on the left of the graph) you can see the metric drop to near zero.
We'd like to detect the sharp drop on the right-hand side at 8pm. We already detected the drop to completely zero at ~9pm (the flatline), but I'd like to identify the sudden drop before it reaches zero.
Our Prometheus query is:
sum(rate({__name__=~"metric_name_.+"}[1m])) by (grouping)
I've tried looking at a few things like:
sum(increase({__name__=~"metric_name_.+"}[1m])) by (grouping)
But they all broadly end up with a similar-looking graph to the one below, just with a different Y-axis scale, and they make it tricky to differentiate between "near zero & quiet" and "near zero because the metrics have dropped off a cliff".
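To make the kind of detection I'm after more concrete, here is a rough sketch (not something we run today; the 5m window, 1h offset and 0.5 threshold are all placeholder values) that compares current traffic to the same window an hour earlier and fires when it has more than halved:

sum(rate({__name__=~"metric_name_.+"}[5m])) by (grouping)
  /
sum(rate({__name__=~"metric_name_.+"}[5m] offset 1h)) by (grouping)
  < 0.5

I'm not convinced this copes well with the overnight quiet period, though, since the denominator also drops towards zero there.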
What combination of Grafana and Prometheus settings can we use to identify this change effectively?
