2
votes

In GCP Monitoring and Alerting, I am trying to add an alerting policy on the API request count, summed over one day, for maps-backend.googleapis.com.

My metrics setting images

As the attached picture shows, my metric displays the correct value and chart. The value is above 500 at every point, yet the condition with a threshold of 300 never triggers. When I set the threshold to 100, I got an alert email reporting a value of one hundred something, so my notification channel is working fine.

Here are my settings for the metric and the alert condition:

Resource type: Consumed API
Metric: Request count

Filter
credential_id: XXXXXX
credential_id: XXXXXX
service: maps-backend.googleapis.com

Aggregator: sum

In Advanced Aggregation
Aligner: sum
Alignment Period: 1 day

Configuration
Condition triggers if: Any time series violates
Condition: is above
Threshold: 300
For: most recent value
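
For reference, the settings above might correspond roughly to the following alerting policy body, as it would be sent to the Cloud Monitoring API. This is only a sketch: the helper name `build_daily_request_count_policy` and the display names are made up for illustration, the metric type `serviceruntime.googleapis.com/api/request_count` is my assumption for what the console shows as "Request count", and treating `credential_id` as a resource label in the filter is also an assumption. "For: most recent value" is represented as a zero duration.

```python
import json


def build_daily_request_count_policy(service, threshold, credential_ids=()):
    """Build an AlertPolicy-shaped dict (hypothetical helper, not an official API)."""
    filter_parts = [
        'resource.type = "consumed_api"',
        # Assumed metric type behind the console's "Request count".
        'metric.type = "serviceruntime.googleapis.com/api/request_count"',
        f'resource.labels.service = "{service}"',
    ]
    if credential_ids:
        # Assumption: credential_id is filterable as a resource label.
        ids = " OR ".join(
            f'resource.labels.credential_id = "{cid}"' for cid in credential_ids
        )
        filter_parts.append(f"({ids})")

    condition = {
        "displayName": "Daily API request count above threshold",
        "conditionThreshold": {
            "filter": " AND ".join(filter_parts),
            "aggregations": [
                {
                    "alignmentPeriod": "86400s",        # Alignment Period: 1 day
                    "perSeriesAligner": "ALIGN_SUM",    # Aligner: sum
                    "crossSeriesReducer": "REDUCE_SUM", # Aggregator: sum
                }
            ],
            "comparison": "COMPARISON_GT",              # Condition: is above
            "thresholdValue": threshold,
            "duration": "0s",                           # For: most recent value
            "trigger": {"count": 1},                    # Any time series violates
        },
    }
    return {
        "displayName": "Daily API request count",
        "combiner": "OR",
        "conditions": [condition],
    }


policy = build_daily_request_count_policy("maps-backend.googleapis.com", 300)
print(json.dumps(policy, indent=2))
```

Writing the policy out like this makes it easier to compare what the alert condition evaluates against what the chart displays, since the two may use different aggregation settings.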

How do I correctly monitor and alert on the API request count/usage summed over one day?

1
Once a policy has triggered an alert, an unresolved incident appears. It never gets resolved because the aggregated number of API requests shown on your graph never falls below the threshold. This open incident prevents the policy from triggering subsequent incidents and sending further alerts for the same condition. As soon as you resolve it manually, the alerting policy returns to the "armed" state and is ready to trigger the next incident and notification. - mebius99
Thanks for answering! However, I first set the threshold to 100 and got an alert email. Then I set the threshold back to 300 and got an email telling me the problem was resolved. I also tried a new alerting policy with a threshold of 300, and it doesn't trigger either. I think the problem is that even though the value I see is well above 300, the value the alert actually evaluates is always below 300 (I believe it is always around 100-200). - Yao
Each time you change the alerting configuration, it triggers once based on the new condition, but after triggering it remains silent until you resolve the linked incident. Regarding the values and thresholds, I guess it comes down to the effects of alignment, as described here: Monitoring > Doc > Selecting metrics > Additional configuration > Alignment. The absolute values depend on the alignment period, at least for the "sum" aligner. Perhaps it makes sense to experiment with the Alignment settings. - mebius99
Thanks a lot! However, I still can't fix the problem mentioned at the beginning. As an alternative, I monitor and alert using Consumer Quota, Quota limit, and my service name, summing the count over a 1-day interval, and it works! - Yao
Nice to hear that you managed to find a workaround. If the workaround has proved that it does what you need, it would be nice if you posted an answer here and shared the detailed settings with the community. - mebius99
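
The alignment effect raised in the comments can be illustrated with a small, self-contained sketch. It is not the Cloud Monitoring implementation: `align_sum` is a hypothetical helper, and the traffic figures are invented. It only shows that, with a "sum" aligner, the same raw data yields very different absolute values depending on the alignment period, which might be why a chart and an alert condition disagree if they end up using different periods.

```python
def align_sum(samples, period):
    """Sum consecutive samples into fixed windows of `period` points each."""
    return [sum(samples[i:i + period]) for i in range(0, len(samples), period)]


# Hypothetical traffic: 25 requests in each hour of one day.
hourly_counts = [25] * 24

print(align_sum(hourly_counts, 24))  # one 24-hour window: [600]
print(align_sum(hourly_counts, 6))   # four 6-hour windows: [150, 150, 150, 150]
```

With a threshold of 300, the 24-hour window would violate the condition while the 6-hour windows would not, even though the underlying request counts are identical.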

1 Answer

0
votes

As a workaround, it is possible to monitor and alert using Consumer Quota, Quota limit, and the service name, then sum the count over a one-day interval. This setup does what is needed.