
I'm trying to create alerts in Application Insights that will notify me if more than 5% of my requests exceed a certain response-time threshold (i.e. the 95th percentile is above it). I have written a query in the alerts section of Application Insights and configured it as a Metric measurement alert that fires when the value is greater than the desired threshold:

requests 
| where timestamp >= ago(15m) 
| where (tostring(customDimensions['ProviderName']) == 'ProviderX') 
| where (tostring(customDimensions['operationMethod']) == 'operationX') 
| extend responseTime = tolong(customDimensions['totalMilliseconds']) 
| summarize AggregatedValue = (percentile(responseTime, 95)) by bin(timestamp, 15m)

While this alert works and notifies me correctly, it produces a large number of false positives: in some 15-minute windows there are very few requests (fewer than 3), so a single slow request is enough to push the 95th percentile over the threshold. I therefore only want to alert when the threshold is exceeded AND the number of relevant requests in the time period is also above a certain threshold, say 10.
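To see the problem directly, a diagnostic query along these lines could show the 95th percentile next to the request count for each 15-minute window. This is only a sketch: it reuses the filters from the alert query, while the 1-day lookback and the column names P95 and RequestCount are illustrative choices, not part of the original setup.

```kql
// Diagnostic sketch: same filters as the alert query, but a longer lookback
// (1d is an arbitrary choice) so that many 15-minute windows are visible at once.
requests
| where timestamp >= ago(1d)
| where tostring(customDimensions['ProviderName']) == 'ProviderX'
| where tostring(customDimensions['operationMethod']) == 'operationX'
| extend responseTime = tolong(customDimensions['totalMilliseconds'])
// P95 and RequestCount are illustrative column names.
| summarize P95 = percentile(responseTime, 95), RequestCount = count() by bin(timestamp, 15m)
| order by timestamp asc
```

Windows where P95 is high but RequestCount is only 1 or 2 would be the false positives described above.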

I attempted to do this using a "Number of results" alert in the alerts section of Application Insights:

requests
| where timestamp  >= ago(15m)
| where (tostring(customDimensions['ProviderName']) == 'ProviderX')
| where (tostring(customDimensions['operationMethod']) == 'operationX')
| extend responseTime = tolong(customDimensions['totalMilliseconds'])
| summarize hasFailed = ((percentile(responseTime, 95) > 1000) and count() > 135)
| project iff(hasFailed, 1, 0)

What I was trying to achieve is to have the query return 1 if the test failed and then alert on this value. However, the "Number of results" alert only seems to fire on the count of rows returned, and this query always returns exactly one row, so this approach does not work either.

If someone could suggest an appropriate query, or an alternative strategy for implementing this in Azure, I would greatly appreciate it.

Thanks.


1 Answer


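If you would like to use threshold (Metric measurement) alerting, I think you could replace your first query with the following one, which reports the 95th percentile only when the window contains more than 135 requests and reports 0 otherwise: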

requests 
| where timestamp >= ago(15m) 
| where (tostring(customDimensions['ProviderName']) == 'ProviderX') 
| where (tostring(customDimensions['operationMethod']) == 'operationX') 
| extend responseTime = tolong(customDimensions['totalMilliseconds']) 
| summarize AggregatedValue = iff(count() > 135, percentile(responseTime, 95), 0) by bin(timestamp, 15m)

If you would prefer the "Number of results" alert approach, I think you could replace the last line of your second query with | where hasFailed == true, so that the query returns one row when the condition is met and zero rows when it is not. The alert can then be set to fire when the number of results is greater than 0.
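Put together, the second query with that change might look like the sketch below. The 1000 ms and 135-request thresholds are simply the values from the question's query. Note that the operation name is written as 'operationX', matching the first query: in KQL, == on strings is case-sensitive (=~ is the case-insensitive form), so a misspelled or differently cased value would silently match no rows and the alert would never fire.

```kql
// "Number of results" sketch: one row when both conditions hold, zero rows otherwise.
// Thresholds (1000 ms, 135 requests) are taken from the question's query.
requests
| where timestamp >= ago(15m)
| where tostring(customDimensions['ProviderName']) == 'ProviderX'
| where tostring(customDimensions['operationMethod']) == 'operationX'
| extend responseTime = tolong(customDimensions['totalMilliseconds'])
// summarize without a "by" clause always yields exactly one row...
| summarize hasFailed = ((percentile(responseTime, 95) > 1000) and count() > 135)
// ...so filtering on hasFailed is what makes the row count 0 or 1.
| where hasFailed == true
```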