2
votes

We run a hybrid application across several Azure roles (2 Web Roles + 2 Worker Roles). Last weekend something went wrong: the service went into the "Unresponsive" state and stayed in that state for two days (!) without being rebooted.

We then decided to integrate Azure Application Insights, because we cannot afford two days of downtime without even knowing about it.

What I'd like to have is a kind of heartbeat for my application. One of our worker roles has several services running concurrently. I'd like to monitor whether these services are running and how they are performing (on a metric defined by me, let's say "number of messages processed in a minute").

I would like to receive an alert if this metric, let's say, drops below (or rises above) a threshold. I tried this with a small demo application but couldn't get it to work.

What I did, with the Azure Application Insights API in my C# demo app:

1. Inside an infinite loop with a 10-second wait after each iteration, tracked a StartOperation.
2. Inside this StartOperation, called TrackMetric with a random value from 0 to 10.
3. Checked that everything was working on Azure (and it was).
4. Defined an alert saying that an email has to be sent if that metric was less than or equal to 1 in five minutes.

Nothing arrived, even though everything was running correctly. Then I stopped my service and saw the events drop off in Azure, but no alert was raised. Is that normal?

How do you check a case like mine?

Thanks, Marco

4 Answers

2
votes

You might be able to use the Application Insights Web Tests functionality to check whether the endpoint is available from different geographic regions, and to alert when it's not.

If all endpoints require authentication, you could expose a simple unauthenticated "/ping" endpoint and run web tests against that.

However, this won't work for Worker Roles out of the box unless you set them up to accept "/ping" over web protocols (doable for Worker Roles; e.g., one can implement a WCF service that way).
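
For illustration, here is a minimal sketch of such a "/ping" endpoint. It uses a plain HttpListener rather than the WCF service mentioned above, simply because it is shorter. The class name PingEndpoint, the "pong" body and the localhost prefix are all made up for the example, and in a real Worker Role you would also have to declare a public input endpoint in the service definition.

```csharp
using System;
using System.Net;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper (not part of any SDK): answers GET /ping with 200 "pong"
// and everything else with 404.
public sealed class PingEndpoint : IDisposable
{
    private readonly HttpListener _listener = new HttpListener();
    private Task _serveLoop;

    // prefix is an assumption for the example, e.g. "http://localhost:8080/"
    public PingEndpoint(string prefix)
    {
        _listener.Prefixes.Add(prefix);
    }

    public void Start()
    {
        _listener.Start();
        _serveLoop = ServeAsync(); // runs in the background until Dispose()
    }

    private async Task ServeAsync()
    {
        while (_listener.IsListening)
        {
            HttpListenerContext context;
            try
            {
                context = await _listener.GetContextAsync();
            }
            catch (HttpListenerException) { return; }  // listener was stopped
            catch (ObjectDisposedException) { return; } // listener was disposed

            bool isPing = context.Request.Url.AbsolutePath == "/ping";
            byte[] body = Encoding.UTF8.GetBytes(isPing ? "pong" : "not found");

            context.Response.StatusCode = isPing ? 200 : 404;
            context.Response.ContentLength64 = body.Length;
            await context.Response.OutputStream.WriteAsync(body, 0, body.Length);
            context.Response.Close();
        }
    }

    public void Dispose()
    {
        _listener.Close();
    }
}
```

You would create it once when the role starts, e.g. `new PingEndpoint("http://localhost:8080/").Start()`, and point the web test at the "/ping" URL. Keep in mind that this only proves the process can answer HTTP requests; it says nothing about whether the individual services inside the role are making progress.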

2
votes

The problem is that Application Insights custom alerts are currently triggered only upon data arrival.

A strategy we’ve been using when faced with the same problem is to have a separate service periodically send the same metric, but with a “zero meaning” value. In our specific case we use an availability metric in which “1” means healthy, whereas “0” carries no meaning of its own; it is only there to trigger an alert when no “1” has been sent for the defined duration.

You can use any of a wide range of mechanisms to send out the “0” metric, as long as it is independent of the service you actually want to monitor. You need to make sure the two can’t fail at the same time.
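
To make the idea concrete, here is a rough sketch, not our actual code. BaselineEmitter and AlertModel are names invented for the example. The emitter takes a delegate so it does not depend on any SDK; with Application Insights you would typically pass something like `(name, value) => telemetryClient.TrackMetric(name, value)`. AlertModel is only a toy model of "alerts are evaluated only when data arrives", assuming the rule averages the samples in its window; it is not how the real rule engine is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical watchdog: periodically emits the "zero meaning" value of a metric.
// It must run in a different process or host than the monitored service.
public sealed class BaselineEmitter : IDisposable
{
    private readonly Timer _timer;

    // trackMetric is expected to forward (name, value) to your telemetry sink.
    public BaselineEmitter(Action<string, double> trackMetric, string metricName, TimeSpan period)
    {
        // First tick fires immediately, then once per period.
        _timer = new Timer(_ => trackMetric(metricName, 0.0), null, TimeSpan.Zero, period);
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}

// Toy model of the alert rule (an assumption, for illustration only).
public static class AlertModel
{
    public static bool WouldFire(IReadOnlyCollection<double> windowSamples, double threshold)
    {
        // No data in the window: the rule is never evaluated, so no alert.
        if (windowSamples.Count == 0) return false;
        return windowSamples.Average() <= threshold;
    }
}
```

With a threshold of 0, a window containing only the watchdog's zeros fires the alert, a window with at least one “1” from the healthy service does not, and an empty window (no watchdog at all) never fires, which is the behaviour described in the question.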

Hope this helps, Maxim

1
votes

I don't think App Insights will let you raise an alert on a lack of metrics, which is what happens when your instance becomes unresponsive.

If you have the budget for external tools, look into CloudMonix. It'll do exactly what you need using its default configuration (no need for agents, custom code, etc.). Disclaimer: I'm affiliated with the product.