
I have set up Active Geo-Replication for my primary SQL Azure database. How can I be notified that my primary database is unavailable due to datacenter issues, so I can begin our application failover procedure? Also, how does Traffic Manager notify me of failover events?


1 Answer


With most large-scale outages, your application connectivity will be impacted, so presumably it will surface as an application alert of some sort. So your real question is what else you need to check to make sure this is a real disaster and the failover is warranted. The answer depends on how your failover process is set up. If it involves a human step, e.g. somebody has to approve the failover because of its impact, you may want to check the alerts in the portal. If it is a major incident of regional scale, then in addition to an incident alert you will see your logical server marked as degraded.

If you want to set up a fully automated process, after receiving an application alert you may want to check the replication connectivity status. You can do that by querying the sys.dm_database_copies DMV on the target master or sys.dm_continuous_copy_status on the target database. Both expose is_interlink_connected, which tells you whether the replication link is healthy. Note that it monitors the overall replication channel's health, not just your specific replication link. If the application receives repeated timeouts from the primary and is_interlink_connected = 0, an outage is likely. But it is not a 100% guarantee, and false positives are still possible. Your application's target RTO should help you decide how long you can wait before forcing the failover (as a way to eliminate false positives).
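To make the automated check concrete, here is a rough Python sketch of the decision logic only. It is not an official pattern: the `FailoverPolicy` class, the `should_force_failover` function, the timeout threshold, and the idea of spending a fixed fraction of the RTO on confirmation are all assumptions made for illustration. The query string simply reads the column mentioned above; how you run it (driver, connection string) is left to your existing setup.

```python
from dataclasses import dataclass

# Reads the column described above; run it against the target (secondary)
# database with whatever SQL client you already use.
INTERLINK_QUERY = "SELECT is_interlink_connected FROM sys.dm_continuous_copy_status;"


@dataclass
class FailoverPolicy:
    """Hypothetical tuning knobs; pick values that match your own RTO."""
    min_consecutive_timeouts: int = 3   # assumed threshold, not a recommendation
    rto_seconds: float = 3600.0         # your application's target RTO
    wait_fraction: float = 0.25         # assumed share of the RTO spent confirming


def should_force_failover(consecutive_timeouts: int,
                          interlink_connected: bool,
                          seconds_since_first_timeout: float,
                          policy: FailoverPolicy) -> bool:
    """Return True only when every signal points to a real outage."""
    # Signal 1: the application keeps timing out against the primary.
    if consecutive_timeouts < policy.min_consecutive_timeouts:
        return False
    # Signal 2: the replication channel also reports as disconnected.
    if interlink_connected:
        return False
    # Signal 3: the condition has persisted long enough to rule out a blip,
    # while still leaving most of the RTO budget for the failover itself.
    return seconds_since_first_timeout >= policy.rto_seconds * policy.wait_fraction
```

For example, with a 10 minute RTO and `wait_fraction=0.5`, the function would only say "fail over" after five minutes of repeated timeouts with the interlink reported as down. Even then it is a heuristic, so a false positive is still possible.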

Re the last question, there is some information on the monitoring methods here, but I am not sure there is an actual alert. You may have to poll the endpoint status in your profile.