We use Windows Azure Web Sites (WAWS) and SQL Azure. This morning the North Europe data centre suffered an outage lasting 1 hr 50 min, during which we could not access our websites or databases. Service is now back, although the problems are still rumbling on.
I have to admit that I felt a little helpless.
- When would it come back?
- What caused it?
- Who do I contact?
I have a feeling the cause is network related. Maybe the load balancer?
So what can we do when this happens? MS engineers usually know about these "events" very quickly and are already acting on them.
Some ideas I have had are:
1) Put up a polite error page if the domain times out. I'm not sure how to do this: perhaps via an auto-ping service like Pingdom, or at the domain service where the CNAMEs are defined (that is where we reroute through to Azure). This communication is key to reassuring customers that the issue is being sorted, and to preventing blank Azure 503 pages from appearing.
2) Better information from the Azure team, to reduce the act of faith about when service will be resumed.
3) Any other actions required when such an "event" happens.
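Idea 1 could be automated with the pattern a ping service typically uses: probe the site on a schedule, point the CNAME at a static "we're on it" page (hosted outside the affected region) only after several consecutive failures, and point it back after several consecutive successes, so that one slow response does not cause flapping. Below is a minimal Python sketch under those assumptions. The host names and health URL are hypothetical, and the DNS update itself is provider-specific, so it is left as a caller-supplied `switch_dns` callback rather than a real API call. Note that resolvers cache the record for its TTL, so a switch only reaches users as caches expire; a short TTL on the CNAME helps.

```python
import http.client
import time
import urllib.request

# Hypothetical names: the Azure-hosted site, its health URL, and a static
# status page hosted with a different provider/region.
PRIMARY_HOST = "example.azurewebsites.net"
PRIMARY_URL = "https://www.example.com/health"
FALLBACK_HOST = "status.example.net"


def probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with HTTP 2xx within the timeout.

    urlopen raises HTTPError (an OSError subclass) for non-2xx codes such as
    503, so an Azure 503 page counts as a failure here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        return False


class FailoverMonitor:
    """Decide whether traffic should go to the fallback page, with hysteresis."""

    def __init__(self, fail_threshold: int = 3, recover_threshold: int = 3):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.failures = 0      # consecutive failed probes
        self.successes = 0     # consecutive successful probes
        self.on_fallback = False

    def record(self, ok: bool) -> bool:
        """Record one probe result; return True while the fallback should be used."""
        if ok:
            self.successes += 1
            self.failures = 0
            if self.on_fallback and self.successes >= self.recover_threshold:
                self.on_fallback = False
        else:
            self.failures += 1
            self.successes = 0
            if not self.on_fallback and self.failures >= self.fail_threshold:
                self.on_fallback = True
        return self.on_fallback


def watch(monitor: FailoverMonitor, switch_dns, interval: float = 60.0) -> None:
    """Probe forever, calling switch_dns(target_host) only when the target changes.

    switch_dns is a hypothetical callback wrapping your DNS provider's API.
    """
    current = PRIMARY_HOST
    while True:
        use_fallback = monitor.record(probe(PRIMARY_URL))
        target = FALLBACK_HOST if use_fallback else PRIMARY_HOST
        if target != current:
            switch_dns(target)
            current = target
        time.sleep(interval)
```

With the defaults and a 60-second interval, the status page goes up after roughly three minutes of failed probes, plus however long cached DNS records take to expire. The thresholds trade reaction speed against false alarms.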
I am sure this has impacted other Azure customers, and indeed other cloud customers. I suspect some of you are fellow North Europe users who were hit this morning like me. So what measures do you put in place to manage this issue, particularly around customer notice pages that appear automatically?
EDIT1
Update from MS.
++++++++++++++++++++++++++++++++++++++++++++
SQL Databases - North Europe - Partial Performance Degradation
49 mins ago
Starting at 8/6/2014 6:56 UTC a subset of SQL customers may have experienced difficulty accessing their resources. A significant number of these SQL customers have already seen improvement. We have identified a potential root cause, and are working to restore service. The next update will be provided within two hours.
+++++++++++++++++++++++++++++++++++++++++++++
"Partial Performance Degradation" = no websites and no databases for us!