
Today, for about 10 minutes and for no reason I can identify, one of our Azure web apps had a meltdown related to the availability of its Azure SQL database. The web app gave a YSOD that said "the wait operation timed out." ELMAH showed other errors like "login failed" and "A severe error occurred on the current command. The results, if any, should be discarded."

Looking at the app and database load, I don't really see why. The database was set at the S0 level, and we increased it to S1, but even needing that seems strange. We rarely have more than 50% DTU utilization on the database, and the web app peaks at about 35 simultaneous requests. Overall it just doesn't seem like a load that should be beyond the capabilities of an S0 database.

All that sucks, yeah, but the biggest question is how do I even troubleshoot this? It's clearly a DB issue, but given the low load I don't know why. I certainly don't want to upgrade to a $300+ monthly Premium level for an app of this size.

Is there logging I can set up to figure this out? Some way to look back at what happened and draw definitive conclusions on how to prevent it from happening again?

This happens to my app occasionally. I can't see why. I have a RetryPolicy configured, and upping the default SqlConnection ConnectionTimeout (in the connection strings) seemed to help a bit. Ultimately, I don't think MS guarantees that a connection to the db will always be made, and you can't prevent it from happening again. - Neil Thompson
What's strange is that the app had been running with very few issues until about 3/17, at which point I started having recurring but intermittent issues dozens of times a day. - Josh Anderson
Also interesting is that on some browsers, the issue will continue to occur until cookies for that site are cleared. This sounds like it may be related to ServiceStack and how it handles caching. - Josh Anderson
Could it be to do with instance affinity? azure.microsoft.com/en-gb/blog/… I turned instance affinity off on my apps and it reduced the error count a lot. - Neil Thompson
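
For readers unfamiliar with the retry policy mentioned in the comments, here is a minimal sketch of the general idea: retry a database call a few times with exponential backoff when it fails with a transient error. It is written in Python purely for illustration and is not the RetryPolicy class the commenter uses. `TransientError`, `with_retry`, and the default attempt count and delays are all hypothetical names and values chosen for this example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a transient DB error (timeout, login failure)."""


def with_retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on TransientError, wait and retry with backoff.

    `sleep` is injectable so the backoff can be observed without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The base delay doubles on each attempt; random jitter keeps
            # many clients from retrying at exactly the same moment.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

A caller would wrap only the operations that are safe to repeat, for example `with_retry(lambda: run_query(sql))`, where `run_query` is whatever function talks to the database and raises `TransientError` for errors worth retrying. Retrying hides short outages from users, but it does not explain them, so it is worth logging each failed attempt as well.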

1 Answer


Tools > Options > Designers > Table and Database Designers

Make sure the checkbox at the top for "Override connection string time-out value for table designer updates:" is checked, and increase the transaction timeout as needed.

Alternatively, uncheck the box and specify the timeout in your connection string.
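
As a rough illustration of the connection-string route, the sketch below sets the `Connect Timeout` keyword in an ADO.NET-style connection string. That keyword (with its synonyms `Connection Timeout` and `Timeout`) is a standard SqlClient setting and controls how long to wait while opening a connection; whether that is the timeout this answer has in mind is an assumption. The helper name `set_connect_timeout` and the server and database names are made up for the example.

```python
def set_connect_timeout(conn_str, seconds):
    """Return conn_str with its connect timeout set to `seconds`.

    Naive parser: splits on ';', so it does not handle quoted values
    that themselves contain semicolons.
    """
    timeout_keys = {"connect timeout", "connection timeout", "timeout"}
    parts = []
    for part in conn_str.split(";"):
        if not part.strip():
            continue  # skip empty segments such as a trailing ';'
        key = part.split("=", 1)[0].strip().lower()
        if key not in timeout_keys:
            parts.append(part.strip())  # keep every non-timeout setting
    parts.append(f"Connect Timeout={seconds}")
    return ";".join(parts) + ";"


# Hypothetical connection string, for illustration only.
original = "Server=tcp:example.database.windows.net;Database=mydb;Connection Timeout=30;"
updated = set_connect_timeout(original, 60)
```

Any existing timeout keyword is dropped and replaced, so the result never carries two conflicting values.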

I had this same issue, where it would get as far as enabling indexes but always fail on the same one. The default is 30 seconds; I bumped mine up to 900 (15 minutes) and let it run, and it completed successfully. It probably only needed a little more than 30 seconds, but oh well.