Since this seems to be one of the most popular questions on SO about Azure transient handling, I'll add this answer here.
Entity Framework does indeed have resiliency code built in (per Adam's answer).
BUT:
1) You must add code to activate it manually:
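    using System.Data.Entity;
    using System.Data.Entity.Infrastructure;
    using System.Data.Entity.SqlServer;

    public class MyConfiguration : DbConfiguration
    {
        public MyConfiguration()
        {
            // Retry operations that fail with a known transient error.
            this.SetExecutionStrategy(
                "System.Data.SqlClient",
                () => new SqlAzureExecutionStrategy());
            // Track commits so that a retried commit is not applied twice.
            this.SetTransactionHandler(
                SqlProviderServices.ProviderInvariantName,
                () => new CommitFailureHandler());
        }
    ...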
The first method call activates retries, the second call sets a handler to avoid duplicate updates when retries happen.
Note: This class will be found and instantiated automatically, as discussed here: https://msdn.microsoft.com/en-us/library/jj680699(v=vs.113).aspx. Just make sure the class is in the same assembly as your DbContext class and has a public constructor with no parameters.
2) The built-in SqlAzureExecutionStrategy is not good enough: it doesn't cover all the transient errors. This is not surprising when you consider that the SQL Server team works independently of the Entity Framework team, so the two are unlikely ever to be completely in sync on which transient errors are possible. It's also difficult to work out the full list yourself.
The solution we used, backed by a suggestion from another software company, is to create our own Execution Strategy, which retries every SqlException and TimeoutException, except for a few that we whitelist as not worth retrying (such as permission denied).
    using System;
    using System.Data.Entity.Infrastructure;
    using System.Data.SqlClient;

    public class WhiteListSqlAzureExecutionStrategy : DbExecutionStrategy
    {
        public WhiteListSqlAzureExecutionStrategy()
        {
        }

        protected override bool ShouldRetryOn(Exception exception)
        {
            var sqlException = exception as SqlException;
            // If this is a SqlException then we always want to retry,
            // unless all of its errors are in the white list.
            // With those errors there is no point in retrying.
            if (sqlException != null)
            {
                var retry = false;
                foreach (SqlError err in sqlException.Errors)
                {
                    // Exception white list.
                    switch (err.Number)
                    {
                        // Primary Key violation
                        // https://msdn.microsoft.com/en-us/library/ms151757(v=sql.100).aspx
                        case 2627:
                        // Constraint violation
                        case 547:
                        // Invalid column name. We have seen this happen when the Snapshot helper runs for a column 'CreatedOn'.
                        // This is not one of our columns and it appears to be using our execution strategy.
                        // An invalid column is also something that probably doesn't get resolved by retries.
                        case 207:
                            break;
                        // The server principal "username" is not able to access the database "dbname" under the current security context
                        // May occur when using a restricted user - Entity Framework wants to access master for something
                        // Probably not transient
                        case 916:
                            break;
                        // XXX permission denied on object. (XXX = select, etc)
                        // Should not occur if db access is correct, but occurred when using a restricted user - EF accessing __MigrationHistory
                        case 229:
                            break;
                        // Invalid object name 'xxx'.
                        // Occurs at startup because Entity Framework looks for EdmMetadata, an old table
                        // (Perhaps only if it can't access __MigrationHistory?)
                        case 208:
                            break;
                        default:
                            // Any error that is not white-listed makes the whole exception retryable.
                            retry = true;
                            break;
                    }
                }
                return retry;
            }
            if (exception is TimeoutException)
            {
                return true;
            }
            return false;
        }
    }
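To actually use the custom strategy it has to be registered in place of SqlAzureExecutionStrategy. A minimal sketch, assuming the same MyConfiguration class as in point 1 and that you still want the CommitFailureHandler:

```csharp
using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using System.Data.Entity.SqlServer;

public class MyConfiguration : DbConfiguration
{
    public MyConfiguration()
    {
        // Register the white-list strategy instead of the built-in one.
        SetExecutionStrategy(
            SqlProviderServices.ProviderInvariantName,
            () => new WhiteListSqlAzureExecutionStrategy());

        // Keep the commit failure handler so retried commits are not duplicated.
        SetTransactionHandler(
            SqlProviderServices.ProviderInvariantName,
            () => new CommitFailureHandler());
    }
}
```

Because the class uses the parameterless DbExecutionStrategy constructor, it should pick up EF's default retry count and maximum delay; pass different values to the base constructor if you need them.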
3) There used to be a kind of bug where EF would run the retries N^2 times instead of N, which made for much longer delays than you'd expect. (It's supposed to take up to about 26 seconds, but the bug made it take minutes.) However, this isn't so bad because in reality SQL Azure regularly has unavailability for more than a minute :(
https://entityframework.codeplex.com/workitem/2849
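For reference, here is roughly where the "about 26 seconds" figure comes from. This is a back-of-the-envelope sketch, not EF's code: it assumes the EF6 defaults as I understand them (5 retries, exponential base 2, a 1 second coefficient) and ignores the small random jitter EF adds. RetryDelayEstimate is a made-up name for illustration.

```csharp
using System;

public static class RetryDelayEstimate
{
    // Sums the assumed wait before each retry: (2^retry - 1) seconds.
    // Jitter and the maximum-delay cap are ignored in this estimate.
    public static double TotalDelaySeconds(int maxRetryCount)
    {
        double total = 0;
        for (int retry = 0; retry < maxRetryCount; retry++)
        {
            total += Math.Pow(2, retry) - 1;
        }
        return total;
    }

    public static void Main()
    {
        // 0 + 1 + 3 + 7 + 15 = 26
        Console.WriteLine(RetryDelayEstimate.TotalDelaySeconds(5));
    }
}
```

Note that this counts only the waiting between attempts; the time each failed attempt itself takes (connection or command timeouts) comes on top of that.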
4) If you haven't been doing so already, you really need to dispose of your DbContext after it's used. It seems this is the point at which the CommitFailureHandler runs its purging to tidy up the __TransactionHistory table; if you don't dispose, this table will grow forever (although see next point).
5) You should probably call ClearTransactionHistory (a method on CommitFailureHandler) somewhere in your startup or in a background thread, to clear any leftovers in __TransactionHistory.
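A rough sketch of what that startup call might look like. MyDbContext and TransactionHistoryCleanup are placeholders for your own types, and this assumes a CommitFailureHandler has been registered via SetTransactionHandler as in point 1; check it against your EF version before relying on it.

```csharp
using System.Data.Entity;
using System.Data.Entity.Infrastructure;

// Placeholder context; use your own DbContext subclass instead.
public class MyDbContext : DbContext
{
}

public static class TransactionHistoryCleanup
{
    public static void Run()
    {
        // The using block also disposes the context, as recommended in point 4.
        using (var context = new MyDbContext())
        {
            // FromContext returns null when no CommitFailureHandler is attached.
            var handler = CommitFailureHandler.FromContext(context);
            if (handler != null)
            {
                handler.ClearTransactionHistory();
            }
        }
    }
}
```

Call TransactionHistoryCleanup.Run() once at application start, or from a scheduled background job if the process is long-running.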