36
votes

I've been reading other questions on how to implement if-exists-insert-else-update semantics in EF, but either I'm not understanding how the answers work, or they are in fact not addressing the issue. A common solution offered is to wrap the work in a transaction scope (eg: Implementing if-not-exists-insert using Entity Framework without race conditions):

using (var scope = new TransactionScope()) // default isolation level is serializable
using(var context = new MyEntities())
{
    var user = context.Users.SingleOrDefault(u => u.Id == userId); // *
    if (user != null)
    {
        // update the user
        user.property = newProperty;
        context.SaveChanges();
    }
    else
    {
        user = new User
        {
             // etc
        };
        context.Users.AddObject(user);
        context.SaveChanges();
    }
}

But I fail to see how this solves anything, as for this to work, the line I have starred above should block if a second thread tries to access the same user ID, unblocking only when the first thread has finished its work. Using a transaction will not cause this however, and we'll get an UpdateException thrown due to the key violation that occurs when the second thread attempts to create the same user for a second time.

Instead of catching the exception caused by the race condition, it would be better to prevent the race condition from happening in the first place. One way to do this would be for the starred line to take out an exclusive lock on the database row that matches its condition, meaning that in the context of this block, only one thread at a time could work with a user.

It seems that this must be a common problem for users of the EF, so I'm looking for a clean, generic solution that I can use everywhere.

I'd really like to avoid using a stored procedure to create my user if possible.

Any ideas?

EDIT: I tried executing the above code concurrently on two different threads using the same user ID, and despite taking out serializable transactions, they were both able to enter the critical section (*) concurrently. This lead to an UpdateException being thrown when the second thread attempted to insert the same user ID that the first had just inserted. This is because, as pointed out by Ladislav below, a serializable transaction takes exclusive locks only after it has begun modifying data, not reading.

4
May be interesting: SQL Server Application Locksuser166390
Very nice, had no idea that existed.Mike Chamberlain
Actually, most of the time the default transaction level is READ COMMITTED. Maybe that's where you're going wrong?Sora2455

4 Answers

20
votes

When using serializable transaction SQL Server issues shared locks on read records / tables. Shared locks doesn't allow other transactions modifying locked data (transactions will block) but it allows other transactions reading data before the transaction which issued locks start modifying data. That is the reason why the example doesn't work - concurrent reads are allowed with shared locks until the first transaction starts modifying data.

You want isolation where select command locks the whole table exclusively for a single client. It must lock the whole table because otherwise it will not solve concurrency for inserting "the same" record. Granular control for locking records or tables by select commands is possible when using hints but you must write direct SQL queries to use them - EF has no support for that. I described approach for exclusively locking that table here but it is like creating sequential access to the table and it affects all other clients accessing this table.

If you are really sure that this operation happens just in your single method and there are not other applications using your database you can simply place the code into critical section (.NET synchronization for example with lock) and ensure on the .NET side that only single thread can access critical section. That is not so reliable solution but any playing with locks and transaction levels has a big impact on the database performance and throughput. You can combine this approach with optimistic concurrency (unique constraints, timestamps, etc).

3
votes

Just to add my way, not that it really deals with the annoyance of exceptions being thrown and transactions not quite cutting it as a scalable solution but it does avoid race conditions from causing problems where lock type solutions are not possible (easily managed) such as in distributed systems.

I very simply use the exception and try the insert first. I use a modification of your original code as an example:

using(var context = new MyEntities())
{
    EntityEntry entityUser = null;
    try 
    {
        user = new User
        {
             // etc
        };
        entityUser = context.Users.Add(user);
        context.SaveChanges(); // Will throw if the entity already exists
    } 
    catch (DbUpdateException x)
    when (x.InnerException != null && x.InnerException.Message.StartsWith("Cannot insert duplicate key row in object"))
    {
        if (entityUser != null)
        {
            // Detach the entity to stop it hanging around on the context
            entityUser.State = EntityState.Detached;
        }
        var user = context.Users.Find(userId);
        if (user != null) // just in case someone deleted it in the mean time
        {
            // update the user
            user.property = newProperty;
            context.SaveChanges();
        }
    }
}

It's not pretty, but it works and might be of use to someone.

1
votes

Maybe I am missing something, but when I simulate the example above in the SQL Management Studio, this is working as expected.

Both Serializable transactions check if the userId exists, and acquire range locks on the specified selection.

Assuming this userId does not exist, both transactions try to insert a new record with the userId - which is not possible. Because of their Serializable isolation level, both transactions cannot insert a new record into the users table because this would introduce phantom reads for the other transaction.

So this situation results in a deadlock because of the range locks. You'll end up with an deadlock and one transaction shall be victimized, the other one will succeed.

Is Entity Framework handling this any different? I suspect you'd end up with an UpdateException with a nested SqlException identifying the deadlock.

0
votes

You can change transaction isolation level using TransactionOptions for TransactionScope to more stricter (I guess, for your case it is RepeatableRead or Serializable), but remember that any locks decrease scalability.

Does it really matter to provide such level of concurrency control? Will your application be used in same cases in production environment? Here is good post by Udi Dahan about race conditions.