In an Azure Table Storage batch save operation, is there an efficient way to update only certain properties of an entity when it already exists, but write all properties for new entities?
Here is the scenario that I am talking about.
I have an entity called Order:
public class Order : TableEntity
{
    // Parameterless constructor required by the Table Storage SDK for deserialization
    public Order() { }

    public Order(string department, string orderId)
    {
        PartitionKey = department;
        RowKey = orderId;
    }

    public DateTime CreatedOn { get; set; }
    public string CreatedBy { get; set; }
    public DateTime UpdatedOn { get; set; }
    public string UpdatedBy { get; set; }

    // Class contains other properties which could add up to 1 MB
}
Scenario
- Azure Table Storage already has order entities with RowKeys [0..100].
- My API receives an upsert request for orders with RowKeys [50..150].
- In a single batch transaction I need to update certain properties on the existing orders [50..100] and create the new order entities [101..150] in Azure.
- Note: on the existing orders [50..100], all properties except CreatedOn, CreatedBy, PartitionKey, and RowKey need to be updated.
Can I do this in a single call, without first reading the existing entities from the Table Store?
Here is one way to do it (very rough pseudo code):
void Upsert(Dictionary<string, Order> ordersInput)
{
    // 1. Read the existing orders from the table (the expensive step)
    var existingOrders = Retrieve(ordersInput.Values);

    // 2. Copy the properties that must be preserved onto the incoming orders
    foreach (var existingOrder in existingOrders)
    {
        if (ordersInput.ContainsKey(existingOrder.RowKey))
        {
            ordersInput[existingOrder.RowKey].CreatedOn = existingOrder.CreatedOn;
            ordersInput[existingOrder.RowKey].CreatedBy = existingOrder.CreatedBy;
        }
    }

    // 3. Save all merged orders back to Azure
    SaveToAzure(ordersInput.Values);
}
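For reference, the Retrieve step above would amount to a ranged query within the partition, something like the sketch below (assuming the Microsoft.Azure.Cosmos.Table SDK; the method signature is my own):

IEnumerable<Order> Retrieve(CloudTable table, string department, string firstRowKey, string lastRowKey)
{
    // Single-partition range query; it pulls back whole entities, which is the expensive part.
    // (A projection via TableQuery.SelectColumns could trim the payload,
    // but it is still a read+write round trip.)
    var filter = TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, department),
        TableOperators.And,
        TableQuery.CombineFilters(
            TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, firstRowKey),
            TableOperators.And,
            TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThanOrEqual, lastRowKey)));

    return table.ExecuteQuery(new TableQuery<Order>().Where(filter));
}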
The issue I have with the above approach is that each order entity can approach 1 MB in size, and reading all of the existing entities bogs down the API save operation.
Is there a more efficient way to perform the conditional merge entirely on Azure?
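For context, the closest single-call primitive I have found is a batch InsertOrMerge. A merge only overwrites the properties present on the entity you send, so a DynamicTableEntity that omits CreatedOn/CreatedBy preserves them on existing rows. A minimal sketch (again assuming the Microsoft.Azure.Cosmos.Table SDK):

async Task BatchInsertOrMergeAsync(CloudTable table, IEnumerable<Order> orders)
{
    var batch = new TableBatchOperation();
    foreach (var order in orders)
    {
        // Send only the updatable properties; CreatedOn/CreatedBy are deliberately omitted.
        var entity = new DynamicTableEntity(order.PartitionKey, order.RowKey);
        entity.Properties["UpdatedOn"] = new EntityProperty(order.UpdatedOn);
        entity.Properties["UpdatedBy"] = new EntityProperty(order.UpdatedBy);
        // ...remaining updatable properties
        batch.InsertOrMerge(entity);
    }
    await table.ExecuteBatchAsync(batch); // one partition, at most 100 operations per batch
}

The limitation is that the omission applies uniformly: newly inserted orders never receive CreatedOn/CreatedBy either, which is exactly the conditional behavior I cannot express in one call.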
I was also thinking about doing the batch insert in the following way:
- Do a batch Insert on all orders.
- If that fails with "The specified entity already exists", create two new batches (one with the existing row keys, the other with the new ones) and handle each one individually, as sketched below.
(The above approach sounds hacky, and I think it might cause a bunch of concurrency issues.)
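For concreteness, a rough sketch of that fallback (MergeExistingAsync and InsertNewAsync are hypothetical helpers that would build the two follow-up batches; splitting requires knowing which rows already exist, e.g. by parsing the failed-operation index out of the batch error or by querying):

async Task UpsertWithFallbackAsync(CloudTable table, IList<Order> orders)
{
    var batch = new TableBatchOperation();
    foreach (var order in orders)
        batch.Insert(order);

    try
    {
        // Succeeds only if every order in the batch is new.
        await table.ExecuteBatchAsync(batch);
    }
    catch (StorageException ex) when (ex.RequestInformation?.HttpStatusCode == 409)
    {
        // At least one row already exists and the whole batch was rolled back.
        // Hypothetical helpers: merge the rows that exist, insert the rest.
        // Race: rows can be created or deleted between this attempt and the retry.
        await MergeExistingAsync(table, orders);
        await InsertNewAsync(table, orders);
    }
}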