In an Azure Table Storage batch save operation, is there an efficient way to update only certain properties of an entity when it already exists, but write all properties for new entities?
Here is the scenario that I am talking about.
I have an entity called Order:
public class Order : TableEntity
{
    // Parameterless constructor required by the Table Storage SDK for deserialization
    public Order() { }

    public Order(string department, string orderId)
    {
        PartitionKey = department;
        RowKey = orderId;
    }

    public DateTime CreatedOn { get; set; }
    public string CreatedBy { get; set; }
    public DateTime UpdatedOn { get; set; }
    public string UpdatedBy { get; set; }

    // Class contains other properties which could add up to 1 MB
}
Scenario
- Azure Table Storage already has order entities with RowKeys [0..100].
- My API receives an upsert request for orders with RowKeys [50..150].
- In a single batch transaction I need to update certain properties on the existing orders [50..100] and create the new order entities [101..150] in Azure.
- Note: on the existing orders [50..100], all properties except CreatedOn, CreatedBy, PartitionKey, and RowKey need to be updated.
Can I do this in a single call, without first reading the existing entities from the Table Store?
Here is one way to do it (very rough pseudo code):
void Upsert(Dictionary<string, Order> ordersInput)
{
    // 1. Read the existing orders from the table (the expensive step)
    var existingOrders = Retrieve(ordersInput.Values);

    // 2. Copy the properties that must be preserved onto the incoming orders
    foreach (var existingOrder in existingOrders)
    {
        if (ordersInput.ContainsKey(existingOrder.RowKey))
        {
            ordersInput[existingOrder.RowKey].CreatedOn = existingOrder.CreatedOn;
            ordersInput[existingOrder.RowKey].CreatedBy = existingOrder.CreatedBy;
        }
    }

    // 3. Save all merged orders back to Azure
    SaveToAzure(ordersInput.Values);
}
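For reference, the Retrieve step above would amount to a ranged query within the partition, something like the sketch below (assuming the Microsoft.Azure.Cosmos.Table SDK; the method signature is my own):

IEnumerable<Order> Retrieve(CloudTable table, string department, string firstRowKey, string lastRowKey)
{
    // Single-partition range query; it pulls back whole entities, which is the expensive part.
    // (A projection via TableQuery.SelectColumns could trim the payload,
    // but it is still a read+write round trip.)
    var filter = TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, department),
        TableOperators.And,
        TableQuery.CombineFilters(
            TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, firstRowKey),
            TableOperators.And,
            TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThanOrEqual, lastRowKey)));

    return table.ExecuteQuery(new TableQuery<Order>().Where(filter));
}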
The issue I have with the above approach is that each order entity can approach 1 MB in size, and reading all of the existing entities bogs down the API save operation.
Is there a more efficient way to perform the conditional merge entirely on Azure?
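For context, the closest single-call primitive I have found is a batch InsertOrMerge. A merge only overwrites the properties present on the entity you send, so a DynamicTableEntity that omits CreatedOn/CreatedBy preserves them on existing rows. A minimal sketch (again assuming the Microsoft.Azure.Cosmos.Table SDK):

async Task BatchInsertOrMergeAsync(CloudTable table, IEnumerable<Order> orders)
{
    var batch = new TableBatchOperation();
    foreach (var order in orders)
    {
        // Send only the updatable properties; CreatedOn/CreatedBy are deliberately omitted.
        var entity = new DynamicTableEntity(order.PartitionKey, order.RowKey);
        entity.Properties["UpdatedOn"] = new EntityProperty(order.UpdatedOn);
        entity.Properties["UpdatedBy"] = new EntityProperty(order.UpdatedBy);
        // ...remaining updatable properties
        batch.InsertOrMerge(entity);
    }
    await table.ExecuteBatchAsync(batch); // one partition, at most 100 operations per batch
}

The limitation is that the omission applies uniformly: newly inserted orders never receive CreatedOn/CreatedBy either, which is exactly the conditional behavior I cannot express in one call.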
I was also thinking about doing the batch insert in the following way:
- Do a batch Insert on all orders.
- If that fails with "The specified entity already exists", create two new batches (one with the existing row keys, the other with the new ones) and handle each one individually, as sketched below.
(The above approach sounds hacky, and I think it might cause a bunch of concurrency issues.)
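For concreteness, a rough sketch of that fallback (MergeExistingAsync and InsertNewAsync are hypothetical helpers that would build the two follow-up batches; splitting requires knowing which rows already exist, e.g. by parsing the failed-operation index out of the batch error or by querying):

async Task UpsertWithFallbackAsync(CloudTable table, IList<Order> orders)
{
    var batch = new TableBatchOperation();
    foreach (var order in orders)
        batch.Insert(order);

    try
    {
        // Succeeds only if every order in the batch is new.
        await table.ExecuteBatchAsync(batch);
    }
    catch (StorageException ex) when (ex.RequestInformation?.HttpStatusCode == 409)
    {
        // At least one row already exists and the whole batch was rolled back.
        // Hypothetical helpers: merge the rows that exist, insert the rest.
        // Race: rows can be created or deleted between this attempt and the retry.
        await MergeExistingAsync(table, orders);
        await InsertNewAsync(table, orders);
    }
}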