I need to move a whole partition of records in an Azure Table Storage table from Partition1 to Partition2: thousands, if not millions, of records.
I know there is no way to move an entity from one partition to another in Azure Table Storage. You have to delete the old entity and insert a new one with the updated PartitionKey, so my task is to do exactly that for many records.
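In Azure Table Storage, PartitionKey and RowKey together form an entity's key, so a "move" is really a copy under a new key followed by removal of the original. The minimal sketch below illustrates that with an in-memory dictionary standing in for the table. `FakeEntity`, `InMemoryTable`, and `Move` are hypothetical stand-ins, not Azure SDK types. The copy is inserted before the original is deleted, so a failure between the two steps leaves a duplicate rather than losing the record.

```csharp
using System.Collections.Generic;

// Hypothetical entity type standing in for MyTableEntity.
public sealed record FakeEntity(string PartitionKey, string RowKey, string Payload);

// Hypothetical in-memory stand-in for a table keyed by (PartitionKey, RowKey).
public sealed class InMemoryTable
{
    private readonly Dictionary<(string PartitionKey, string RowKey), FakeEntity> _rows = new();

    public int Count => _rows.Count;

    // Dictionary.Add throws on a duplicate key, similar to an insert conflict.
    public void Insert(FakeEntity entity) =>
        _rows.Add((entity.PartitionKey, entity.RowKey), entity);

    public bool Delete(string partitionKey, string rowKey) =>
        _rows.Remove((partitionKey, rowKey));

    public bool Contains(string partitionKey, string rowKey) =>
        _rows.ContainsKey((partitionKey, rowKey));

    // The key cannot be changed in place: copy the entity under the new
    // PartitionKey, insert the copy, and only then delete the original.
    public void Move(FakeEntity original, string newPartitionKey)
    {
        FakeEntity copy = original with { PartitionKey = newPartitionKey };
        Insert(copy);
        Delete(original.PartitionKey, original.RowKey);
    }
}
```

Against the real service, the insert and the delete target different partitions, so they cannot be combined in one entity group transaction (a batch must stay within a single partition); they are always separate requests.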
Is there a standard way to do this for a whole partition?
I came up with the following solution (simplified):
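```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos.Table; // or Microsoft.WindowsAzure.Storage.Table, depending on the SDK

public async Task Migrate(string oldPartition, string newPartition)
{
    TableContinuationToken token = null;
    List<Task> migrationTasks = new List<Task>();
    do
    {
        // Read the next segment (page) of the old partition.
        TableQuerySegment<MyTableEntity> entries = await GetEntriesSegment(
            oldPartition,
            token);
        token = entries.ContinuationToken;
        // Start migrating this segment without awaiting it,
        // so the next segment can be read in the meantime.
        migrationTasks.Add(MigrateEntries(entries, newPartition));
    } while (token != null);
    await Task.WhenAll(migrationTasks);
}

private async Task MigrateEntries(IEnumerable<MyTableEntity> entries, string newPartition)
{
    // Insert the copies and delete the originals of this segment concurrently.
    await Task.WhenAll(
        InsertInBatches(entries.Select(
            e => GetEntryWithUpdatedPartitionKey(e, newPartition))),
        DeleteInBatches(entries));
}
```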
- `GetEntriesSegment` wraps the logic to access the table and get the segment.
- `GetEntryWithUpdatedPartitionKey` simply copies all fields from one `MyTableEntity` object into a newly created one, but with a different `PartitionKey`.
- `InsertInBatches` takes care of splitting the collection of entries into batches of 100 (the Azure Table Storage limit per batch) and performing batch inserts for all of them in parallel (via one more `await Task.WhenAll(insertTasks)` inside).
- `DeleteInBatches` takes care of splitting the collection of entries into batches of 100 (the Azure Table Storage limit per batch) and performing batch deletes for all of them in parallel (via one more `await Task.WhenAll(deleteTasks)` inside).
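The batching described for `InsertInBatches` and `DeleteInBatches` can be sketched without the storage SDK: split the entries into chunks of at most 100, start one task per chunk, and await them all. The names `BatchHelpers`, `SplitIntoBatches`, and `RunInBatchesAsync` are hypothetical, and `executeBatch` is an assumed delegate standing in for whatever actually submits one batch to the table. A real batch must also contain entities from a single partition, which holds here because all inserts target the new partition and all deletes target the old one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchHelpers
{
    // Azure Table Storage allows at most 100 operations per batch.
    public const int MaxBatchSize = 100;

    // Splits a sequence into consecutive chunks of at most `size` items.
    public static IEnumerable<List<T>> SplitIntoBatches<T>(IEnumerable<T> source, int size)
    {
        var current = new List<T>(size);
        foreach (T item in source)
        {
            current.Add(item);
            if (current.Count == size)
            {
                yield return current;
                current = new List<T>(size);
            }
        }
        if (current.Count > 0)
            yield return current;
    }

    // Starts `executeBatch` for every chunk at once and waits for all of them,
    // mirroring the Task.WhenAll(insertTasks) / Task.WhenAll(deleteTasks) idea.
    public static Task RunInBatchesAsync<T>(
        IEnumerable<T> entries,
        Func<IReadOnlyList<T>, Task> executeBatch)
    {
        List<Task> tasks = SplitIntoBatches(entries, MaxBatchSize)
            .Select(batch => executeBatch(batch))
            .ToList();
        return Task.WhenAll(tasks);
    }
}
```

On .NET 6 and later, the built-in `Enumerable.Chunk` performs the same splitting.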
My main goal is to parallelize everything. That is, new entries should be read while the already-read ones are being deleted and their replacements are being inserted.
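One way to keep reading the next segment while earlier segments are still being migrated, without letting an unbounded number of segments pile up in memory when writes are slower than reads, is to cap the segments in flight with a `SemaphoreSlim`. This is a swapped-in technique, not part of the code above: `BoundedPipeline`, `readSegment`, and `migrateSegment` are hypothetical, with the two delegates standing in for `GetEntriesSegment` and `MigrateEntries`, and `readSegment` is assumed to return `null` once there are no more pages.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
#nullable enable

public static class BoundedPipeline
{
    // Reads pages until readSegment returns null, migrating at most
    // maxInFlight pages concurrently while the next page is being read.
    public static async Task RunAsync<TSegment>(
        Func<Task<TSegment?>> readSegment,
        Func<TSegment, Task> migrateSegment,
        int maxInFlight)
        where TSegment : class
    {
        var throttle = new SemaphoreSlim(maxInFlight);
        var inFlight = new List<Task>();

        while (true)
        {
            // Wait for a free slot before reading more pages.
            await throttle.WaitAsync();
            TSegment? segment = await readSegment();
            if (segment is null)
            {
                throttle.Release();
                break;
            }

            inFlight.Add(MigrateAndReleaseAsync(segment));
        }

        // Surfaces any migration failure once every page has finished.
        await Task.WhenAll(inFlight);

        async Task MigrateAndReleaseAsync(TSegment s)
        {
            try { await migrateSegment(s); }
            finally { throttle.Release(); }
        }
    }
}
```

The limit is a tuning knob: a larger `maxInFlight` keeps more requests in parallel but holds more entities in memory and is more likely to run into service throttling.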
Does this solution look reasonable? Do you know of a time-proven alternative (well tested and used in production projects)?