3
votes

I have a mass upload system that I would like to do the following:

  1. Upload a chunk of data to a server that will put it as an uncommitted block on a block blob.
    • The uploader cannot know anything about the block/blob implementation. It just knows it's storing a chunk of data.
    • The server cannot preserve any state between calls either.
  2. Once all the chunks have been uploaded (the uploader sets a flag on the last chunk), the server will:
    1. get the list of uncommitted blocks on the blob (remember, it cannot preserve state, so it can't keep this list in memory) and then
    2. make a call to commit them (PutBlockList). They have to be committed in the proper order.
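The intended flow can be sketched as a small in-memory simulation (Python for illustration; all names are hypothetical and the blob service is stood in by a dict). One way to make the commit order recoverable by a stateless server is to encode the chunk index in the block id, as below:

```python
import base64

# Hypothetical in-memory stand-in for the blob service; not a real API.
class FakeBlobService:
    def __init__(self):
        self.uncommitted = {}          # block_id -> chunk data
        self.committed = b""

    def put_block(self, block_id, data):
        self.uncommitted[block_id] = data   # step 1: stage a chunk

    def get_block_list(self):
        return sorted(self.uncommitted)     # listing order is not upload order

    def put_block_list(self, block_ids):
        # step 2: commit the blocks in exactly the order given
        self.committed = b"".join(self.uncommitted[b] for b in block_ids)
        self.uncommitted.clear()

def upload_chunk(service, index, chunk, is_last):
    # Fixed-width, zero-padded index -> every block id has the same length.
    block_id = base64.b64encode(f"{index:09d}".encode()).decode()
    service.put_block(block_id, chunk)
    if is_last:
        # The server holds no state: rediscover the block list from the
        # service, then commit in index order (recovered by decoding ids).
        ids = sorted(service.get_block_list(),
                     key=lambda b: base64.b64decode(b))
        service.put_block_list(ids)
```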

But the order of the blocks returned from the API is not the order that the documentation says it will be.

According to the Azure storage API site,

The list of uncommitted blocks is returned beginning with the most recently uploaded block to the oldest uploaded block. (https://msdn.microsoft.com/en-us/library/azure/dd179400.aspx , under the Remarks section at the bottom)

However, both raw API calls and the Microsoft Azure Storage SDK return the blocks in alphabetical order, not in the order they were uploaded.

Am I reading the documentation wrong? Could this be a bug in the API? The local storage emulator is also giving the same results.

Thanks!


2 Answers

5
votes

We checked things out storage service-side and here's the deal: The docs have a bug. Since day 1 the list of uncommitted blocks has been returned in alphabetical order. We will be updating the MSDN docs as soon as possible to remove the error and we're sorry for any inconvenience!

Here are some ideas for solving your problem:

  1. If you can't preserve any state locally, then in parallel with your Put Block calls, store each block id in the cloud as well. I'd recommend using an append blob to store these.
  2. Explore other blob types.

    • An append blob might be better overall if you want to write data in the order it was uploaded. Append blobs have the same read behavior and throughput as block blobs, but don't allow you to update or delete blocks that have already been put. To append data, all you need to do is an AppendBlock call, which adds to the end of the blob -- no commit needed!

    • Page blobs will also let you put data without a commit. Unlike append blobs, they allow modifications in the middle of the blob. However, they require the data length to be a multiple of 512 bytes, so if that isn't a natural property of your data you'd need to deal with padding.

    The SDKs have great chunking support for append and page blobs, where you can just throw in the data and it will get put. There's chunking for block blobs too, of course, but the state is maintained client-side.

  3. Go with the alphabetical ordering property and make your block ids sort alphabetically. Block ids must be valid base64 strings, no more than 64 bytes before encoding, and the same length for every block in the blob. Then you can use the returned block list as you originally intended.
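Idea 3 can be sketched as follows (Python for illustration; the function names are hypothetical). Since the service lists uncommitted blocks alphabetically rather than in upload order, encoding a fixed-width index in each id lets you recover the upload order from whatever order the listing returns, by sorting on the decoded payload:

```python
import base64

def make_block_id(index):
    # Zero-padded, fixed-width payload: every id is the same length and
    # well under the 64-byte pre-encoding limit.
    return base64.b64encode(f"{index:09d}".encode("ascii")).decode("ascii")

def commit_order(listed_ids):
    # The listing may come back in any order (observed: alphabetical).
    # Sorting on the *decoded* payload recovers the upload order
    # regardless of how the base64 strings themselves compare.
    return sorted(listed_ids, key=lambda b: base64.b64decode(b))
```

Note the sort is on the decoded bytes, not the base64 strings: the base64 alphabet is not in ASCII order, so the encoded strings are not guaranteed to sort the same way as their payloads.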
0
votes

You can pass an incremental blockId (or chunkId) to each call to blob.PutBlock:

var blockCount = 0;
...
var blockId = Convert.ToBase64String(BitConverter.GetBytes(blockCount));
blob.PutBlock(blockId, ms, null);
blockCount++;

Then, knowing the number of transferred blocks, commit them in order:

var blockIds = Enumerable.Range(0, blockCount).Select(b => Convert.ToBase64String(BitConverter.GetBytes(b)));
blob.PutBlockList(blockIds);
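For reference, the same id scheme can be sketched in Python (an illustrative translation, not the answer's code). `BitConverter.GetBytes(int)` produces a 4-byte little-endian value on common platforms, which base64 encoding turns into a fixed-length id, so the full, correctly ordered id list can be reconstructed from the count alone:

```python
import base64
import struct

def block_id(i):
    # Mirror of Convert.ToBase64String(BitConverter.GetBytes(i)):
    # 4-byte little-endian int, base64-encoded (assumes little-endian,
    # as BitConverter yields on common platforms).
    return base64.b64encode(struct.pack("<i", i)).decode("ascii")

def block_ids(count):
    # Rebuild the whole ordered id list from the count alone, exactly as
    # the Enumerable.Range(...) line above does.
    return [block_id(i) for i in range(count)]
```

Because every id encodes the same number of bytes, all ids have the same length, satisfying the block id requirements.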