
We are looking to move approximately 1 PB of data to Archive Storage. One of the options we have been considering is GPv2 (general-purpose v2) storage with Standard performance and the Archive access tier. I have some questions about pricing. I used the pricing calculator at:

https://azure.microsoft.com/en-ca/pricing/calculator/

The data that will be archived consists of large image files as well as contract documents. From my reading, block blobs would probably be optimal for archival. Is this right?

In a simple scenario, I would like to upload a 100 MB imagery file. My understanding is that I would create a blob in a container, use Set Blob Tier to set it to either hot or cool, copy the file in, and then use Set Blob Tier again to change it to archive. How would Azure handle the copy? How many blocks would the file be broken into? I have read that each Put Block counts as one transaction and the final Put Block List counts as one transaction. How many operations would be required? If the price is $1.10 per 100,000 operations, what kind of cost can I estimate? Also, what would be the cost of changing the tier from hot or cool to archive?

The more expensive task is reading the data. Suppose that after 180 days the client wants to read the data. The blob within the container would have to be moved from archive to hot or cool, right? It would then take time to rehydrate the data. What would that cost? How is the file handled when reading the data, and how many Get Blob operations would be required? What other operations would be needed? The Azure pricing site says the cost is $55.00 per 100,000 operations. Since reading is supposed to be much more expensive, I assume a large number of operations is involved.

For organizing the data, containers would be necessary. Any help on container creation/deletion costs would be appreciated.


1 Answer


Too many questions :). Let me try to answer them.

From my reading, block blobs would probably be optimal for archival. Is this right?

Yes. AFAIK, only block blobs are supported in the archive tier.

In a simple scenario, I would like to upload a 100 MB imagery file. My understanding is that I would create a blob in a container, use Set Blob Tier to set it to either hot or cool, copy the file in, and then use Set Blob Tier again to change it to archive.

You don't really have to do that. With Storage REST API version 2019-02-02, you can upload a blob directly into the archive tier; there is no need to upload it in the hot or cool tier and then change the access tier to archive.
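For illustration, a minimal sketch of a direct-to-archive upload, assuming the azure-storage-blob v12 Python SDK; the connection string, container, and blob names are placeholders:

```python
from azure.storage.blob import BlobServiceClient, StandardBlobTier

service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="images", blob="scan-0001.tif")

with open("scan-0001.tif", "rb") as data:
    # standard_blob_tier sets the access tier on the upload itself,
    # so the blob lands in Archive without ever being hot or cool.
    blob.upload_blob(data, overwrite=True,
                     standard_blob_tier=StandardBlobTier.Archive)
```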

How many blocks would the file be broken into?

It depends. The maximum size of a block in a block blob is 100 MB. In practice, the block size you choose will depend on your Internet speed; I believe the SDKs default to a 4 MB block size.
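If you want the chunking (and therefore the operation count) to be predictable rather than left to SDK defaults, the thresholds can be pinned on the client. A sketch, again assuming the v12 Python SDK; the 4 MB values are illustrative, not recommendations:

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    "<connection-string>",
    max_single_put_size=4 * 1024 * 1024,  # blobs above this size are chunked
    max_block_size=4 * 1024 * 1024,       # each chunk becomes one Put Block
)
```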

I have read that each Put Block counts as one transaction and the final Put Block List counts as one transaction. How many operations would be required?

Total operations required = number of blocks + 1 commit (Put Block List) operation. So if you have a 100 MB blob and split it into 4 MB blocks, the total will be 25 (100 MB / 4 MB) + 1 = 26 operations.
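To make the transaction accounting concrete, here is a sketch of the raw Put Block / Put Block List flow, assuming the v12 Python SDK; the container and file names are hypothetical:

```python
import uuid
from azure.storage.blob import BlobServiceClient, BlobBlock

service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="images", blob="scan-0001.tif")

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB -> 25 blocks for a 100 MB file
block_ids = []
with open("scan-0001.tif", "rb") as f:
    while True:
        chunk = f.read(BLOCK_SIZE)
        if not chunk:
            break
        block_id = str(uuid.uuid4())
        blob.stage_block(block_id=block_id, data=chunk)  # one Put Block each
        block_ids.append(BlobBlock(block_id=block_id))

blob.commit_block_list(block_ids)  # one Put Block List
# Total: 25 Put Block + 1 Put Block List = 26 write operations.
```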

If the price is $1.10 per 100,000 operations, what kind of cost can I estimate?

In the example above it would be $1.10 × 26 / 100,000 = $0.000286 for a single blob.
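As a worked check of that arithmetic, using the question's rate of $1.10 per 100,000 operations:

```python
import math

file_mb, block_mb = 100, 4
ops = math.ceil(file_mb / block_mb) + 1    # 25 Put Blocks + 1 Put Block List
cost = 1.10 * ops / 100_000
print(f"{ops} operations -> ${cost:.6f}")  # 26 operations -> $0.000286
```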

Also, what would be the cost of changing the tier from hot or cool to archive?

Please see my second answer above: you don't need to do that at all, since you can upload directly into the archive tier.

The more expensive task is reading the data. Suppose that after 180 days the client wants to read the data. The blob within the container would have to be moved from archive to hot or cool, right?

That's correct.
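For illustration, a minimal sketch of triggering and monitoring rehydration, assuming the v12 Python SDK and the same placeholder names as above:

```python
from azure.storage.blob import BlobServiceClient, StandardBlobTier

service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="images", blob="scan-0001.tif")

# Kick off rehydration; this is asynchronous and can take hours.
blob.set_standard_blob_tier(StandardBlobTier.Hot)

# archive_status reads 'rehydrate-pending-to-hot' while in progress
# and becomes None once the blob is readable again.
props = blob.get_blob_properties()
print(props.blob_tier, props.archive_status)
```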

It would then take time to rehydrate the data. What would that cost? How is the file handled when reading the data, and how many Get Blob operations would be required? What other operations would be needed? The Azure pricing site says the cost is $55.00 per 100,000 operations. Since reading is supposed to be much more expensive, I assume a large number of operations is involved.

You can find this information on the Azure Storage pricing page. Pricing depends on the region where the data is stored, so it will vary.

For organizing the data, containers would be necessary. Any help on container creation/deletion costs would be appreciated.

Again, you can find this information on the storage pricing page. Container creation is a single operation, and I believe container deletion is a free operation.
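For completeness, the container lifecycle calls in the v12 Python SDK look like this; the container name is hypothetical:

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
service.create_container("contracts-2019")  # one Create Container operation
service.delete_container("contracts-2019")  # deletion; I believe this is free
```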