python azure blob storage md5 check fails on blob upload using put_block_blob_from_path

Question

I am trying to upload a blob to Azure Blob Storage with the Python SDK. I want to pass the MD5 hash so that the server can validate the content after the upload.

Here's the code:

blob_service.put_block_blob_from_path(
        container_name='container_name',
        blob_name='upload_dir/'+object_name,
        file_path=object_name,
        content_md5=object_md5Hash
)

But I get this error:

AzureHttpError: The MD5 value specified in the request did not match with the MD5 value calculated by the server.

The file is ~200 MB and the error is raised instantly; the file is not uploaded. So I suspect that the supplied hash is being compared with the hash of the first chunk or something similar.

Any ideas?

Emily Gerner · Accepted Answer · 2016-01-21T17:15:30

This is sort of an SDK bug in that we should throw a better error message rather than hitting the service, but validating the content of a large upload that has to be chunked simply doesn't work. x_ms_blob_content_md5 will store the MD5 on the blob, but the service will not validate it. That is something you could do on download though. content_md5 is validated by the server against the body of one particular request, and since a chunked blob is sent as more than one request, the MD5 of the whole file will never match.
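To illustrate the point above, here is a minimal standard-library sketch (no Azure calls) showing why the MD5 of a whole file cannot match the MD5 of any single chunk, which is all the server sees in one request. The payload and the tiny CHUNK_SIZE are made-up values for illustration, not the sizes the SDK uses.

```python
import base64
import hashlib


def md5_b64(data):
    """Return the base64-encoded MD5 digest, the format used by the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


# Stand-in for file contents; CHUNK_SIZE is an illustrative value, not the SDK's.
CHUNK_SIZE = 4
payload = b"0123456789abcdef"

# Digest of the whole "file", as the asker would compute it up front.
whole_md5 = md5_b64(payload)

# Digests of the individual chunks, one per request body.
chunks = [payload[i:i + CHUNK_SIZE] for i in range(0, len(payload), CHUNK_SIZE)]
chunk_md5s = [md5_b64(c) for c in chunks]

# The server hashes only the body of the request it received (one chunk),
# so the whole-file digest does not match any of them.
print(whole_md5 in chunk_md5s)  # False
```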

So, if the blob is small enough (below BLOB_MAX_DATA_SIZE) to be put in a single request, content_md5 will work fine. Otherwise I'd simply recommend using HTTPS and storing MD5 in x_ms_blob_content_md5 if you think you might want to download with HTTP and validate it on download. HTTPS already provides validation for things like bit flips on the wire so using it for upload/download will do a lot. If you can't upload/download with HTTPS for one reason or another you can consider chunking the blob yourself using the put block and put block list APIs.
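The advice above could be sketched roughly as follows. This is only a sketch under stated assumptions: md5_upload_kwargs and iter_blocks_with_md5 are hypothetical helper names, single_put_limit stands in for BLOB_MAX_DATA_SIZE (pass whatever value your SDK version uses), and the block ID format is arbitrary. The keyword names content_md5 and x_ms_blob_content_md5 are the ones mentioned in this thread; the helpers themselves use only the standard library.

```python
import base64
import hashlib
import os


def file_md5_b64(path, read_size=1024 * 1024):
    """Base64-encoded MD5 of a file, read in pieces to keep memory use low."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(read_size), b""):
            md5.update(piece)
    return base64.b64encode(md5.digest()).decode("ascii")


def md5_upload_kwargs(path, single_put_limit):
    """Pick the MD5 keyword argument based on file size (hypothetical helper).

    Below the limit the upload fits in one request, so the server can
    validate content_md5. Otherwise the digest is only stored on the blob.
    """
    digest = file_md5_b64(path)
    if os.path.getsize(path) < single_put_limit:
        return {"content_md5": digest}
    return {"x_ms_blob_content_md5": digest}


def iter_blocks_with_md5(path, block_size):
    """Yield (block_id, block_bytes, block_md5) for manual chunked uploads.

    Each block's MD5 covers exactly one request body, so it is a value
    the server could validate for that request.
    """
    with open(path, "rb") as f:
        index = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            block_md5 = base64.b64encode(hashlib.md5(block).digest()).decode("ascii")
            # Fixed-width IDs (an assumption) so all block IDs have equal length.
            yield "{0:08d}".format(index), block, block_md5
            index += 1


# Possible usage with the call from the question (not executed here):
# blob_service.put_block_blob_from_path(
#     container_name='container_name',
#     blob_name='upload_dir/' + object_name,
#     file_path=object_name,
#     **md5_upload_kwargs(object_name, single_put_limit)
# )
```

For the manual route, each tuple from iter_blocks_with_md5 would be passed to the put block API, followed by one put block list call with the collected block IDs; check the exact parameter names against the SDK version you have installed.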

FYI: In future versions we do intend to add automatic MD5 calculation for both single put and chunked operations in the library itself, which will fully solve this. For the next version, we will add an improved error message if content_md5 is specified for a chunked upload.
