
I am trying to upload a blob to Azure Blob Storage with the Python SDK. I want to pass the MD5 hash so that the server validates the content after upload.

Here's the code:

blob_service.put_block_blob_from_path(
        container_name='container_name',
        blob_name='upload_dir/'+object_name,
        file_path=object_name,
        content_md5=object_md5Hash
)

But I get this error:

AzureHttpError: The MD5 value specified in the request did not match with the MD5 value calculated by the server.

The file is about 200 MB, and the error is thrown immediately; the file is never uploaded. So I suspect the server may be comparing the supplied hash with the hash of the first chunk or something similar.

Any ideas?


3 Answers

1
votes

This is partly an SDK bug, in that we should throw a better error message instead of sending the request to the service. But validating the content of a large upload that has to be chunked simply doesn't work. x_ms_blob_content_md5 stores the MD5 on the blob, but the service does not validate it; that is something you could do on download instead. content_md5, on the other hand, is validated by the server against the body of a single request, and because a chunked blob is uploaded in more than one request, it will never match.
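To make the per-request point concrete, here is a small self-contained sketch using only `hashlib` and `base64` (no Azure calls). It shows that each chunk's MD5 differs from the MD5 of the whole payload, so a whole-file hash sent as `content_md5` can only match a request whose body is the entire file. The 4 MB chunk size is an illustrative assumption, not the SDK's actual chunk size.

```python
import base64
import hashlib


def content_md5(data):
    """Return the base64-encoded MD5 digest, the form used by the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def chunk_md5s(data, chunk_size):
    """MD5 of each chunk, i.e. what the service would compute for each request body."""
    return [content_md5(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


payload = b"x" * (10 * 1024 * 1024)   # 10 MB of dummy data
chunk_size = 4 * 1024 * 1024          # illustrative chunk size (assumption)

whole = content_md5(payload)
per_chunk = chunk_md5s(payload, chunk_size)

# No single chunk hash equals the whole-file hash, so a whole-file content_md5
# cannot validate any individual chunked request.
print(whole in per_chunk)  # False
```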

So if the blob is small enough (below BLOB_MAX_DATA_SIZE) to be sent in a single request, content_md5 works fine. Otherwise, I'd recommend using HTTPS, and also storing the MD5 in x_ms_blob_content_md5 if you might download over HTTP and want to validate the content on download. HTTPS already protects against things like bit flips on the wire, so using it for upload and download covers a lot. If you can't use HTTPS for some reason, consider chunking the blob yourself with the Put Block and Put Block List APIs.

FYI: in future versions we intend to add automatic MD5 calculation in the library itself for both single-put and chunked operations, which will fully solve this. In the next version, we will add a clearer error message when content_md5 is specified for a chunked upload.

0
votes

I reviewed the source code of put_block_blob_from_path in the Azure Blob Storage SDK, and the function's docstring explains this case. See the excerpt below and https://github.com/Azure/azure-storage-python/blob/master/azure/storage/blob/blobservice.py.

content_md5:

Optional. An MD5 hash of the blob content. This hash is used to verify the integrity of the blob during transport. When this header is specified, the storage service checks the hash that has arrived with the one that was sent. If the two hashes do not match, the operation will fail with error code 400 (Bad Request).
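Note that the header value is the base64 encoding of the raw 16-byte MD5 digest, not the 32-character hex string that hashlib's hexdigest() returns, so whichever parameter you use, the hash should be computed along these lines. A minimal sketch; `file_content_md5` is an illustrative name, not an SDK function:

```python
import base64
import hashlib


def file_content_md5(path, chunk_size=1024 * 1024):
    """Base64 MD5 of a file, read in chunks so a 200 MB file needs little memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel stops at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")
```

For example, `object_md5Hash = file_content_md5(object_name)` would produce a 24-character value suitable for the header.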

0
votes

I think there are two things going on here.

  • Bug in SDK - I believe you have discovered a bug in the SDK. Looking at the source code for this function on GitHub, when a large blob is uploaded in chunks the SDK first creates an empty block blob, which is not required for block blobs. That create request carries no data, but because you set content_md5, it is sent with that request too. The service compares your content_md5 with the MD5 of the empty body, they don't match, and you get the error.

To fix the issue in the interim, please modify the source code in blobservice.py and comment out the following lines of code:

    self.put_blob(
        container_name,
        blob_name,
        None,
        'BlockBlob',
        content_encoding,
        content_language,
        content_md5,
        cache_control,
        x_ms_blob_content_type,
        x_ms_blob_content_encoding,
        x_ms_blob_content_language,
        x_ms_blob_content_md5,
        x_ms_blob_cache_control,
        x_ms_meta_name_values,
        x_ms_lease_id,
    )

I have created a new issue on GitHub for this: https://github.com/Azure/azure-storage-python/issues/99.
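The mismatch in the first point is easy to reproduce locally, assuming (as described above) that the initial create request is sent with an empty body: the MD5 of an empty body is a fixed value, so a content_md5 computed over the real file can never match it. A quick check with the standard library:

```python
import base64
import hashlib


def b64_md5(data):
    """Base64-encoded MD5 digest, the format of the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


# The empty "create blob" request carries no body, so the service hashes b"".
empty_body_md5 = b64_md5(b"")
print(empty_body_md5)  # 1B2M2Y8AsgTpgAmY7PhCfg==

# Any non-empty file content hashes to something else, so that request fails.
file_md5 = b64_md5(b"some file content")
print(file_md5 == empty_body_md5)  # False
```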

  • Incorrect Usage - I noticed that you're passing the MD5 hash of the file in the content_md5 parameter. That will not work for you; pass the MD5 hash in the x_ms_blob_content_md5 parameter instead. So your call should be:

    blob_service.put_block_blob_from_path(
        container_name='container_name',
        blob_name='upload_dir/' + object_name,
        file_path=object_name,
        x_ms_blob_content_md5=object_md5Hash
    )
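Because the service stores x_ms_blob_content_md5 without checking it, any validation has to happen on your side after download. A minimal sketch of that check, assuming you have already downloaded the blob to a local file and read the stored value from the blob's Content-MD5 property (how you retrieve it depends on your SDK version, so `expected_md5` is simply passed in; `md5_matches` is a hypothetical helper):

```python
import base64
import hashlib


def md5_matches(path, expected_md5, chunk_size=1024 * 1024):
    """Return True if the base64 MD5 of the local file equals expected_md5."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    actual = base64.b64encode(md5.digest()).decode("ascii")
    return actual == expected_md5
```

A caller would then do something like `if not md5_matches(object_name, stored_md5): raise ValueError("MD5 mismatch after download")`.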