
I have a private blob in Azure Blob Storage. It was written by Terraform through an account that has read and write access to it. I am trying to fetch it from Python without the Azure SDK, and so far I have been unable to.

My request is as follows:

import datetime
import requests


key = ...
secret = ...
now = datetime.datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')
# the required settings, as per https://docs.microsoft.com/en-us/rest/api/storageservices/get-blob
headers = {'Authorization': 'SharedKey {}:{}'.format(key, secret),
           'Date': now,
           'x-ms-version': '2018-03-28'
           }

storage_account = ...
container = ...
url = 'https://{}.blob.core.windows.net/{}/terraform.tfstate'.format(storage_account, container)

response = requests.get(url, headers=headers)

print(response.status_code)
print(response.text)

This yields

400
<?xml version="1.0" encoding="utf-8"?><Error>
<Code>OutOfRangeInput</Code><Message>One of the request inputs is out of range. 
RequestId:...
Time:...</Message></Error>

I have verified in Storage Explorer that the file exists. When I access it through the console, I get the same URL as the one above, but with extra query parameters.


For those wondering why I decided not to use the Azure SDK for Python: I only need to get one blob, and pip install azure[blob] would add 88 dependencies to the project, which in my opinion is unacceptably many for such a simple task.


1 Answer


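The request fails because the value after the colon in the Authorization header must not be the account key itself. It has to be a signature: a Base64-encoded HMAC-SHA256, keyed with the Base64-decoded account key, computed over a string built from the request (the verb, a fixed set of standard headers, the x-ms-* headers and the resource path). The exact format is described in detail in the "Authorize with Shared Key" page of the Azure Storage REST API documentation.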

The Python 3-equivalent of the whole thing is:

import base64
import hmac
import hashlib
import datetime

import requests


def _sign_string(key, string_to_sign):
    key = base64.b64decode(key.encode('utf-8'))
    string_to_sign = string_to_sign.encode('utf-8')
    signed_hmac_sha256 = hmac.HMAC(key, string_to_sign, hashlib.sha256)
    digest = signed_hmac_sha256.digest()
    encoded_digest = base64.b64encode(digest).decode('utf-8')
    return encoded_digest


def get_blob(storage_account, token, file_path):
    now = datetime.datetime.utcnow().strftime('%a, %d %b %Y %H:%M:%S GMT')
    url = 'https://{account}.blob.core.windows.net/{path}'.format(account=storage_account, path=file_path)
    version = '2018-03-28'
    headers = {'x-ms-version': version,
               'x-ms-date': now}

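    # String to sign: the verb, 11 empty standard-header slots (hence 12 newlines),
    # the x-ms-* headers in sorted order, then the resource as /{account}/{path}.
    content = 'GET{spaces}x-ms-date:{now}\nx-ms-version:{version}\n/{account}/{path}'.format(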
        spaces='\n'*12,
        now=now,
        version=version,
        account=storage_account,
        path=file_path
    )

    headers['Authorization'] = 'SharedKey ' + storage_account + ':' + _sign_string(token, content)

    response = requests.get(url, headers=headers)

    # Raise on 4xx/5xx responses; a bare assert is stripped when running `python -O`.
    response.raise_for_status()
    return response.text

where file_path is of the form {container}/{path-in-container}.
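If the service rejects the signature, it helps to build the string to sign separately so it can be printed and inspected without making a request. The following is a minimal sketch of that idea, not an official implementation: the names STANDARD_HEADERS, build_string_to_sign and sign are made up for this example, and it assumes a request with no query parameters (query parameters would add extra lines to the canonicalized resource).

```python
import base64
import hashlib
import hmac

# Standard header slots of the Shared Key string to sign, in order.
# An absent header contributes an empty line.
STANDARD_HEADERS = (
    'Content-Encoding', 'Content-Language', 'Content-Length', 'Content-MD5',
    'Content-Type', 'Date', 'If-Modified-Since', 'If-Match', 'If-None-Match',
    'If-Unmodified-Since', 'Range',
)


def build_string_to_sign(verb, account, path, ms_headers, standard_headers=None):
    """Hypothetical helper: assemble the string to sign for a blob request.

    Assumes the URL carries no query parameters.
    """
    standard_headers = standard_headers or {}
    lines = [verb]
    # One line per standard header slot, empty when the header is not sent.
    lines += [standard_headers.get(name, '') for name in STANDARD_HEADERS]
    # x-ms-* headers: lowercase names, sorted, formatted as "name:value".
    canonical = sorted((name.lower(), value) for name, value in ms_headers.items())
    lines += ['{}:{}'.format(name, value) for name, value in canonical]
    # Canonicalized resource: /{account}/{container}/{blob}
    lines.append('/{}/{}'.format(account, path))
    return '\n'.join(lines)


def sign(account_key_b64, string_to_sign):
    """Hypothetical helper: Base64(HMAC-SHA256(Base64-decoded key, string))."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode('utf-8'), hashlib.sha256).digest()
    return base64.b64encode(digest).decode('utf-8')
```

For a plain GET with only x-ms-date and x-ms-version set, build_string_to_sign produces the same string as the content variable in get_blob above, so print(repr(...)) on its result shows exactly what is being signed.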

Using this snippet was still preferable to adding 88 dependencies to the project.