I want to utilize an AZURE Function app to read an XLSX File from an AZURE BLOB Storage. The Function shall be called by a REST API call. I can access the blob and download the file but I'm struggling with reading the content of the file with pandas directly. I'm searching and trying for hours but can't find a solution. My latest approach looks like this:
def main(req: func.HttpRequest) -> func.HttpResponse:
logging.info('Python HTTP trigger function processed a request.')
blob_service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container_client = blob_service_client.get_container_client(CONTAINERNAME)
blob_client = blob_service_client.get_blob_client(container = CONTAINERNAME, blob=BLOBNAME)
blob = BlobClient(ACCOUNT_URL, CONTAINERNAME, BLOBNAME)
#READ PRODUCTS FILE
blob_client.download_blob().readinto(LOCALFILENAME)
df = pd.read_excel(blob_client.download_blob())
On the MS homepage*, there is one example to download a file from a blob and process it afterwards but as I'm utilizing a function app it doesn't make sense to download the file first if I'm not missing anything...
*https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python
The autocomplete for blob gives me the following options but they are also not really helpful:
The error messages are different depending on the way I try to read the file. The current one is:
System.Private.CoreLib: Exception while executing function: Functions.TrainProductModel. System.Private.CoreLib: Result: Failure
Exception: AttributeError: 'str' object has no attribute 'write'
But I think there is actually something fundamentally wrong with my approach. The desired result is to read the file directly into a pandas table.
Appreciating any support as this blocking progress with my master thesis :/
df = pd.read_excel(blob_client.download_blob().readinto())
? I'm not familiar with pandas, nor the blob lib for python, but reading the docs forread_excel
anddownload_blob
, it seems you have to usereadinto()
to download the blob into a stream, whichread_excel()
seems to be accepting as a valid argument for theio
parameter. – eli