I am trying to read an xlsx file from Azure Blob Storage into a pandas DataFrame without creating a temporary local file. I have seen many similar questions, e.g. Issues Reading Azure Blob CSV Into Python Pandas DF, but haven't managed to get the proposed solutions to work.
The code snippet below raises the exception UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 14: invalid start byte
from io import StringIO
import pandas as pd
from azure.storage.blob import BlobClient, BlobServiceClient
blob_client = BlobClient.from_blob_url(blob_url = url + container + "/" + blobname, credential = token)
blob = blob_client.download_blob().content_as_text()
df = pd.read_excel(StringIO(blob))
Using a temporary file, I do manage to make it work with the following code snippet:
blob_service_client = BlobServiceClient(account_url = url, credential = token)
blob_client = blob_service_client.get_blob_client(container=container, blob=blobname)
with open(tmpfile, "wb") as my_blob:
    download_stream = blob_client.download_blob()
    my_blob.write(download_stream.readall())
data = pd.read_excel(tmpfile)
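A likely cause of the UnicodeDecodeError is that an xlsx file is a binary zip archive, so content_as_text() tries to decode it as UTF-8 and fails before pandas ever sees it. Below is a minimal sketch of what might work instead, assuming the raw bytes from readall() can be wrapped in io.BytesIO and passed to pd.read_excel. The helper names read_excel_from_bytes and read_excel_from_blob are hypothetical, and an Excel engine such as openpyxl is assumed to be installed.

```python
from io import BytesIO

import pandas as pd


def read_excel_from_bytes(data: bytes, **kwargs) -> pd.DataFrame:
    """Parse raw xlsx bytes in memory (hypothetical helper, no temp file)."""
    # BytesIO gives pandas a binary file-like object, which is what an
    # xlsx (zip) payload needs; StringIO would require decoded text.
    return pd.read_excel(BytesIO(data), **kwargs)


def read_excel_from_blob(blob_url: str, credential, **kwargs) -> pd.DataFrame:
    """Download a blob and parse it as Excel (hypothetical helper)."""
    # Imported here so the bytes helper above works without the Azure SDK.
    from azure.storage.blob import BlobClient

    blob_client = BlobClient.from_blob_url(blob_url=blob_url, credential=credential)
    # readall() returns the blob content as bytes, with no text decoding.
    data = blob_client.download_blob().readall()
    return read_excel_from_bytes(data, **kwargs)
```

With the variables from the snippets above, the call would look like df = read_excel_from_blob(url + container + "/" + blobname, token). This holds the whole blob in memory, which should be fine for typical spreadsheets but may not suit very large files.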