0
votes

I am running a Google cloud function, which accesses the google cloud storage bucket for an excel file. My goal is to read the file and do some calculations with it, but is it possible without downloading the file to the /tmp folder? Things I have tried:

  storage_client = storage.Client()
  file_obj = storage_client.get_bucket(‘bucketname’).get_blob('filename.xlsx')
  
  excel_file = None
  file_obj.download_to_file(excel_file)
  wb = openpyxl.load_workbook(excel_file)

at first I thought I could attain a file object, but then after I read the error message I realised I was being asked for a file path so I would have to download to the /tmp folder, which is something I would like to avoid.

I also tried download_as_bytes(), but unfortunately openpyxl cannot read bytes.

Any help/hint would be appreciated :)

1
download_as_string?guillaume blaquiere
@guillaumeblaquiere ummm... As I read from the documentation, it's an deprecated alias for download_as_bytes() googleapis.dev/python/storage/latest/blobs.html I tried nevertheless, and it's the same result as download_as_bytes: InvalidFileException: openpyxl does not support b'.xmlpk\x05\x06\x00\x00\x00\x00\x0c\x00\x0c\x00&\x03\x00\x00\x15\x97\x00\x00\x00\x00' file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltmuser9882001
Ok, and not to convert to string your byte and then load it?guillaume blaquiere
@guillaumeblaquiere sorry... I don't understand what you mean... I didn't do any conversion when I do download_as_string or download_as_bytesuser9882001
have you checked this community answer? It explains that openpyxl expects a path or takes a file-like object, so if you have a bytes you can follow the example in that answer to your case. Do you think this will work for you case?Rafael Lemos

1 Answers

1
votes

As you can see in this Community Answer:

In the docs for openpyxl.load_workbook it says:

#:param filename: the path to open or a file-like object

So if you have bytes as the input you can assemble a "proto-object" to satisfy the parameter requirements of openpyxl.load_workbook and it will work, so like the example below:

from io import BytesIO
...
excel_file = None
file_obj.download_as_bytes(excel_file)
wb = load_workbook(filename=BytesIO(excel_file.read()))