0
votes

I am trying to download a Google Sheets document as a Microsoft Excel document using Python. I have been able to accomplish this task using the Python module googleapiclient.

However, the Sheets document may contain some formulas which are not compatible with Microsoft Excel (https://www.dataeverywhere.com/article/27-incompatible-formulas-between-excel-and-google-sheets/).

When I use the application I created on any Google Sheets document that used any of these formulas anywhere, I get a bogus Microsoft Excel document as output.

I would like to read the cell values in the Google Sheets document before downloading it as a Microsoft Excel document, just to prevent any such errors from happening.

The code I have written thus far is attached below:

import sys
import os
from googleapiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = "https://www.googleapis.com/auth/drive.readonly"
store = file.Storage("./credentials/credentials.json")
creds = store.get()

if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets("credentials/client_secret.json",
                                          SCOPES)
    creds = tools.run_flow(flow, store)
DRIVE = discovery.build("drive", "v3", http = creds.authorize(Http()))

print("Usage:     tmp.py <name of the spreadsheet>")

FILENAME = sys.argv[1]
SRC_MIMETYPE = "application/vnd.google-apps.spreadsheet"
DST_MIMETYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

files = DRIVE.files().list(
    q = 'name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy = "modifiedTime desc,name").execute().get("files", [])

if files:
    fn = '%s.xlsx' % os.path.splitext(files[0]["name"].replace(" ", "_"))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end = "")
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
    if data:
        with open(fn, "wb") as f:
            f.write(data)
        print("Done")
    else:
        print("ERROR: Could not download file")
else:
    print("ERROR: File not found")
2

2 Answers

0
votes

If you want to use python to export something from google docs, then the simplest way is to let googles own server do the job for you.

I was doing a little webscraping on google sheets, and I made this little program which will do the job for you. You just have to insert the id of the document you want to download.

I put in a temporary id, so anyone can try it out.

import requests
ext = 'xlsx' #csv, ods, html, tsv and pdf can be used as well
key = '1yEoHh7WL1UNld-cxJh0ZsRmNwf-69uINim2dKrgzsLg'
url = f'https://docs.google.com/spreadsheets/d/{key}/export?format={ext}'
res = requests.get(url)
with open(f'file.{ext}', 'wb') as f:
    f.write(res.content)

That way conversion will most certainly always be correct, because this is the same a clicking the export button inside the browser version of google sheets.

If you are planning to work with the data inside python, then I recommend using csv format instead of xlsx, and then create the necessary formulas inside python.

0
votes

I think the gspread library might be what you are looking for. https://gspread.readthedocs.io/en/latest/

Here's a code sample:

import tenacity
import gspread
from oauth2client.service_account import ServiceAccountCredentials

@tenacity.retry(wait=tenacity.wait_exponential()) # If you exceed the Google API quota, this waits to retry your request
def loadGoogleSheet(spreadsheet_name):        
    # use creds to create a client to interact with the Google Drive API

    print("Connecting to Google API...")
        
    scope = [
    'https://spreadsheets.google.com/feeds',
    'https://www.googleapis.com/auth/drive'
    ]
    creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
    client = gspread.authorize(creds)
        
    spreadsheet = client.open(spreadsheet_name)

    return spreadsheet

def readGoogleSheet(spreadsheet):

    sheet = spreadsheet.sheet1 # Might need to loop through sheets or whatever
    val = sheet.cell(1, 1).value # This just gets the value of the first cell. The docs I linked to above are pretty helpful on all the other stuff you can do

    return val

test_spreadsheet = loadGoogleSheet('Copy of TLO Summary - Template DO NOT EDIT')
test_output = readGoogleSheet(test_spreadsheet)
print(test_output)