
I'm trying to export a CSV file from the Gmail API via the Users.messages.attachments: get method (1) and convert it to a pandas DataFrame. I already have the ID of the attachment and the ID of the message, and everything is working fine. The documentation says that the body data of the attachment is "part as a base64url encoded string" (2), and now I would like to convert the CSV file to a pandas DataFrame. Following this post, I tried the following:

[...]

#get the attachment

file = service.users().messages().attachments().get(userId='me', messageId=message_id, id=attachmentId).execute()

#convert the file to a pandas data-frame.

data = file['data']
str_csv = base64.urlsafe_b64decode(data)
df = pd.read_csv(StringIO(str_csv))

This is what print(data) looks like:

__5CAGUAcgBpAGMAaAB0ACAAegB1ACAAQQB1AGsAdABpAG8AbgBzAGQAYQB0AGUAbgAgAGYA_AByACAASwBhAG0AcABhAGcAbgBlAG4ACgAxAC4AIABKAGEAbgB1AGEAcgAgADIAMAAyADAAIAAtACAAMwAxAC4AIABKAGEAbgB1AGEAcgAgADIAMAAyADAACgBHAGUAcgDkAHQACQBEAG8AbQBhAGkAbgAgAGQAZQByACAAYQBuAGcAZQB6AGUAaQBnAHQAZQBuACAAVQBSAEwACQBBAG4AdABlAGkAbAAgAGEAbgAgAG0A9gBnAGwAaQBjAGgAZQBuACAASQBtAHAAcgBlAHMAcwBpAG8AbgBlAG4ACQDcAGIAZQByAHMAYwBoAG4AZQBpAGQAdQBuAGcAcwByAGEAdABlAAkAUgBhAHQAZQAgAGQAZQBy and so on...

Unfortunately I get the following error-message:

error-message

Does anyone have an idea how I can fix this or can explain why it isn't working?

(1) see https://developers.google.com/gmail/api/v1/reference/users/messages/attachments/get

(2) see https://developers.google.com/gmail/api/v1/reference/users/messages/attachments


1 Answer


The immediate cause of the error is that base64.urlsafe_b64decode returns a byte string (bytes), whereas StringIO expects a str. You must first decode the bytes into a real string before passing it to StringIO:

str_csv = base64.urlsafe_b64decode(data).decode('UTF16')
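Why UTF-16 in particular? A quick check on the first 32 characters of the data you posted shows it: the decoded bytes start with FF FE, the UTF-16 little-endian byte order mark. The snippet below is only a sketch using that prefix; it also shows StringIO rejecting the raw bytes.

```python
import base64
import codecs
from io import StringIO

# First 32 characters of the attachment data shown in the question.
sample = "__5CAGUAcgBpAGMAaAB0ACAAegB1ACAA"

raw = base64.urlsafe_b64decode(sample)
print(type(raw))                       # <class 'bytes'>
print(raw[:2] == codecs.BOM_UTF16_LE)  # True: starts with a UTF-16 LE byte order mark

try:
    StringIO(raw)                      # StringIO only accepts str, not bytes
except TypeError as exc:
    print("TypeError:", exc)

text = raw.decode("utf-16")            # the BOM selects little-endian and is stripped
print(repr(text))                      # 'Bericht zu '
```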

But beware: when decoded, the encoded string you show gives:

'Bericht zu Auktionsdaten für Kampagnen\n1. Januar 2020 - 31. Januar 2020\nGerät\tDomain der angezeigten URL\tAnteil an möglichen Impressionen\tÜberschneidungsrate\tRate der'

that is:

  • 2 heading lines
  • 1 line containing tabs (apparently the tab-separated column headers)

I think it will need further processing before being fed to read_csv (at the very least, skipping the first lines and setting the separator to a tab).
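Putting those steps together might look like the sketch below. It assumes the rest of the file matches the visible preview: exactly two title lines, then a tab-separated table whose first row holds the column names. The helper name attachment_to_dataframe and the demo rows are hypothetical; the demo builds its own small UTF-16 report so that it runs without the Gmail API.

```python
import base64
from io import StringIO

import pandas as pd


def attachment_to_dataframe(data: str, header_lines: int = 2) -> pd.DataFrame:
    """Hypothetical helper: base64url attachment data -> DataFrame.

    Assumes UTF-16 text with `header_lines` title lines followed by a
    tab-separated table whose first row contains the column names.
    """
    # Defensively re-add '=' padding (a no-op if the string is already padded).
    padded = data + "=" * (-len(data) % 4)
    text = base64.urlsafe_b64decode(padded).decode("utf-16")
    return pd.read_csv(StringIO(text), sep="\t", skiprows=header_lines)


# Demo data shaped like the decoded preview; the row values are invented.
report = (
    "Bericht zu Auktionsdaten für Kampagnen\n"
    "1. Januar 2020 - 31. Januar 2020\n"
    "Gerät\tDomain der angezeigten URL\tAnteil an möglichen Impressionen\n"
    "Computer\texample.com\t10%\n"
    "Mobil\texample.org\t20%\n"
)
data = base64.urlsafe_b64encode(report.encode("utf-16")).decode("ascii")

df = attachment_to_dataframe(data)
print(df.columns.tolist())
print(df.shape)  # (2, 3)
```

With the real attachment you would pass file['data'] instead of the demo string, and adjust header_lines if the report has a different preamble.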