3
votes

I am building a large text corpus from email. There is no API for reading messages in a Google Group, so as an alternative I am using a Gmail account that is a member of the group. Through that account I can see every message sent to the group. I am using Python and the Gmail API to fetch the mail. The problem is that I cannot fetch the emails that come from the group.

results = service.users().messages().list(userId='me',q="from:[email protected]", maxResults=10).execute()

When I replace the from: value with a normal user's address, it works. When I replace it with the group's email address, it returns zero results. How can I fetch the group's messages through my Gmail account?

The second problem: when I query using someone's address:

results = service.users().messages().list(userId='me',q="from:[email protected]", maxResults=10).execute()

I get results like this:

{'resultSizeEstimate': 82, 'messages': [{'id': '1653929b0b414390', 'threadId': '1644c19f390faf28'}, {'id': '165330aaa5bb9134', 'threadId': '16532ef13e7eec8d'}......

This returns only the message IDs. To get each mail with its body and headers, I have to query again for every ID. Can't I get the full JSON in one query?

1
I was looking for the same thing. It looks like you need to use the get method for every ID returned, and you can specify the fields parameter to return only the data you need (for better performance and less bandwidth usage). - Alisson

1 Answer

2
votes

For the first part, your query is backward: email messages are sent to groups, from users. This query should return all messages sent to the group:

to:[email protected]

(You can easily test this in Gmail, since the search box uses the same query syntax.)
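As a rough sketch of how that query could be used from Python, the helper below pages through messages.list and collects the message IDs. The function name list_message_ids and its page_size and limit parameters are made up for illustration, and service is assumed to be an already authorized Gmail API client (for example from googleapiclient.discovery.build('gmail', 'v1', ...)).

```python
def list_message_ids(service, query, page_size=100, limit=None):
    """Collect message IDs matching a Gmail search query, following nextPageToken.

    `service` is assumed to be an authorized Gmail API client object.
    `limit` is a hypothetical convenience cap on the number of IDs returned.
    """
    ids = []
    page_token = None
    while True:
        kwargs = {"userId": "me", "q": query, "maxResults": page_size}
        if page_token:
            kwargs["pageToken"] = page_token
        response = service.users().messages().list(**kwargs).execute()
        # 'messages' is absent from the response when nothing matches.
        ids.extend(m["id"] for m in response.get("messages", []))
        if limit is not None and len(ids) >= limit:
            return ids[:limit]
        page_token = response.get("nextPageToken")
        if not page_token:
            return ids
```

It would be called with the group's address in a to: query, for example list_message_ids(service, "to:group@example.com", limit=10), where group@example.com is a placeholder for the real group address.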

Next, to get the full message for a given message ID, use users.messages.get with format set to 'full'; see: https://developers.google.com/gmail/api/v1/reference/users/messages/get
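A minimal sketch of that call, plus two helpers for pulling the headers and a plain-text body out of the response, might look like this. The helper names are hypothetical, service is assumed to be an authorized Gmail API client, and the body extraction only looks for the first text/plain part, so HTML-only messages would come back empty.

```python
import base64


def fetch_full_message(service, message_id):
    """Fetch one message with headers and body (format='full')."""
    return service.users().messages().get(
        userId="me", id=message_id, format="full"
    ).execute()


def header_map(message):
    """Return the message headers as a dict keyed by lower-cased header name."""
    headers = message.get("payload", {}).get("headers", [])
    return {h["name"].lower(): h["value"] for h in headers}


def plain_text_body(payload):
    """Return the first text/plain part found in the payload tree, or ''."""
    if payload.get("mimeType") == "text/plain":
        data = payload.get("body", {}).get("data")
        if data:
            # Body data is base64url encoded; restore any missing padding.
            padded = data + "=" * (-len(data) % 4)
            return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
    for part in payload.get("parts", []):
        text = plain_text_body(part)
        if text:
            return text
    return ""
```

For a fetched message, header_map(message).get("subject") and plain_text_body(message["payload"]) give the pieces most corpus-building code needs.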

You do have to call it once for each message, but you can submit the get requests as a batch so that they are sent together in one HTTP request. After the initial fetch, you can use history IDs to fetch only new messages:

From "Synchronizing Clients with Gmail" https://developers.google.com/gmail/api/guides/sync

1. Call messages.list to retrieve the first page of message IDs.

2. Create a batch request of messages.get requests for each of the messages returned by the list request. If your application displays message contents, you should use format=FULL or format=RAW the first time your application retrieves a message and cache the results to avoid additional retrieval operations. If you are retrieving a previously cached message, you should use format=MINIMAL to reduce the size of the response as only the labelIds may change.

3. Merge the updates into your cached results. Your application should store the historyId of the most recent message (the first message in the list response) for future partial synchronization.
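The steps above could be sketched roughly as follows, using the batch support in the Google API Python client (service.new_batch_http_request). The function names and the chunk size of 50 are my own choices rather than anything from the guide. service is assumed to be an authorized Gmail API client, and message_ids is assumed to be in the order returned by messages.list.

```python
def fetch_messages_in_batches(service, message_ids, fmt="full", chunk_size=50):
    """Fetch many messages with batched messages.get calls.

    Returns (fetched, errors): dicts keyed by message ID. The chunk size of 50
    is a conservative assumption to stay clear of batch and rate limits.
    """
    fetched, errors = {}, {}

    def on_response(request_id, response, exception):
        # Called once per request in the batch; request_id is the message ID.
        if exception is not None:
            errors[request_id] = exception
        else:
            fetched[request_id] = response

    for start in range(0, len(message_ids), chunk_size):
        batch = service.new_batch_http_request(callback=on_response)
        for message_id in message_ids[start:start + chunk_size]:
            request = service.users().messages().get(
                userId="me", id=message_id, format=fmt
            )
            batch.add(request, request_id=message_id)
        batch.execute()
    return fetched, errors


def newest_history_id(fetched, message_ids):
    """Return the historyId of the first listed message that was fetched."""
    for message_id in message_ids:
        if message_id in fetched:
            return fetched[message_id].get("historyId")
    return None
```

The value from newest_history_id is what you would store for a later partial sync. Anything that ends up in errors can simply be retried in another batch.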

See: https://developers.google.com/gmail/api/guides/batch
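For the history IDs mentioned above, a later partial sync could look something like the sketch below. It uses users.history.list with the stored historyId and collects the IDs of messages added since then. The function name is hypothetical and service is assumed to be an authorized Gmail API client. Note that history covers the whole mailbox, so you would still need to check each new message (for example its To header) to keep only group mail.

```python
def list_new_message_ids(service, start_history_id):
    """Return (new_message_ids, latest_history_id) since start_history_id."""
    new_ids = []
    latest_history_id = start_history_id
    page_token = None
    while True:
        kwargs = {
            "userId": "me",
            "startHistoryId": start_history_id,
            "historyTypes": ["messageAdded"],
        }
        if page_token:
            kwargs["pageToken"] = page_token
        response = service.users().history().list(**kwargs).execute()
        for record in response.get("history", []):
            for added in record.get("messagesAdded", []):
                new_ids.append(added["message"]["id"])
        # The response carries the mailbox's current historyId.
        latest_history_id = response.get("historyId", latest_history_id)
        page_token = response.get("nextPageToken")
        if not page_token:
            # Drop duplicates while keeping first-seen order.
            return list(dict.fromkeys(new_ids)), latest_history_id
```

If the stored historyId is too old, the API is documented to respond with a 404 error, in which case the usual fallback is to run the full list-and-batch sync again.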