
I am having trouble retrieving all messages through the Gmail API PHP library. I use listUsersThreads to retrieve all threads for either a full or a partial mailbox sync of a user's account. The initial full sync returns the message IDs I need, which I then use to store mail meta headers (from, to, date, subject). A subsequent call to listUserHistory, starting from the last history ID, lets me do a partial sync that retrieves only the latest messages. From the stored data, I then display the full conversation log between two parties, ordered by date. Clicking a message queries the API to retrieve the email body, which I then display.
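The full/partial sync loop described above can be sketched roughly as follows. This is written in Python rather than PHP so it runs without credentials. The two API calls are hidden behind hypothetical callables (`full_sync` for the thread listing, `fetch_history` for the history call), and `HistoryExpiredError` is a hypothetical stand-in for the HTTP 404 that, as far as I understand Gmail's sync guidance, the history call returns when the start history ID is too old. In that case the sketch falls back to a full sync instead of silently missing messages.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set, Tuple


class HistoryExpiredError(Exception):
    """Hypothetical: raised by fetch_history when the API returns HTTP 404
    because the stored start history ID can no longer be used."""


@dataclass
class SyncState:
    history_id: Optional[str] = None                    # cursor saved by the last sync
    message_ids: Set[str] = field(default_factory=set)  # IDs already stored locally


def sync_mailbox(
    state: SyncState,
    full_sync: Callable[[], Tuple[Set[str], str]],
    fetch_history: Callable[[str], Tuple[Set[str], str]],
) -> str:
    """Run a partial sync when a cursor exists; otherwise, or when the cursor
    has expired, fall back to a full sync. Returns "partial" or "full"."""
    if state.history_id is not None:
        try:
            added_ids, latest_id = fetch_history(state.history_id)
        except HistoryExpiredError:
            pass  # history no longer available: rebuild from a full sync
        else:
            state.message_ids |= added_ids
            state.history_id = latest_id
            return "partial"
    all_ids, latest_id = full_sync()
    state.message_ids |= all_ids
    state.history_id = latest_id
    return "full"
```

In a real implementation, `full_sync` would page through the thread listing and `fetch_history` through the history call, mapping a 404 response to `HistoryExpiredError`.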

The issue is that, looking at the MIME content of the messages I have saved, the MIME body contains messages that I do not have in my database. I also tried querying the API with a search query, and some messages are still missing from the results.

A previous developer used Mimecast to get the messages, and querying that database does in fact return the messages that I'm missing.

Why is the Gmail API not giving me all messages between sender and receiver? The MIME body clearly shows messages that are not available when querying the API, and I don't understand why, or how to find the missing messages.

Any assistance would be appreciated.


Answer


So, in case this issue comes up for anyone else: I believe it has to do with expired history items. I stand to be corrected, as this can only be confirmed once my implementation has been running for more than two weeks.

If you're considering running a mailbox sync, there's a good chance you'll be missing messages, especially messages sent from a client other than Inbox or Gmail. History items are kept for about two weeks on average, so although a mailbox sync covers everything since the account was activated, expired history items will not be available.

In theory, this means you should still capture the full email conversation as long as partial syncs run while the history is available: you get all the MIME headers you need as and when the communication takes place, provided that, like me, you have push notifications enabled through the Google Cloud console to alert your systems to run a partial sync on any given account.
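With push notifications, Gmail publishes to a Cloud Pub/Sub topic (registered through `users.watch`), and a push subscription POSTs a JSON envelope whose `message.data` field is base64-encoded JSON carrying `emailAddress` and `historyId`. A minimal Python sketch of the handler step that decodes that envelope follows. The function name `parse_gmail_push` is hypothetical, and web framework wiring is omitted. As far as I understand it, the `historyId` in the notification is the mailbox's new position, so the partial sync should still start from the history ID you stored after the previous sync.

```python
import base64
import json
from typing import Tuple


def parse_gmail_push(body: bytes) -> Tuple[str, str]:
    """Extract (emailAddress, historyId) from a Pub/Sub push request body."""
    envelope = json.loads(body)
    # Pub/Sub delivers the published payload base64-encoded in message.data.
    payload = base64.b64decode(envelope["message"]["data"])
    notification = json.loads(payload)
    # Normalise historyId to a string so it compares cleanly with stored IDs.
    return notification["emailAddress"], str(notification["historyId"])
```

The returned address identifies which account to sync; the stored cursor for that account, not the notified ID, is what the history call should start from.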

If your partial sync runs manually or on an interval (for example, through cron) rather than through push notifications, make sure the interval is short enough that every sync runs while all history items are still available.
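For the interval-based case, a rough check like the following can flag schedules or accounts at risk of outliving the history window. It is a sketch under stated assumptions: `ASSUMED_RETENTION` uses the two-week estimate from this answer and `SAFETY_MARGIN` trusts only half of it. Neither value is a documented Gmail limit, so tune both conservatively. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumption: roughly how long history records stay usable (the two-week
# estimate from this answer), not a documented Gmail guarantee.
ASSUMED_RETENTION = timedelta(days=14)
# Assumption: only trust half of that window before forcing a sync.
SAFETY_MARGIN = 0.5


def interval_is_safe(interval: timedelta,
                     retention: timedelta = ASSUMED_RETENTION,
                     margin: float = SAFETY_MARGIN) -> bool:
    """True if a fixed cron interval stays inside the trusted part of the window."""
    return interval <= retention * margin


def accounts_needing_sync(last_synced: Dict[str, datetime],
                          now: datetime,
                          retention: timedelta = ASSUMED_RETENTION,
                          margin: float = SAFETY_MARGIN) -> List[str]:
    """Account IDs whose last successful sync is at or past the trusted
    window, oldest first, so a scheduler can sync them before the rest."""
    limit = retention * margin
    overdue = [(synced_at, account) for account, synced_at in last_synced.items()
               if now - synced_at >= limit]
    return [account for _, account in sorted(overdue)]
```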

The downside, though, is that even if you have a trace of every message by its ID, looking up an expired message to retrieve its body will fail with a 404 status code, so you will not be able to retrieve the body contents of some messages.
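One way to live with those 404s is to prefer a locally stored body and treat a failed lookup as "body unavailable" rather than an error. The following is a minimal Python sketch. `get_body` is a hypothetical wrapper around the message lookup that raises the hypothetical `MessageNotFound` on a 404, and `local_bodies` stands in for whatever table you store bodies in.

```python
from typing import Callable, Dict, Optional


class MessageNotFound(Exception):
    """Hypothetical: raised by get_body when the API answers 404."""


def load_body(message_id: str,
              local_bodies: Dict[str, str],
              get_body: Callable[[str], str]) -> Optional[str]:
    """Prefer a locally stored body; otherwise ask the API and cache the result.

    Returns None when the message can no longer be fetched (404), so the
    conversation view can still show the stored headers without a body.
    """
    if message_id in local_bodies:
        return local_bodies[message_id]
    try:
        body = get_body(message_id)
    except MessageNotFound:
        return None
    local_bodies[message_id] = body  # cache so later views skip the API call
    return body
```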

Thus, if your process relies heavily on what's in the body of the email, you should also store the body content locally during a partial sync. I really only need the MIME headers. I do look up message contents when needed, but it won't cause major problems for me if I'm unable to retrieve the body of any given message.
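If you do store bodies during the partial sync, a sketch like the following could flatten a message fetched with `messages.get` and `format="full"` into a row holding the four headers from the question plus a plain-text body. It assumes the Gmail message resource shape (`payload.headers` as name/value pairs, base64url `body.data`, nested `parts`). `to_record` and the row layout are hypothetical. HTML-only messages yield `None` for the body, and attachments fetched separately by `attachmentId` are not handled.

```python
import base64
from typing import Dict, Optional

WANTED_HEADERS = ("from", "to", "date", "subject")


def _decode(data: str) -> str:
    # Gmail body data is base64url; pad defensively before decoding.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def _find_plain_text(part: dict) -> Optional[str]:
    """Depth-first search for the first text/plain part that carries data."""
    if part.get("mimeType") == "text/plain" and part.get("body", {}).get("data"):
        return _decode(part["body"]["data"])
    for child in part.get("parts", []):
        text = _find_plain_text(child)
        if text is not None:
            return text
    return None


def to_record(message: dict) -> Dict[str, Optional[str]]:
    """Flatten a full-format message into a storable row."""
    payload = message.get("payload", {})
    # Header names are matched case-insensitively.
    headers = {h["name"].lower(): h["value"] for h in payload.get("headers", [])}
    record: Dict[str, Optional[str]] = {name: headers.get(name) for name in WANTED_HEADERS}
    record["id"] = message["id"]
    record["thread_id"] = message.get("threadId")
    record["body"] = _find_plain_text(payload)
    return record
```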

I should be able to confirm this theory within a month, so if you think it's incorrect, please feel free to make me the wiser. :)