We have a client-server architecture that uses Google Drive for sharing files between the client and the server, without having to actually send them.
The client uses the Google Drive API to get a list of file IDs of all files it wants to share with the server.
The server then downloads the files with the appropriate authorization token.
Server response time is crucial for user experience.

We tried a few approaches:
First, we used the webContentLink. This worked until we started receiving large files from the client. Instead of the file content, we got an HTML page with the warning "exceeds the maximum size that Google can scan". We could not find a header that skips this check.
Second, we switched to the Google API resource URL with the alt=media query parameter. This works, but we then hit API quota errors (User Rate Limit Exceeded). Since this is server code, all requests were attributed to a single user. We then added the quotaUser parameter to indicate which user each request is made on behalf of, but we still got many 403 responses.
In addition, we implemented exponential backoff for the failed requests.
We also added a cache for the successful requests.
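
To make the second approach concrete, here is a minimal Python sketch of the alt=media request URL with quotaUser, plus retry with exponential backoff. It is not our actual server code: the Drive v3 endpoint, the helper names (`media_url`, `backoff_delays`, `fetch_with_backoff`), the set of retried status codes, and the injected `fetch` callable are all assumptions made for illustration.

```python
import random
import time
from urllib.parse import urlencode

# Assumption: Drive API v3 endpoint; the post does not say which version is used.
DRIVE_FILES_URL = "https://www.googleapis.com/drive/v3/files"


def media_url(file_id, quota_user):
    """Build the download URL with alt=media and a per-user quotaUser value."""
    query = urlencode({"alt": "media", "quotaUser": quota_user})
    return f"{DRIVE_FILES_URL}/{file_id}?{query}"


def backoff_delays(max_retries=5, base=1.0, cap=32.0):
    """Yield delays of 1s, 2s, 4s, ... (capped), each with up to 1s of random jitter."""
    for attempt in range(max_retries):
        yield min(cap, base * 2 ** attempt) + random.uniform(0, 1)


def fetch_with_backoff(fetch, url, max_retries=5, sleep=time.sleep):
    """Call fetch(url) -> (status, body), retrying on rate-limit style statuses.

    `fetch` is a hypothetical callable that performs the authorized HTTP request.
    Treating every 403 as retryable is a simplification: a real client should
    inspect the error reason, since a 403 can also mean a permission problem.
    """
    retryable = {403, 429, 500, 502, 503}
    for delay in backoff_delays(max_retries):
        status, body = fetch(url)
        if status not in retryable:
            return status, body
        sleep(delay)
    # Final attempt after the last delay; its result is returned as-is.
    return fetch(url)
```

Passing `fetch` and `sleep` in as parameters keeps the retry logic independent of any particular HTTP library and makes it easy to test without real network calls or real waiting.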

Our current solution is a combination of the two: we use the webContentLink whenever possible (which appears not to count against the Google API quota). If the response is not as expected (e.g. an HTML page or the wrong size), we fall back to the Google API resource URL (with exponential backoff).
(Most of the files are small enough to not exceed the scan size limit)
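
The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our production code: `fetch_web` and `fetch_api` are hypothetical callables that return `(headers, body)`, headers are assumed to be a plain dict with a `Content-Type` key, and `file_meta` is assumed to carry the Drive file fields `id`, `webContentLink`, `size`, and `mimeType`.

```python
def is_expected_content(headers, body, file_meta):
    """Return True if the response looks like the file, not a warning page."""
    content_type = headers.get("Content-Type", "").lower()
    # If the file itself is HTML, an HTML response is not suspicious.
    file_is_html = file_meta.get("mimeType", "").lower().startswith("text/html")
    if content_type.startswith("text/html") and not file_is_html:
        return False
    # Drive reports size as a string of bytes; skip the check if it is missing.
    size = file_meta.get("size")
    return size is None or len(body) == int(size)


def download(file_meta, fetch_web, fetch_api):
    """Try webContentLink first; fall back to the API URL on an unexpected response.

    Returns (body, route) where route is "webContentLink" or "api".
    """
    headers, body = fetch_web(file_meta["webContentLink"])
    if is_expected_content(headers, body, file_meta):
        return body, "webContentLink"
    # Hypothetical: fetch_api would wrap the alt=media request and its retries.
    _, body = fetch_api(file_meta["id"])
    return body, "api"
```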

Both the client and the server use the same OAuth 2.0 client ID.

Here are my questions:
1. Is it possible to skip the virus scan, so that all files can be downloaded using the webContentLink?
2. Is the size threshold for the virus scan documented? Assuming we know the file size we can then save the round-trip of the first request (using the webContentLink)
3. Is there anything else we can do other than applying for a higher quota?

1 Answer

  1. Is it possible to skip the virus scan, so that all files can be downloaded using the webContentLink?

If the file is larger than 25 MB, this is not possible with webContentLink. Since you are already making authorized requests, use files.get with alt=media and apply appropriate error handling (which you have done with exponential backoff). The next step is to check whether your code is optimized. If you have applied the recommended optimizations and still receive 403 limit-exceeded errors, it is time to apply for a higher quota.

  2. Is the size threshold for the virus scan documented? Assuming we know the file size we can then save the round-trip of the first request (using the webContentLink)

To answer this, refer to the Google Drive Help Forum thread "How can I successfully download large files from google drive without network errors at the most end of the download":

Only files smaller than 25 MB can be scanned for viruses.
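
If that 25 MB figure holds, the first round trip can be skipped for large files by choosing the download route from the known file size. Here is a minimal Python sketch; the function name `choose_route` is hypothetical, and interpreting "25 MB" as 25 * 1024 * 1024 bytes is an assumption, since the quoted statement does not define the unit precisely.

```python
# Assumption: "25 MB" is read as 25 MiB; the exact byte threshold is not documented here.
SCAN_LIMIT_BYTES = 25 * 1024 * 1024


def choose_route(size_bytes, limit=SCAN_LIMIT_BYTES):
    """Pick "webContentLink" for files under the scan limit, otherwise "api".

    size_bytes may be an int, a numeric string (Drive reports size as a string),
    or None when the size is unknown, in which case webContentLink is tried first.
    """
    if size_bytes is None:
        return "webContentLink"
    return "webContentLink" if int(size_bytes) < limit else "api"
```

Because the threshold is not guaranteed, it is safer to keep the response check on the webContentLink path rather than relying on the size alone.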

  3. Is there anything else we can do other than applying for a higher quota?

You can do the following before applying for a higher quota:

Once all optimizations are done, the only remaining option is to apply for a higher quota limit.

Hope this helps!