5
votes

I am using Google Drive API through pydrive to move files between two google drive accounts. I have been testing with a folder with 16 files. My code always raises an error in the sixth file

"User rate limit exceeded">

I know that there is a limit for the number of request (10/s or 1000/100s), but I have tried the exponential backoff suggested by the Google Drive API to handle this error. Even after 248s it still raises the same error.

Here an example what I am doing

def MoveToFolder(self,files,folder_id,drive):
    total_files = len(files)
    for cont in range(total_files):
        success = False
        n=0
        while not success:
            try:
                drive.auth.service.files().copy(fileId=files[cont]['id'],
                                                body={"parents": [{"kind": "drive#fileLink", "id": folder_id}]}).execute()
                time.sleep(random.randint(0,1000)/1000)
                success = True
            except:
                wait = (2**n) + (random.randint(0,1000)/1000)
                time.sleep(wait)
                success = False
                n += 1

I tried to use "Batching request" to copy the files, but it raises the same errors for 10 files.

def MoveToFolderBatch(self,files,folder_id,drive):
    cont=0
    batch = drive.auth.service.new_batch_http_request()
    for file in files:
        cont+=1
        batch.add(drive.auth.service.files().copy(fileId=file['id'],
                                                 body={"parents": [
                                                     {"kind": "drive#fileLink", "id": folder_id}]}))
    batch.execute()

Does anyone have any tips?

EDIT: According to google support:

Regarding your User rate limit exceeded error, is not at all related to the per-user rate limit set in the console. Instead it is coming from internal Google systems that the Drive API relies on and are most likely to occur when a single account owns all the files within a domain. We don't recommend that a single account owns all the files, and instead have individual users in the domain own the files. For transferring files, you can check this link. Also, please check this link on the recommendations to avoid the error.

2

2 Answers

2
votes

403: User Rate Limit Exceeded is basically flood protection.

{
 "error": {
  "errors": [
   {
    "domain": "usageLimits",
    "reason": "userRateLimitExceeded",
    "message": "User Rate Limit Exceeded"
   }
  ],
  "code": 403,
  "message": "User Rate Limit Exceeded"
 }
}

You need to slow down. Implementing exponential backoff as you have done is the correct course of action.

Google is not perfect at counting the requests so counting them yourself isn't really going to help. Sometimes you can get away with 15 request a second other times you can only get 7.

You should also remember that you are completing with the other people using the server if there is a lot of load on the server one of your request may take longer while an other may not. Don't run on the hour that's when most people have cron jobs set up to extract.

Note: If you go to google developer console under where you have enabled the drive API got to quota tab click the pencil icon next to

Queries per 100 seconds per user

and

Queries per 100 seconds

You can increase them both. One is User based the other is project based. Each user can make X requests in 100 seconds your project can make Y request per 100 seconds.

enter image description here

Note: No idea how high you can set yours. This is my dev account so it may have some beta access I cant remember.

2
votes

See 403 rate limit after only 1 insert per second and 403 rate limit on insert sometimes succeeds

The key points are:-

  • backoff but do not implement exponential backoff!. This will simply kill your application throughput

  • instead you need to proactively throttle your requests to avoid the 304's from occurring. In my testing I've found that the maximum sustainable throughput is about 1 transaction every 1.5 seconds.

  • batching makes the problem worse because the batch is unpacked before the 304 is raised. Ie. a batch of 10 is interpreted as 10 rapid transactions, not 1.

Try this algorithm

delay=0                              // start with no backoff
while (haveFilesInUploadQueue) {
   sleep(delay)                      // backoff 
   upload(file)                      // try the upload
   if (403 Rate Limit) {             // if rejected with a rate limit
      delay += 2s                    // add 2s to delay to guarantee the next attempt will be OK
   } else {
      removeFileFromQueue(file)      // if not rejected, mark file as done
      if (delay > 0) {               // if we are currently backing off
         delay -= 0.2s               // reduce the backoff delay
      }
   }
}
// You can play with the 2s and 0.2s to optimise throughput. The key is to do all you can to avoid the 403's 

One thing to be aware of is that there was (is?) a bug with Drive that sometimes an upload gets rejected with a 403, but, despite sending the 403, Drive goes ahead and creates the file. The symptom will be duplicated files. So to be extra safe, after a 403 you should somehow check if the file is actually there. Easiest way to do this is use pre-allocated ID's, or to add your own opaque ID to a property.