I have implemented a Google App Engine application that uploads documents to specific folders in Google Docs. A month ago I started having response-time issues (deadline exceeded on GdataClient.GetDocList, in the URL Fetch call inside the GData client) when querying for a specific folder in Google Docs. This caused a lot of tasks to queue up in the Task Queue.
When I saw this, I paused the queues for about 24 hours. When I restarted the queues, nearly all of the files were uploaded again, except for 10 of the files/tasks.
When I implemented the GetDocList call, I added retry/sleep functionality to work around the intermittent "DeadlineExceeded" errors I got during my .GetNextLink().href loop. I know that this is not good "cloud" design, but I was forced to do it to make the app stable enough for production. For every retry I extend the wait time, and I only retry 5 times. The last time I wait for about 25 seconds before retrying.
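To make the retry/sleep behaviour concrete, here is a minimal sketch of what such a helper could look like. It is not my actual RetryGetDocList code: the function name, the parameters and the quadratic wait schedule are assumptions, chosen only so that the waits match the ones in the log (1, 1, 4, 9, 16, 25 seconds). The default of six retries is taken from the six "Retrying" lines in the log.

```python
import time


def retry_get_doc_list(function_call, uri, max_retries=6, sleep=time.sleep):
    """Call function_call(uri), waiting longer after each failed attempt.

    Hypothetical stand-in for the RetryGetDocList helper; names and the
    wait schedule are assumptions, not the production code.
    """
    for attempt in range(max_retries + 1):
        try:
            return function_call(uri)
        except Exception:
            # A real implementation would catch only the deadline error.
            if attempt == max_retries:
                raise  # out of retries: let the task fail
            # Quadratic wait, with a floor of 1 second: 1, 1, 4, 9, 16, 25.
            wait = max(1, attempt * attempt)
            sleep(wait)
```

With this schedule a task that never succeeds sleeps for 56 seconds in total before failing, on top of the time spent in each fetch that runs into its deadline.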
My theory is that the tasks in the queues retried so many times (even though I have limited the tasks to run serially, one at a time, at a maximum of 5 per minute) that the App Engine app was blacklisted from the Google Docs API.
Can this happen?
What do I need to do to be able to query the Google Docs API from the same App Engine instance again?
Do I need to migrate the App Engine app to a new Application ID?
When I try this from my development environment, the code works: it queries the folder structure and returns a result within the time limit.
The folder structure I'm querying is rather big, which means that I need to fetch it page by page via .GetNextLink().href. In my development environment, the folder structure contains far fewer folders.
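For reference, the paging loop has roughly the following shape. This is a simplified sketch rather than my real GetAllEntries function: fetch_page is a hypothetical callable that takes a feed URI and returns the entries of that page together with the URI of the next page, or None when there is no next link. In the real code that role is played by client.GetDocList and chunk.GetNextLink().href.

```python
def get_all_entries(first_uri, fetch_page):
    """Collect the entries from every page of a paged feed.

    fetch_page(uri) is a hypothetical callable returning
    (entries, next_uri); next_uri is None on the last page.
    """
    entries = []
    uri = first_uri
    while uri is not None:
        # One HTTP round trip per page; any of these can hit the deadline.
        page_entries, uri = fetch_page(uri)
        entries.extend(page_entries)
    return entries
```

The number of round trips grows as max-results shrinks, so a smaller page size makes each fetch lighter but gives the loop more chances to fail before the whole structure has been read.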
Anyway, this has been working very well for about a year in the production App Engine instance, but it stopped working around the 4th or 5th of March.
The user-account which is queried is currently using 7000 MB (3%) of the available 205824 MB.
When I use the code from the dev environment but with a completely different Google Apps domain / app ID / Google account, I cannot reproduce the error.
When I changed max-results to 1 (instead of 100, 50 or 20), I succeeded intermittently. But with max-results set to 1 I need to query many thousands of times, and since I succeed at most 3 times in a row before my exponential back-off gives up, I never get the whole result set. The folder I query consists of between 300 and 400 folders, each of which in turn contains at least 2 to 6 subfolders with PDF files in them.
I have tried max-results 2, and then the fetch fails every time. If I change back to max-results 1, it succeeds on one or two fetches in a row, but this is not sufficient, since I need the whole folder structure to be able to find the correct folder to store the file in.
I have tried this from my local environment, i.e. from a completely different IP address, and it still fails. This means that the App Engine app is not blocked from accessing Google Docs. The max-results change from 2 to 1 also proves that.
Conclusion: the slow response time from the Google Docs API must be due to the large number of files and collections inside the collection I'm looping through. Keep in mind that this collection contains about 3500 MB. Is this an issue?
Log: DocListUrl to get entries from = https://docs.google.com/feeds/default/private/full/folder:XXXXXXX/contents?max-results=1.
Retrying RetryGetDocList, wait for 1 seconds.
Retrying RetryGetDocList, wait for 1 seconds.
Retrying RetryGetDocList, wait for 4 seconds.
Retrying RetryGetDocList, wait for 9 seconds.
Retrying RetryGetDocList, wait for 16 seconds.
Retrying RetryGetDocList, wait for 25 seconds.
ApplicationError: 5
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 703, in __call__
handler.post(*groups)
File "/base/data/home/apps/XXXX/prod-43.358023265943651014/DocsHandler.py", line 418, in post
success = uploader.Upload(blob_reader, fileToUpload.uploadSize, fileToUpload.MainFolder, fileToUpload.ruleTypeReadableId ,fileToUpload.rootFolderId,fileToUpload.salesforceLink,fileToUpload.rootFolder, fileToUpload.type_folder_name, fileToUpload.file_name, currentUser, client, logObj)
File "/base/data/home/apps/XXXX/prod-43.358023265943651014/DocsClasses.py", line 404, in Upload
collections = GetAllEntries('https://docs.google.com/feeds/default/private/full/%s/contents?max-results=1' % (ruleTypeFolderResourceId), client)
File "/base/data/home/apps/XXXX/prod-43.358023265943651014/DocsClasses.py", line 351, in GetAllEntries
chunk = RetryGetDocList(client.GetDocList , chunk.GetNextLink().href)
File "/base/data/home/apps/XXX/prod-43.358023265943651014/DocsClasses.py", line 202, in RetryGetDocList
return functionCall(uri)
File "/base/data/home/apps/XXX/prod-43.358023265943651014/gdata/docs/client.py", line 142, in get_doclist
auth_token=auth_token, **kwargs)
File "/base/data/home/apps/XXXX/prod-43.358023265943651014/gdata/client.py", line 635, in get_feed
**kwargs)
File "/base/data/home/apps/XXXXX/prod-43.358023265943651014/gdata/client.py", line 265, in request
uri=uri, auth_token=auth_token, http_request=http_request, **kwargs)
File "/base/data/home/apps/XXXX/prod-43.358023265943651014/atom/client.py", line 117, in request
return self.http_client.request(http_request)
File "/base/data/home/apps/XXXXX/prod-43.358023265943651014/atom/http_core.py", line 420, in request
http_request.headers, http_request._body_parts)
File "/base/data/home/apps/XXXXX/prod-43.358023265943651014/atom/http_core.py", line 497, in _http_request
return connection.getresponse()
File "/base/python_runtime/python_dist/lib/python2.5/httplib.py", line 206, in getresponse
deadline=self.timeout)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py", line 263, in fetch
return rpc.get_result()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 592, in get_result
return self.__get_result_hook(self)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py", line 371, in _get_fetch_result
raise DeadlineExceededError(str(err))
DeadlineExceededError: ApplicationError: 5
Regards /Jens