I am using urllib2 on Google App Engine (GAE) with Python. The app very often crashes with the following error:

Deadline exceeded while waiting for HTTP response from URL: ....

The source looks like this:

import logging
import urllib2

import webapp2
from bs4 import BeautifulSoup

def functionRunning2To5Seconds_1():
    # Check whether the URL can be fetched and parsed
    htmlSource = None
    try:
        url = "http://...someUrl..."
        req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        page = urllib2.urlopen(req)
        htmlSource = BeautifulSoup(page)
    except Exception as e:
        logging.info("Error : {er}".format(er=str(e)))

    # do some calculation with the data of htmlSource, which takes 2 to 5 seconds

# and the handler looks like:
class xyHandler(webapp2.RequestHandler):
    def post(self, uurl=None):
        r_data1 = functionRunning2To5Seconds_1()
        r_data2 = functionRunning2To5Seconds_2()
        r_data3 = functionRunning2To5Seconds_3()
        # ...
        # show the results in a web page

I found this documentation, which states:

You can use the Python standard libraries urllib, urllib2 or httplib to make HTTP requests. When running in App Engine, these libraries perform HTTP requests using App Engine's URL fetch service

and this:

You can set a deadline for a request, the most amount of time the service will wait for a response. By default, the deadline for a fetch is 5 seconds. The maximum deadline is 60 seconds for HTTP requests and 60 seconds for task queue and cron job requests.

So how do I do this? How do I set a timeout on urllib2?

Or do I have to rewrite the whole application to use App Engine's URL Fetch service?

(PS: Does anybody know a secure way to run the "r_data1 = functionRunning2To5Seconds_...()" calls in parallel?)


2 Answers

5
votes

https://docs.python.org/2/library/urllib2.html

urllib2.urlopen(url[, data][, timeout])

The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used).
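A minimal sketch of the per-call timeout applied to the question's code. The helper name `fetch_page` and the 30-second value are illustrative assumptions, not part of urllib2; the import fallback exists only so the snippet also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2 as urlrequest  # Python 2 (the App Engine python27 runtime)
except ImportError:
    import urllib.request as urlrequest  # Python 3 home of the same API


def fetch_page(url, timeout=30):
    """Hypothetical helper: fetch `url` and return the raw response body.

    `timeout` is in seconds and applies to blocking operations such as the
    connection attempt. If it is omitted, urlopen falls back to the global
    default timeout (the one set by socket.setdefaulttimeout).
    """
    req = urlrequest.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    page = urlrequest.urlopen(req, timeout=timeout)
    try:
        return page.read()
    finally:
        page.close()
```

In the question's function this boils down to `page = urllib2.urlopen(req, timeout=30)`, and the response can be handed to BeautifulSoup as before. `socket.setdefaulttimeout(seconds)` changes the global default instead, but it affects every socket in the process, and whether App Engine's URL Fetch wrapper honours it the same way is an assumption worth testing.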

2
votes

As suggested by Paul, you can pass the timeout parameter. On App Engine it is tied to URL Fetch: the value is used as the fetch deadline, up to a maximum of 60 seconds. Keep in mind that if urlopen takes longer than the timeout you specified, you'll get google.appengine.api.urlfetch_errors.DeadlineExceededError instead of the usual socket.timeout. It's good practice to catch this error and retry or log as necessary. See [1] for more information on dealing with DeadlineExceededError.
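A sketch of the catch-and-retry advice, assuming `fetch` is any callable that takes a URL (for example a wrapper around `urllib2.urlopen(url, timeout=...)`). The helper names `is_timeout` and `fetch_with_retry` and the default of 3 attempts are hypothetical. The App Engine import is guarded so the code also runs outside the SDK, where only socket.timeout applies. A URLError whose `reason` is a socket.timeout is also treated as a timeout, because urllib2 wraps connection-phase timeouts that way.

```python
import logging
import socket

try:
    # Only importable inside the App Engine Python SDK.
    from google.appengine.api.urlfetch_errors import DeadlineExceededError
    TIMEOUT_ERRORS = (socket.timeout, DeadlineExceededError)
except ImportError:
    TIMEOUT_ERRORS = (socket.timeout,)


def is_timeout(exc):
    """Return True if `exc` looks like a fetch timeout/deadline."""
    if isinstance(exc, TIMEOUT_ERRORS):
        return True
    # urllib2 wraps timeouts during connect in URLError(reason=socket.timeout)
    return isinstance(getattr(exc, "reason", None), socket.timeout)


def fetch_with_retry(fetch, url, attempts=3):
    """Hypothetical helper: call fetch(url), retrying only on timeouts.

    Non-timeout exceptions propagate immediately; the last timeout is
    re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            if not is_timeout(exc) or attempt == attempts:
                raise
            logging.warning("Attempt %d/%d for %s timed out: %s",
                            attempt, attempts, url, exc)
```

Retries run inside the same handler, so `attempts` multiplied by the per-call timeout should stay well below the time the request itself is allowed to run.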

[1] - https://developers.google.com/appengine/articles/deadlineexceedederrors