1
votes

There is a webpage that my browser can access, but both urllib2.urlopen() (Python) and wget return HTTP 403 (Forbidden). Is there a way to figure out what happened?

I am using the most primitive form, like urllib2.urlopen("http://test.com/test.php"), with the same URL (http://test.com/test.php) in both the browser and wget. I cleared all cookies in the browser before the test.

Thanks a lot!
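To see what the server actually said when it refused the request, you can catch the HTTPError that urlopen raises. A minimal sketch (shown with Python 3's urllib.request; the same idea applies to urllib2 in Python 2, and the URL is just the placeholder from the question):

```python
from urllib.request import urlopen
from urllib.error import HTTPError

def fetch_verbose(url):
    """Fetch url, printing the server's status and headers on failure."""
    try:
        return urlopen(url).read()
    except HTTPError as e:
        # HTTPError carries the response the server sent back:
        # e.code is the status (e.g. 403), e.headers the response
        # headers, and e.read() the error page body.
        print(e.code, e.reason)
        print(dict(e.headers))
        raise
```

The 403 body and headers often reveal whether the block is a user-agent check, a missing cookie, or something server-side.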

I tried it and it prints success. What system and what version of Python are you running? – user1786283
The site might be hindering screen scraping. See http://test.com/robots.txt. Try changing the User-Agent header. – jfs
@enginefree I don't think the OP meant http://test.com/test.php literally. – Nathan
@J.F.Sebastian What else could they do to hinder screen scraping? I have made the headers exactly the same as what I saw in LiveHTTPHeaders. – CuriousMind
Does it work if you turn off JavaScript, Flash, and images in the browser? – jfs

2 Answers

2
votes

The Python library urllib has a default user-agent string that includes the word "Python", and wget uses "wget/VERSION". If the site you are connecting to checks the user-agent, it will probably reject both. Google, for instance, does so.

It's easy enough to fix: for wget, use the -U parameter, and for urllib, create a URLOpener with an appropriate string.
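As a sketch of the urllib side (written here against Python 3's urllib.request; the same idea applies to urllib2.Request in Python 2, and the Mozilla string is just an illustrative placeholder, not a recommendation):

```python
from urllib.request import Request, urlopen

# Replace the default "Python-urllib/x.y" user-agent with a
# browser-like string before opening the URL.
req = Request(
    "http://test.com/test.php",  # the question's placeholder URL
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"},
)
# body = urlopen(req).read()  # actual network call, left commented out
print(req.get_header("User-agent"))
```

The wget equivalent is the -U flag, e.g. `wget -U "Mozilla/5.0 (X11; Linux x86_64)" http://test.com/test.php`.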

0
votes

Some sites don't allow web scraping. Try using the Python requests library.

This library should work.