I'm using feedparser to fetch RSS feed data. For most RSS feeds that works perfectly fine. However, I know stumbled upon a website where fetching RSS feeds fails (example feed). The return result does not contain the expected keys and the values are some HTML codes.
I tries simply reading the feed URL with urllib2.Request(url)
. This fails with a HTTP Error 405: Not Allowed
error. If I add a custom header like
headers = {
'Content-type' : 'text/xml',
'User-Agent': 'Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/31.0',
}
request = urllib2.Request(url)
I don't get the 405 error anymore, but the returned content is a HTML document with some HEAD tags and an essentially empty BODY. In the browser everything looks fine, same when I look at "View Page Source". feedparser.parse
also allows to set agent
and request_headers
, I tried various agents. I'm still not able to correctly read the XML let alone the parsed feed from feedparse
.
What am I missing here?