
I'm trying to follow the redirects of some Wikipedia pages, and something curious is happening.

If I run:

>>> import requests
>>> request = requests.get("https://en.wikipedia.org/wiki/barcelona", allow_redirects=True)
>>> request.url
u'https://en.wikipedia.org/wiki/Barcelona'
>>> request.history
[<Response [301]>]

As you can see, the redirect is followed correctly, and I get the same URL in Python as in the browser.

But if I try:

>>> request = requests.get("https://en.wikipedia.org/wiki/Yardymli_Rayon", allow_redirects=True)
>>> request.url
u'https://en.wikipedia.org/wiki/Yardymli_Rayon'
>>> request.history
[]

And in the browser I see that the URL has changed to: https://en.wikipedia.org/wiki/Yardymli_District

Does anyone know how to solve this?

If you intend to access a lot of Wikipedia data via script please consider using the API that Wander Nauta linked to. Yes, there's a bit of a learning curve, but your scripts will end up being more efficient & easier to read & maintain. But more importantly, it will put less strain on Wikipedia's servers. - PM 2Ring
FWIW, here's an API URL that returns some info about a page, including redirect status, in JSON format: 'https://en.wikipedia.org/w/api.php?action=query&format=json&redirects&titles=Yardymli Rayon' - PM 2Ring
Indeed, if you pass a url in the way suggested by PM 2Ring, you will have the page automatically redirected and will get the correct response. - maurobio
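To make the API suggestion in these comments concrete, here is a minimal Python 3 sketch. It assumes the JSON returned by `action=query` with `redirects` lists title changes under `query.normalized` and `query.redirects` as `from`/`to` pairs. The function names and the `User-Agent` string are placeholders invented for this example, not part of any library.

```python
def final_title(payload, title):
    """Follow the title through the API's normalization and redirect steps.

    `payload` is the parsed JSON of an action=query response requested
    with the `redirects` parameter.
    """
    query = payload.get("query", {})
    # Normalization (e.g. underscores to spaces) is applied before redirects.
    for step in ("normalized", "redirects"):
        mapping = {item["from"]: item["to"] for item in query.get(step, [])}
        title = mapping.get(title, title)
    return title


def resolve_title(title):
    """Ask the English Wikipedia API where `title` ends up (network call)."""
    import requests  # third-party; imported here so final_title() has no dependency

    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "format": "json", "redirects": 1, "titles": title},
        headers={"User-Agent": "redirect-check-example/0.1"},  # placeholder value
        timeout=10,
    )
    response.raise_for_status()
    return final_title(response.json(), title)
```

Calling `resolve_title("Yardymli Rayon")` should then return the title of the redirect target, and a title that is not a redirect comes back unchanged.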

1 Answer


Requests doesn't show the redirect because you're not actually being redirected in the HTTP sense: Wikipedia serves the page directly at the URL you asked for, without a 3xx response. It then uses some JavaScript trickery (probably the HTML5 History API, e.g. pushState or replaceState) to change the URL shown in the address bar, and Requests doesn't run JavaScript, of course.

In other words, both requests and your browser are correct: requests is showing the URL you actually requested (and Wikipedia actually served), while your browser's address bar is showing the 'proper', canonical URL.

You could parse the response and look for the <link rel="canonical"> tag if you want to find out the 'proper' URL from your script, or fetch articles over Wikipedia's API instead.
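As a rough illustration of the first option, the Python 3 sketch below pulls the `href` out of the first `<link rel="canonical">` tag using only the standard library's `html.parser`. The class and function names are made up for this example, and it assumes the page really does include such a tag.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> encountered."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def canonical_url(html):
    """Return the canonical URL declared in `html`, or None if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical
```

With Requests this would be used as `canonical_url(requests.get(url).text)`; comparing the result with `response.url` tells you whether the page you fetched is a redirect to another article.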