Extracting data from paginated nested links

Question

I have a paginated list of IMDb titles, about 17 pages long.

The list contains links of the form http://www.imdb.com/title/tt0111161/?ref_=adv_li_tt

Where tt0111161 is the title ID.

I'd like to go through the whole list, and for each title, go to the URL http://www.imdb.com/title/tt0111161/ratings
and extract HTML info from that page. How can I do that with Scrapy, BeautifulSoup, or any other method?

What do you want to extract from 'imdb.com/title/tt0111161/ratings'? — Piyush S. Wanare
What have you tried so far? Do you have any code to share with issues you are seeing while running it? — paul trmbrth
@paultrmbrth I don't know where to start yet, so I still haven't written code for it. — Mohamed Oun

Piyush S. Wanare · Accepted Answer · 2017-02-08T13:26:42

I tried it this way:

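from urllib.request import urlopen

from bs4 import BeautifulSoup

# Fetch the ratings page for a single title (Python 3)
html = urlopen('http://www.imdb.com/title/tt0111161/ratings').read()
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')
print(soup)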
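
The snippet above fetches one title only. To cover the whole list, one possible approach is to pull the title IDs out of each list page and build the /ratings URL for each of them. The following is a rough sketch, not a tested scraper: it uses only the standard library so it runs without extra installs (the same link collection could be done with BeautifulSoup's find_all), and the names LinkCollector, extract_title_ids, ratings_url, fetch and crawl are made up for illustration. The question does not give the URLs of the 17 list pages, so crawl() assumes the caller passes them in.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen

# Matches the title ID in links such as /title/tt0111161/?ref_=adv_li_tt
TITLE_ID = re.compile(r"/title/(tt\d+)/")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_title_ids(html):
    """Return the unique title IDs linked from one list page, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    ids = []
    for href in collector.hrefs:
        match = TITLE_ID.search(href)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


def ratings_url(title_id):
    """Build the ratings page URL for a title ID."""
    return "http://www.imdb.com/title/%s/ratings" % title_id


def fetch(url):
    """Download a page and return it as text."""
    return urlopen(url).read().decode("utf-8", errors="replace")


def crawl(list_page_urls):
    """Yield (title_id, ratings_html) for every title on the given list pages.

    list_page_urls is supplied by the caller (assumption: one URL per list page).
    """
    for page_url in list_page_urls:
        for title_id in extract_title_ids(fetch(page_url)):
            yield title_id, fetch(ratings_url(title_id))
```

Each ratings_html returned by crawl() can then be handed to BeautifulSoup as in the snippet above to pick out whatever fields are needed.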

NOTE - IMDb does not allow you to scrape their website.
