
I have a paginated list of IMDb titles that spans about 17 pages.

Each link in the list has a URL of the form http://www.imdb.com/title/tt0111161/?ref_=adv_li_tt

where tt0111161 is the title ID.

I'd like to go through the whole list, and for each title, go to the URL http://www.imdb.com/title/tt0111161/ratings
and extract HTML info from that page. How can I do that with Scrapy, BeautifulSoup, or any other method?

What do you want to extract from 'imdb.com/title/tt0111161/ratings'? - Piyush S. Wanare
@PiyushS.Wanare The votes distribution. - Mohamed Oun
What have you tried so far? Do you have any code to share, with the issues you are seeing while running it? - paul trmbrth
@paultrmbrth I don't know where to start yet, so I haven't written any code for it. - Mohamed Oun

1 Answer


I tried it this way:

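from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the ratings page for one title (Python 3; urllib.urlopen exists only in Python 2)
r = urlopen('http://www.imdb.com/title/tt0111161/ratings').read()
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(r, 'html.parser')
print(soup)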
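
To cover the whole list rather than a single title, one possible approach is to pull the title IDs out of each list page and build the ratings URLs from them. The following is only a rough sketch: the function name ratings_urls and the sample HTML fragment are made up for illustration, and it assumes the list pages link to titles with /title/tt.../ URLs as described in the question.

```python
import re

# Matches IMDb title links such as /title/tt0111161/?ref_=adv_li_tt
TITLE_ID_RE = re.compile(r'/title/(tt\d+)/')


def ratings_urls(list_page_html):
    """Return the ratings URL for each distinct title ID, in first-seen order."""
    seen = []
    for title_id in TITLE_ID_RE.findall(list_page_html):
        # A title is usually linked more than once per row (poster, name), so skip repeats
        if title_id not in seen:
            seen.append(title_id)
    return ['http://www.imdb.com/title/%s/ratings' % t for t in seen]


# Hypothetical fragment of a list page, for illustration only
sample = '''
<a href="/title/tt0111161/?ref_=adv_li_i">poster</a>
<a href="/title/tt0111161/?ref_=adv_li_tt">The Shawshank Redemption</a>
<a href="/title/tt0068646/?ref_=adv_li_tt">The Godfather</a>
'''

for url in ratings_urls(sample):
    print(url)
```

Each URL returned this way could then be fetched and parsed like the single page above, and the same step repeated for every page of the list.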

NOTE: IMDb does not allow you to scrape their website.