I'm using BeautifulSoup to scrape an IMDb search page (https://www.imdb.com/search/title/?release_date=2017&sort=num_votes,desc&page=1). I've successfully scraped the name, year, intro, votes, director, etc., but I'm having difficulty scraping "gross" and "actors". The relevant HTML for one movie looks like this:
<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="591671">591,671</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span name="nv" data-value="226,277,068">$226.28M</span>
</p>
<p class="">
Director:
<a href="/name/nm0003506/?ref_=adv_li_dr_0">James Mangold</a>
<span class="ghost">|</span>
Stars:
<a href="/name/nm0413168/?ref_=adv_li_st_0">Hugh Jackman</a>,
<a href="/name/nm0001772/?ref_=adv_li_st_1">Patrick Stewart</a>,
<a href="/name/nm6748436/?ref_=adv_li_st_2">Dafne Keen</a>,
<a href="/name/nm2933542/?ref_=adv_li_st_3">Boyd Holbrook</a>
</p>
Below is the code I used:
import requests
from bs4 import BeautifulSoup
directors=[]
actors=[]
votes=[]
grosses=[]
res_movie = requests.get('http://www.imdb.com/search/title/?release_date='+'2018'+'&sort=num_votes,desc&page='+'1')
bs_movie = BeautifulSoup(res_movie.text,'html.parser')
movies=bs_movie.find_all('div', class_='lister-item mode-advanced')
for movie in movies:
    director=movie.find('p',class_='').find_all('a')[0].text
    directors.append(director)
    actors.append(movie.find('p',class_='').find_all('a')[1:].text)
    vote=movie.find_all('span', attrs = {'name':'nv'})[0].text
    votes.append(vote)
    gross=movie.find_all('span', attrs = {'name':'nv'})[1].text
    grosses.append(gross)
The error I'm getting from actors:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-70-a969b9a65fa7> in <module>
60 directors.append(director)
61
---> 62 actors.append(movie.find('p',class_='').find_all('a')[:1].text)
63
64
AttributeError: 'list' object has no attribute 'text'
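As far as I can tell, the message points at the slice: slicing the result of find_all gives back a list of tags, and a list has no .text, so the text has to be read from each tag separately. Here is a minimal sketch of that idea, using an inline copy of the HTML above instead of the live page. The names html, links, director and stars are only illustrative, and it assumes the first link is always the single director, which may not hold for every movie.

```python
from bs4 import BeautifulSoup

# Inline copy of the director/stars paragraph from the question (illustrative input).
html = """
<p class="">
Director:
<a href="/name/nm0003506/?ref_=adv_li_dr_0">James Mangold</a>
<span class="ghost">|</span>
Stars:
<a href="/name/nm0413168/?ref_=adv_li_st_0">Hugh Jackman</a>,
<a href="/name/nm0001772/?ref_=adv_li_st_1">Patrick Stewart</a>,
<a href="/name/nm6748436/?ref_=adv_li_st_2">Dafne Keen</a>,
<a href="/name/nm2933542/?ref_=adv_li_st_3">Boyd Holbrook</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like collection of tags.
links = soup.find("p").find_all("a")

# Indexing yields a single tag, which does have .text.
director = links[0].text

# Slicing yields a list, so read .text from each element instead.
# Assumption: everything after the first link is an actor.
stars = [a.text for a in links[1:]]

print(director)
print(stars)
```

In the loop, this would mean appending a list of names per movie rather than calling .text on the slice itself.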
The error I'm getting from gross:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-69-bd813766e1ca> in <module>
74 votes.append(vote)
75
---> 76 gross=movie.find_all('span', attrs = {'name':'nv'})[1].text
77 grosses.append(gross)
78 # print(directors)
IndexError: list index out of range
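One possible reading of this error is that some movies on the page have no "Gross" entry, so only one span with name="nv" exists and index 1 is out of range. I have not confirmed that against the live page. The sketch below checks the length before indexing. The helper extract_votes_and_gross and both HTML fragments are hypothetical, and the second fragment is made up to imitate a movie without a gross figure.

```python
from bs4 import BeautifulSoup


def extract_votes_and_gross(fragment):
    """Return (votes, gross); a value is None when its 'nv' span is absent."""
    soup = BeautifulSoup(fragment, "html.parser")
    nv = soup.find_all("span", attrs={"name": "nv"})
    votes = nv[0].text if len(nv) > 0 else None
    gross = nv[1].text if len(nv) > 1 else None
    return votes, gross


# Fragment copied from the question: votes and gross are both present.
with_gross = """
<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="591671">591,671</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span name="nv" data-value="226,277,068">$226.28M</span>
</p>
"""

# Made-up fragment: same layout, but without the gross span.
without_gross = """
<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="591671">591,671</span>
</p>
"""

print(extract_votes_and_gross(with_gross))
print(extract_votes_and_gross(without_gross))
```

Appending None for a missing gross would keep the grosses list the same length as the other lists, which matters if they are combined into a table later.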
I was hoping to use the list's index to get the element I desired. I would love to learn the proper method to obtain the element. Thanks so much in advance!!