I'm fairly new to web scraping and making progress little by little, but this one is really giving me a hard time.
I'm trying to scrape the ESPN NBA box score page: http://espn.com/nba/boxscore?gameId=401160948
I want to collect the names of the players who did not play (labeled "DNP", followed by a reason), which appear at the end of the two tables, and append them to a list.
Here's my code:
import requests
from bs4 import BeautifulSoup

page = requests.get('https://espn.com/nba/boxscore?gameId=401160948')
soup = BeautifulSoup(page.content, 'html.parser')
main_container = soup.find(id='main-container')

data = []
# Loop over the rows of the first tbody and grab each player's name
for hstat in main_container.find_all('tbody')[0]:
    player_info = {}
    player_info['name'] = hstat.find("td", {"class": "name"}).find('span').get_text()
    data.append(player_info)

print(data)
The code above works for tbody[0] and tbody[2], maybe because every td in those rows has complete information? I'm not really sure. However, it doesn't work for tbody[1] and tbody[4], which contain the players whose td value is DNP. I'm trying to build a table of only the players who did not play, so I also don't need the players in tbody[1] and tbody[4] who did play. I don't know yet how to exclude them, since I'm already stuck at this earlier step.
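To make the goal concrete, here is a minimal sketch of the result I'm after, run against a small hand-written HTML fragment instead of the live page. The `dnp` class on the reason cell, the sample rows, and the helper name `find_dnp_players` are all my assumptions for illustration; I haven't confirmed that the real ESPN markup looks like this.

```python
from bs4 import BeautifulSoup

# Hand-written sample rows. The class names ("name", "dnp") are assumptions
# about the page's markup, not something confirmed against the live site.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td class="name"><span>Player A</span></td><td class="min">32</td></tr>
    <tr><td class="name"><span>Player B</span></td><td class="dnp" colspan="14">DNP-COACH'S DECISION</td></tr>
    <tr><td class="name"><span>Player C</span></td><td class="dnp" colspan="14">DNP-INJURY</td></tr>
    <tr class="highlight"><td>TEAM</td><td>100</td></tr>
  </tbody>
</table>
"""


def find_dnp_players(html):
    """Return a list of {'name', 'reason'} dicts for rows that have a DNP cell."""
    soup = BeautifulSoup(html, "html.parser")
    players = []
    for row in soup.find_all("tr"):
        name_cell = row.find("td", class_="name")
        dnp_cell = row.find("td", class_="dnp")
        # Skip rows for players who played and rows with no name (e.g. totals).
        if name_cell is None or dnp_cell is None:
            continue
        span = name_cell.find("span")
        # Fall back to the whole cell text if there is no <span> inside it.
        source = span if span is not None else name_cell
        players.append({
            "name": source.get_text(strip=True),
            "reason": dnp_cell.get_text(strip=True),
        })
    return players


print(find_dnp_players(SAMPLE_HTML))
```

Looping over `tr` elements and checking each lookup for `None` before calling `.find()` or `.get_text()` on it is what keeps this sketch from crashing on rows that lack a name cell, which may or may not be the cause of the failure on the real page.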
What should I do here? Any help would be appreciated.
Thank you.