I'm fairly new to web scraping and making progress little by little, but this one is giving me a hard time.

I'm trying to scrape the ESPN NBA box score page: http://espn.com/nba/boxscore?gameId=401160948

I want to scrape the names of the players who did not play (labeled "DNP", followed by a reason) at the end of the two tables and append them to a list.

Here's my code:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://espn.com/nba/boxscore?gameId=401160948')
soup = BeautifulSoup(page.content, 'html.parser')
main_container = soup.find(id='main-container')

data = []
for hstat in main_container.find_all('tbody')[0]:
    player_info = {}
    player_info['name'] = hstat.find("td", {"class": "name"}).find('span').get_text()
    data.append(player_info)
print(data)

The code above works for tbody[0] and tbody[2], perhaps because every td in those rows is filled in; I'm not sure. It fails for tbody[1] and tbody[4], which contain the rows where a player's td holds the DNP value. I only want a list of the players who did not play, so I also need to exclude the players in tbody[1] and tbody[4] who did play, and I don't know how to do that yet.

What should I do here? Any help is appreciated.

Thank you.

2 Answers

0
votes

You can use .find_previous() to walk backwards from each DNP cell to the rest of the player's information: name and team.

import requests
from bs4 import BeautifulSoup


url = 'https://www.espn.com/nba/boxscore?gameId=401160948'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

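# Each player who did not play has a cell with class "dnp"
for td in soup.select('td.dnp'):
    # Walk backwards to the nearest preceding team header
    team = td.find_previous('div', class_='team-name').text
    # Split on the first hyphen only, so a hyphen inside the reason survives
    reason = td.text.split('-', 1)[-1].strip()
    # class_='' picks the nearest preceding <span> with no class (the player's name)
    name = td.find_previous('span', class_='').text

    print('{:<20} {:<20} {}'.format(name, team, reason))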

Prints:

J. Evans             Warriors             LEFT ADDUCTOR STRAIN
M. Kidd-Gilchrist    Hornets              COACH'S DECISION
C. Martin            Hornets              COACH'S DECISION
W. Hernangomez       Hornets              COACH'S DECISION
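
When pulling the reason out of the cell text, it may be safer to split on the first hyphen only, so that a reason which itself contains a hyphen is not cut short. Below is a minimal sketch of that idea. It assumes the cell text has the form "DNP-REASON"; the helper name split_dnp and the second sample string are made up for illustration and are not taken from the page.

```python
def split_dnp(cell_text):
    """Split 'DNP-REASON' into (status, reason) at the first hyphen only."""
    # str.partition splits once, at the first occurrence of the separator
    status, _, reason = cell_text.partition("-")
    return status.strip(), reason.strip()


print(split_dnp("DNP-COACH'S DECISION"))
# Hypothetical reason containing a hyphen, to show it stays intact
print(split_dnp("DNP-NON-COVID ILLNESS"))
```

If the text contains no hyphen at all, partition returns the whole string as the status and an empty reason, so the helper does not raise.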
0
votes

Try this. For each <tr>, I check whether it contains a <td> with class dnp. If it does, I take the text from that row's name <td> and append it to data.

import requests
from bs4 import BeautifulSoup

page = requests.get('https://espn.com/nba/boxscore?gameId=401160948')
soup = BeautifulSoup(page.content, 'html.parser')
main_container = soup.find(id='main-container')

data = []

for tbody_soup in main_container.find_all('tbody'):
    # Iterate over <tr> tags only; looping over tbody directly can also yield text nodes
    for tr_soup in tbody_soup.find_all('tr'):
        # Keep only the rows that contain a DNP cell
        if tr_soup.find("td", {"class": "dnp"}) is not None:
            data.append(tr_soup.find("td", {"class": "name"}).find('span').get_text())
print(data)
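
The row-level check can also be tried offline, without hitting ESPN. The sketch below runs it against a small inline snippet. The markup and the player names are hypothetical: they only mimic the class names used in this thread (td.name, td.dnp), and the real page may be structured differently or change over time. The function name dnp_players is likewise made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; only the class names come from the thread
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td class="name"><a><span>A. Starter</span></a></td><td class="min">30</td></tr>
    <tr><td class="name"><a><span>B. Benched</span></a></td><td class="dnp" colspan="14">DNP-COACH'S DECISION</td></tr>
    <tr class="highlight"><td>TEAM</td><td>240</td></tr>
  </tbody>
</table>
"""


def dnp_players(html):
    """Return the names of players whose row contains a td with class 'dnp'."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    # find_all('tr') yields only <tr> tags, so whitespace text nodes are skipped
    for row in soup.find_all("tr"):
        if row.find("td", class_="dnp") is None:
            continue  # the player took part, or this is a totals row
        name_cell = row.find("td", class_="name")
        if name_cell is not None:  # guard against rows without a name cell
            names.append(name_cell.get_text(strip=True))
    return names


print(dnp_players(SAMPLE_HTML))
```

Checking each lookup for None before calling a method on it avoids the AttributeError that rows without a name cell (such as totals rows) would otherwise raise.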