I have some Python code which scrapes the game logs of NBA players for a given season (for instance: the data here) into a csv file, using Beautiful Soup. I am aware that there is an option to get a csv version by clicking a link on the website, but I am adding something to each line, so scraping line by line feels like the easiest option. The goal is to eventually write code that does this for every season of every player.
The code looks like this:
import urllib  # Python 2 urllib; on Python 3 this would be urllib.request
from bs4 import BeautifulSoup

def getData(url):
    html = urllib.urlopen(url)
    soup = BeautifulSoup(html, "html.parser")
    with open('/Users/Mika/Desktop/a_players.csv', 'a') as f:
        for table in soup.find_all("pre", class_=""):
            dataline = table.getText()
            # the player id is the part of the URL between "/players/a/" and "/gamelog/<year>/"
            player_id = url[47:-14]
            f.write(player_id + ',' + dataline + '\n')

player_season_url = "https://www.basketball-reference.com/players/a/abdelal01/gamelog/1991/"
getData(player_season_url)
The problem is this: as you can see by inspecting the page's HTML, some cells in the table have empty values, for example:
<td class="right " data-stat="fg3_pct"></td>
(and this is an example of a good cell, with the value "1" in it, that is scraped properly):
<th scope="row" class="right " data-stat="ranker" csk="1">1</th>
When scraping, the rows come out uneven: the empty values are skipped, so the csv file ends up with values out of place. Is there a way to ensure that those empty values get replaced with " " in the csv file?
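To make the question concrete, something like the sketch below is roughly what I imagine the fix might look like, although I am not sure it is the right Beautiful Soup approach and I have not tested it against these pages: walk each <tr> and read every <th>/<td> cell individually, so that an empty cell becomes an empty string instead of disappearing.

import urllib
from bs4 import BeautifulSoup

def getRows(url):
    soup = BeautifulSoup(urllib.urlopen(url), "html.parser")
    # in practice this would be limited to the game-log table, not every <tr> on the page
    for row in soup.find_all("tr"):
        cells = row.find_all(["th", "td"])
        # get_text() returns "" for an empty <td>, so blank cells keep their position
        yield [cell.get_text() for cell in cells]

If that is roughly the right idea, every row would come out with the same number of fields, and the blanks could then be written to the csv file as "" or " ".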