You can use BeautifulSoup with the CSS selector [id$="-game-basic"] table to select only the two basic tables (one per team), then load them with pd.read_html().
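To see what the selector matches, here is a small offline sketch. The markup is hypothetical and only mimics the id naming on the box score page; the element ids and table ids are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the "-game-basic" / "-game-advanced" id suffixes matter here.
html = """
<div id="all_box-NOP-game-basic"><table id="t1"></table></div>
<div id="all_box-NOP-game-advanced"><table id="t2"></table></div>
<div id="all_box-TOR-game-basic"><table id="t3"></table></div>
"""

soup = BeautifulSoup(html, "html.parser")

# [id$="-game-basic"] matches any element whose id ends with "-game-basic";
# the trailing " table" then picks the <table> descendants of those elements.
matched = [t["id"] for t in soup.select('[id$="-game-basic"] table')]
print(matched)  # ['t1', 't3']
```

The advanced table is skipped because its container id does not end with "-game-basic".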
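import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/boxscores/201910220TOR.html'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# tables inside elements whose id ends with "-game-basic" (one per team)
my_tables = soup.select('[id$="-game-basic"] table')

# newer pandas versions expect a file-like object rather than a raw HTML string;
# droplevel(0, axis=1) removes the top level of the two-level column header
df_1 = pd.read_html(StringIO(str(my_tables[0])))[0].droplevel(0, axis=1)
df_2 = pd.read_html(StringIO(str(my_tables[1])))[0].droplevel(0, axis=1)

print(df_1)
print(df_2)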
Prints:
                    Starters            MP  ...           PTS           +/-
0               Jrue Holiday         41:05  ...            13           -14
1             Brandon Ingram         35:06  ...            22           -19
2                J.J. Redick         27:03  ...            16           -14
3                 Lonzo Ball         24:50  ...             8            -7
4             Derrick Favors         20:46  ...             6           -12
5                   Reserves            MP  ...           PTS           +/-
6                  Josh Hart         28:10  ...            15            -1
7               Nicolò Melli         19:37  ...            14           +11
8           Kenrich Williams         18:02  ...             3           +11
9              Frank Jackson         13:51  ...             9            +7
10             Jahlil Okafor         12:29  ...             8            -7
11             E'Twaun Moore         12:06  ...             5            -1
12  Nickeil Alexander-Walker         11:55  ...             3            +6
13              Jaxson Hayes  Did Not Play  ...  Did Not Play  Did Not Play
14               Team Totals           265  ...           122           NaN

[15 rows x 21 columns]
           Starters            MP  ...           PTS           +/-
0        Kyle Lowry         44:59  ...            22            -1
1     Fred VanVleet         44:21  ...            34           +18
2     Pascal Siakam         38:09  ...            34            +5
3        OG Anunoby         35:48  ...            11           +12
4        Marc Gasol         31:55  ...             6            -2
5          Reserves            MP  ...           PTS           +/-
6     Norman Powell         28:38  ...             5            +2
7       Serge Ibaka         26:00  ...            13            +6
8     Terence Davis         15:10  ...             5             0
9       Matt Thomas  Did Not Play  ...  Did Not Play  Did Not Play
10    Chris Boucher  Did Not Play  ...  Did Not Play  Did Not Play
11  Stanley Johnson  Did Not Play  ...  Did Not Play  Did Not Play
12   Malcolm Miller  Did Not Play  ...  Did Not Play  Did Not Play
13  Dewan Hernandez  Did Not Play  ...  Did Not Play  Did Not Play
14      Team Totals           265  ...           130           NaN

[15 rows x 21 columns]
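The printed frames still contain the repeated header row (index 5, "Reserves"), the "Team Totals" row, and "Did Not Play" text in the stat columns. A minimal sketch of one way to tidy them, assuming you want one row per player with numeric stats; clean_box_score and the Player column name are hypothetical and not part of the original answer:

```python
import pandas as pd

def clean_box_score(df):
    """Drop the repeated header and totals rows, and coerce stats to numbers."""
    # Hypothetical helper: keeps only rows that describe an individual player.
    out = df[~df["Starters"].isin(["Reserves", "Team Totals"])].copy()
    out = out.rename(columns={"Starters": "Player"})
    # Text entries such as "Did Not Play" become NaN; MP stays a "mm:ss" string.
    for col in out.columns.drop(["Player", "MP"]):
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out.reset_index(drop=True)

# Tiny hand-made frame reusing a few values from the output above.
sample = pd.DataFrame({
    "Starters": ["Jrue Holiday", "Reserves", "Josh Hart", "Jaxson Hayes", "Team Totals"],
    "MP": ["41:05", "MP", "28:10", "Did Not Play", "265"],
    "PTS": ["13", "PTS", "15", "Did Not Play", "122"],
})
cleaned = clean_box_score(sample)
print(cleaned)
```

Dropping the totals row is a design choice; keep it if you need team-level sums.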
EDIT: To run this for every game of the season, you can wrap the code in a function and call it in a loop, for example:
import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/leagues/NBA_2020_games.html'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

def get_tables(url):
    # return the two basic box score tables of one game as DataFrames
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    my_tables = soup.select('[id$="-game-basic"] table')
    df_1 = pd.read_html(StringIO(str(my_tables[0])))[0].droplevel(0, axis=1)
    df_2 = pd.read_html(StringIO(str(my_tables[1])))[0].droplevel(0, axis=1)
    return df_1, df_2

# the links in the ".filter" element lead to the monthly schedule pages
for a in soup.select('.filter a'):
    u = 'https://www.basketball-reference.com' + a['href']
    print(u)
    soup2 = BeautifulSoup(requests.get(u).content, 'html.parser')
    # each schedule page links to the box score of every listed game
    for a2 in soup2.select('td a[href^="/boxscores/"]'):
        u2 = 'https://www.basketball-reference.com' + a2['href']
        t1, t2 = get_tables(u2)
        print(u2)
        print(t1)
        print(t2)
        print('-' * 80)
Prints:
https://www.basketball-reference.com/leagues/NBA_2020_games-october.html
https://www.basketball-reference.com/boxscores/201910220TOR.html
                    Starters            MP  ...           PTS           +/-
0               Jrue Holiday         41:05  ...            13           -14
1             Brandon Ingram         35:06  ...            22           -19
2                J.J. Redick         27:03  ...            16           -14
3                 Lonzo Ball         24:50  ...             8            -7
4             Derrick Favors         20:46  ...             6           -12
5                   Reserves            MP  ...           PTS           +/-
6                  Josh Hart         28:10  ...            15            -1
7               Nicolò Melli         19:37  ...            14           +11
8           Kenrich Williams         18:02  ...             3           +11
9              Frank Jackson         13:51  ...             9            +7
10             Jahlil Okafor         12:29  ...             8            -7
11             E'Twaun Moore         12:06  ...             5            -1
12  Nickeil Alexander-Walker         11:55  ...             3            +6
13              Jaxson Hayes  Did Not Play  ...  Did Not Play  Did Not Play
14               Team Totals           265  ...           122           NaN

[15 rows x 21 columns]
           Starters            MP  ...           PTS           +/-
0        Kyle Lowry         44:59  ...            22            -1
1     Fred VanVleet         44:21  ...            34           +18
2     Pascal Siakam         38:09  ...            34            +5
3        OG Anunoby         35:48  ...            11           +12
4        Marc Gasol         31:55  ...             6            -2
5          Reserves            MP  ...           PTS           +/-
6     Norman Powell         28:38  ...             5            +2
7       Serge Ibaka         26:00  ...            13            +6
8     Terence Davis         15:10  ...             5             0
9       Matt Thomas  Did Not Play  ...  Did Not Play  Did Not Play
10    Chris Boucher  Did Not Play  ...  Did Not Play  Did Not Play
11  Stanley Johnson  Did Not Play  ...  Did Not Play  Did Not Play
12   Malcolm Miller  Did Not Play  ...  Did Not Play  Did Not Play
13  Dewan Hernandez  Did Not Play  ...  Did Not Play  Did Not Play
14      Team Totals           265  ...           130           NaN

[15 rows x 21 columns]
--------------------------------------------------------------------------------
https://www.basketball-reference.com/boxscores/201910220LAC.html
        Starters     MP  ...  PTS  +/-
0  Anthony Davis  37:22  ...   25   +3
1   LeBron James  36:00  ...   18   -8
2    Danny Green  32:20  ...   28   +7
...and so on.
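Because this loop requests one page per game, it sends many requests in quick succession, and the site may rate-limit or block such traffic. A minimal sketch of spacing the requests out, assuming a fixed pause is acceptable; the throttled helper and the 3-second delay are hypothetical choices, not part of the original answer:

```python
import time

def throttled(items, delay_seconds, sleep=time.sleep):
    """Yield items one by one, pausing before every item except the first."""
    for i, item in enumerate(items):
        if i > 0:
            sleep(delay_seconds)
        yield item

# Demonstration with a stand-in for time.sleep so nothing actually waits.
calls = []
urls = ["/boxscores/201910220TOR.html", "/boxscores/201910220LAC.html"]
visited = list(throttled(urls, delay_seconds=3.0, sleep=calls.append))
print(visited)
print(calls)
```

In the loop above, the inner iteration could become for a2 in throttled(soup2.select('td a[href^="/boxscores/"]'), 3.0): so that each box score request waits for the pause first.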