
I've been trying to parse the HTML from https://www.teamrankings.com/nba/team/oklahoma-city-thunder, but I can't get the full page to parse. I've tried requests, urllib, and selenium with BeautifulSoup, and none of them gives me the full HTML after parsing. The closest I got was with urllib (code below). I've tried many different user agents and several parsers.

If I print webpage before passing it to BeautifulSoup, I can see all the content. Once I run it through BeautifulSoup, most of it is cut out. I've tried html.parser, lxml, and html5lib.

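from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = 'https://www.teamrankings.com/nba/team/oklahoma-city-thunder'

req = Request(url, headers={'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B179 Safari/7534.48.3'})

# Raw bytes as served; all of the content is visible here
webpage = urlopen(req).read()
print(webpage)

# Parsed document; this is where content appears to be missing
basketball = BeautifulSoup(webpage, 'html.parser')
print(basketball)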

Thanks in advance!

Which "content" is being cut off exactly? Could you give an example from the link you provided? - Xosrov
I can't reproduce this with requests and BeautifulSoup (lxml parser). - AMC
I am only getting game data from March onward. I am looking to get the data for all dates (October to present). - Sean Robert
AMC, are you using a User-Agent header when using requests? - Sean Robert

1 Answer


I'm not sure what you mean by not getting all the content. Have you tried just using pandas? Its read_html function can use BeautifulSoup under the hood to parse <table> tags. It returns the full table for me.
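A minimal sketch of the pandas route, in case it helps. The helper name tables_from_html and the tiny sample table are only illustrative (the sample rows are copied from the output further down, the markup around them is made up), and read_html needs an HTML parser such as lxml installed:

```python
import io

import pandas as pd


def tables_from_html(html: str) -> list:
    """Return every <table> found in the HTML text as a DataFrame."""
    # read_html needs an HTML parser installed (lxml, or bs4 with html5lib).
    # Wrapping the text in StringIO avoids the deprecated literal-string input.
    return pd.read_html(io.StringIO(html))


# Stand-in for a downloaded page: illustrative markup, not the site's real HTML.
sample_html = """
<table>
  <tr><th>Date</th><th>Opponent</th><th>Result</th></tr>
  <tr><td>10/23</td><td>Utah</td><td>L 95-100</td></tr>
  <tr><td>10/25</td><td>Washington</td><td>L 85-97</td></tr>
</table>
"""

tables = tables_from_html(sample_html)
schedule = tables[0]
print(schedule)
```

For the real page you would pass in the text of the downloaded response instead of sample_html and then pick the table you want out of the returned list; which position it sits at is something to check rather than assume.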

EDIT

In the future, please be more specific in your question; it wasn't until your comments that you explained what was missing. The data is all there, you just need to iterate through every row of the table.

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.teamrankings.com/nba/team/oklahoma-city-thunder'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
# Index 1 selects the second <table> on the page
table = soup.find_all('table')[1]

# Column names come from the header cells
cols = [each.text for each in table.find_all('th')]

# Collect the <td> text of every row. The header row has no <td> cells,
# so it yields an empty list that becomes an all-missing row.
data = [[each.text for each in row.find_all('td')] for row in table.find_all('tr')]

# Build the DataFrame in one step (DataFrame.append was removed in pandas 2.0)
# and drop the all-missing header row.
df = pd.DataFrame(data).dropna()
df.columns = cols

Output:

    print (df)
     Date      Opponent     Result Location    W/L  Div Spread     Total Money
1   10/23          Utah   L 95-100     Away    0-1  0-1   +9.0  Un 221.0  +339
2   10/25    Washington    L 85-97     Home    0-2  0-1   -8.5  Un 218.5  -399
3   10/27  Golden State   W 120-92     Home    1-2  0-1   -1.0  Un 223.5  -117
4   10/28       Houston  L 112-116     Away    1-3  0-1  +10.0  Ov 227.5  +433
5   10/30      Portland   L 99-102     Home    1-4  0-2   +1.5  Un 221.5  +104
6   11/02   New Orleans  W 115-104     Home    2-4  0-2   -2.0  Un 228.5  -124
7   11/05       Orlando   W 102-94     Home    3-4  0-2   -3.0  Un 201.5  -142
8   11/07   San Antonio  L 112-121     Away    3-5  0-2   +5.0  Ov 211.5  +172
9   11/09  Golden State  W 114-108     Home    4-5  0-2  -12.5  Ov 216.5  -770
10  11/10     Milwaukee  L 119-121     Home    4-6  0-2   +8.5  Ov 220.0  +329
11  11/12       Indiana   L 85-111     Away    4-7  0-2   +1.0  Un 213.0  -101
12  11/15  Philadelphia  W 127-119     Home    5-7  0-2   +3.5  Ov 214.0  +148
13  11/18   LA Clippers    L 88-90     Away    5-8  0-2   +7.5  Un 222.0  +297
14  11/19     LA Lakers  L 107-112     Away    5-9  0-2  +11.0  Ov 209.5  +469
15  11/22     LA Lakers  L 127-130     Home   5-10  0-2   +4.5  Ov 209.5  +186
16  11/25  Golden State   W 100-97     Away   6-10  0-2   -7.5  Un 213.5  -297
17  11/27      Portland  L 119-136     Away   6-11  0-3   +3.0  Ov 219.0  +137
18  11/29   New Orleans  W 109-104     Home   7-11  0-3   -4.5  Un 229.0  -195
19  12/01   New Orleans  W 107-104     Away   8-11  0-3   +2.5  Un 226.5  +124
20  12/04       Indiana  L 100-107     Home   8-12  0-3   +1.5  Un 208.5  +102
21  12/06     Minnesota  W 139-127     Home   9-12  1-3   -3.5  Ov 218.0  -160
22  12/08      Portland   W 108-96     Away  10-12  2-3   +3.5  Un 223.0  +154
23  12/09          Utah   W 104-90     Away  11-12  3-3   +8.5  Un 206.5  +311
24  12/11    Sacramento    L 93-94     Away  11-13  3-3   +1.5  Un 207.5  +117
25  12/14        Denver  L 102-110     Away  11-14  3-4   +5.5  Ov 204.0  +211
26  12/16       Chicago  W 109-106     Home  12-14  3-4   -5.0  Ov 208.5  -211
27  12/18       Memphis  W 126-122     Home  13-14  3-4   -6.5  Ov 219.5  -254
28  12/20       Phoenix  W 126-108     Home  14-14  3-4   -3.0  Ov 224.5  -147
29  12/22   LA Clippers  W 118-112     Home  15-14  3-4   -1.0  Ov 223.5  -111
30  12/26       Memphis   L 97-110     Home  15-15  3-4   -5.5  Un 224.0  -242
..    ...           ...        ...      ...    ...  ...    ...       ...   ...
53  02/09        Boston    3:30 pm     Home                 --        --    --
54  02/11   San Antonio    8:00 pm     Home                 --        --    --
55  02/13   New Orleans    8:00 pm     Away                 --        --    --
56  02/21        Denver    8:00 pm     Home                 --        --    --
57  02/23   San Antonio    7:00 pm     Home                 --        --    --
58  02/25       Chicago    8:00 pm     Away                 --        --    --
59  02/27    Sacramento    8:00 pm     Home                 --        --    --
60  02/28     Milwaukee    8:00 pm     Away                 --        --    --
61  03/03   LA Clippers    8:00 pm     Home                 --        --    --
62  03/04       Detroit    7:00 pm     Away                 --        --    --
63  03/06      New York    7:30 pm     Away                 --        --    --
64  03/08        Boston    6:00 pm     Away                 --        --    --
65  03/11          Utah    8:00 pm     Home                 --        --    --
66  03/13     Minnesota    8:00 pm     Home                 --        --    --
67  03/15    Washington    6:00 pm     Away                 --        --    --
68  03/17       Memphis    8:00 pm     Away                 --        --    --
69  03/18       Atlanta    7:30 pm     Away                 --        --    --
70  03/20        Denver    8:00 pm     Home                 --        --    --
71  03/23         Miami    7:30 pm     Away                 --        --    --
72  03/26     Charlotte    8:00 pm     Home                 --        --    --
73  03/28  Golden State    8:30 pm     Away                 --        --    --
74  03/30        Denver    9:00 pm     Away                 --        --    --
75  04/01       Phoenix    8:00 pm     Home                 --        --    --
76  04/04   LA Clippers    3:30 pm     Away                 --        --    --
77  04/05     LA Lakers    9:30 pm     Away                 --        --    --
78  04/07      Brooklyn    8:00 pm     Home                 --        --    --
79  04/10      New York    8:00 pm     Home                 --        --    --
80  04/11       Memphis    8:00 pm     Away                 --        --    --
81  04/13          Utah    8:00 pm     Home                 --        --    --
82  04/15        Dallas    7:30 pm     Away                 --        --    --

[82 rows x 9 columns]
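
If you are not sure which index to pass to find_all('table'), or you want to do the same for another team's page, a small helper along these lines can list what each table holds. This is only a sketch: summarize_tables is a name I made up, and the sample HTML is invented for illustration rather than taken from the site.

```python
from bs4 import BeautifulSoup


def summarize_tables(html: str) -> list:
    """Return (index, row_count, header_texts) for every <table> in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    summary = []
    for index, table in enumerate(soup.find_all("table")):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        row_count = len(table.find_all("tr"))  # counts the header row too
        summary.append((index, row_count, headers))
    return summary


# Invented two-table page, only to show the shape of the result.
sample_html = """
<table>
  <tr><th>Stat</th><th>Value</th></tr>
  <tr><td>Wins</td><td>15</td></tr>
</table>
<table>
  <tr><th>Date</th><th>Opponent</th></tr>
  <tr><td>10/23</td><td>Utah</td></tr>
  <tr><td>10/25</td><td>Washington</td></tr>
</table>
"""

summary = summarize_tables(sample_html)
for index, row_count, headers in summary:
    print(index, row_count, headers)
```

Comparing the row counts against what you expect for a full season is also a quick way to confirm whether rows are really missing after parsing or just truncated in the printed output.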