1 vote

I am new to Python and am trying to scrape data from the following site. Although this code worked for a different site, I cannot get it to work for Next Gen Stats. Does anyone have any thoughts as to why? Below is my code and the error I am getting.

import pandas as pd
import numpy as np
import html5lib

urlwk1 = 'https://nextgenstats.nfl.com/stats/receiving/2020/1'
urlwk2 = 'https://nextgenstats.nfl.com/stats/receiving/2020/2'

df11 = pd.read_html(urlwk1)
df11[0].to_csv('NFL_Receiving_Page1.csv', index=False)  # index=False drops the index that would otherwise appear as the first column in the CSV

Below is the error I am getting:

>>> df11 = pd.read_html(urlwk1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\USERX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\USERX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\html.py", line 1101, in read_html
    displayed_only=displayed_only,
  File "C:\Users\USERX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\html.py", line 917, in _parse
    raise retained
  File "C:\Users\USERX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\html.py", line 898, in _parse
    tables = p.parse_tables()
  File "C:\Users\USERX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\html.py", line 217, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "C:\Users\USERX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\html.py", line 547, in _parse_tables
    raise ValueError("No tables found")
ValueError: No tables found

>>> df11[0].to_csv('NFL_Receiving_Page1.csv', index=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'df11' is not defined

We can't quite tell from that error exactly what's going on, but if it works on one site, it's not guaranteed to work on another: the structure is quite likely different. Are you familiar with using the debugger library? Have you checked what df11[0] is in the context above? – chrymxbrwn
Thanks. I have updated the error I'm getting and provided exactly what it looks like. df11 is supposed to contain the scraped dataframe. – wolfblitza
I am not familiar with the debugger library. – wolfblitza
Can you show us the output of df11? – Ujjwal Agrawal
The error shared above is the output I get when I run the df11 line. – wolfblitza

2 Answers

1 vote

pandas.read_html cannot parse HTML tables that are loaded dynamically by JavaScript.

This page fetches its table data through an API call instead.

You can use the code below to fetch and parse the API response:

import requests
import pandas as pd

headers = {
    'accept': 'application/json, text/plain, */*',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36',
    'referer': 'https://nextgenstats.nfl.com/',
    'accept-language': 'en-US,en;q=0.9,hi;q=0.8',
}

response = requests.get('https://appapi.ngs.nfl.com/statboard/receiving?season=2020&seasonType=REG&week=2', headers=headers)

df = pd.read_json(response.text)  # the response body is JSON, not HTML
df.to_csv('NFL_Receiving_Page1.csv', index=False)
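If pd.read_json struggles with the nested payload, you can flatten it yourself with pd.json_normalize. Here is a minimal sketch on a hand-made payload — the "stats" key and the field names are assumptions, so inspect response.json() to confirm the real shape before relying on them:

```python
import pandas as pd

# Hand-made payload mimicking the ASSUMED shape of the API response;
# the "stats" key and field names are guesses -- inspect response.json()
# to confirm the real structure.
payload = {
    "season": 2020,
    "stats": [
        {"player": {"displayName": "Player A"}, "avgSeparation": 3.1},
        {"player": {"displayName": "Player B"}, "avgSeparation": 2.7},
    ],
}

# json_normalize flattens each record; nested dicts become dotted
# column names such as "player.displayName"
df = pd.json_normalize(payload["stats"])
```

This gives you one row per player with flat columns, which then exports cleanly with df.to_csv(..., index=False).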


0 votes

Read the HTML using a Selenium driver

I think the page address you mentioned loads its content dynamically. Please refer to the answer above, and then try the code below.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pandas as pd
import time

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chromedriver_path = '/home/user/chromedriver'

d = webdriver.Chrome(chromedriver_path, chrome_options=chrome_options)
d.get('https://nextgenstats.nfl.com/stats/receiving/2020/1')
time.sleep(3)  # give the JavaScript time to render the table
html = d.page_source
df = pd.read_html(html)[0]  # read_html returns a list of DataFrames; take the first

This code will work once ChromeDriver is properly installed on your system. Adjust time.sleep() to suit your internet speed, and set chromedriver_path to match where the driver lives on your machine.
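A fixed time.sleep(3) is fragile: too short and the table hasn't rendered, too long and you wait needlessly. Selenium's WebDriverWait with expected_conditions is the idiomatic fix; the same idea can also be sketched as a small generic polling helper (the find_elements call in the comment is illustrative only, assuming the driver d from the code above):

```python
import time

def wait_for(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout:.1f}s")

# With the driver above you would poll for the table instead of sleeping, e.g.:
# wait_for(lambda: d.find_elements_by_tag_name('table'))
```

This returns as soon as the condition holds, so a slow connection still succeeds and a fast one doesn't pay the full sleep.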