I've been trying to scrape a table from Wikipedia using BeautifulSoup, but I've run into a problem.
Page: https://en.wikipedia.org/wiki/New_York_City
Table: "Racial composition"
In the page source, the table seems to start at around line 1470.
Here's the code I tried first:
import csv

import requests
from bs4 import BeautifulSoup

# Download the page and parse it
website_url = requests.get('https://en.wikipedia.org/wiki/New_York_City').text
soup = BeautifulSoup(website_url,'lxml')
table = soup.find('table',{'class':'wikitable sortable collapsible'})

headers = [header.text for header in table.find_all('th')]

table_rows = table.find_all('tr')
rows = []
for row in table_rows:
    td = row.find_all('td')
    row = [cell.text for cell in td]
    rows.append(row)

# Write the header row, then every non-empty data row
with open('NYC_DEMO.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(row for row in rows if row)
And here's the error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-24-e6000bdafe11> in <module>
      3 table = soup.find('table',{'class':'wikitable sortable collapsible'})
      4
----> 5 headers = [header.text for header in table.find_all('th')]
      6
      7 table_rows = table.find_all('tr')

AttributeError: 'NoneType' object has no attribute 'find_all'
I believe this is the HTML from the Wikipedia page that I need to extract:
<tbody><tr>
<th>Racial composition</th>
<th>2010<sup id="cite_ref-QuickFacts2010_226-1" class="reference"><a href="#cite_note-QuickFacts2010-226">[224]</a></sup></th>
<th>1990<sup id="cite_ref-pop_228-0" class="reference"><a href="#cite_note-pop-228">[226]</a></sup></th>
<th>1970<sup id="cite_ref-pop_228-1" class="reference"><a href="#cite_note-pop-228">[226]</a></sup></th>
<th>1940<sup id="cite_ref-pop_228-2" class="reference"><a href="#cite_note-pop-228">[226]</a></sup>
</th></tr>
<tr>
<td><a href="/wiki/White_American" class="mw-redirect" title="White American">White</a></td>
<td>44.0%</td>
<td>52.3%</td>
<td>76.6%</td>
<td>93.6%
</td></tr>
<tr>
...
I'm guessing it can't locate the right table? There are quite a few tables on that page, so how do I point to the correct one?
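For reference, here is a minimal sketch of one possible approach: pick the table by the text of its first header cell instead of by an exact class string, and check for None before calling find_all. When `find` is given a multi-word class string, it matches only if the tag's class attribute is exactly that string, so any difference in the live page's classes would explain the None. The sample markup is a trimmed copy of the snippet above; the wrapping `<table class="wikitable">` tag and the helper names `find_table_by_first_header` and `cell_text` are assumptions for illustration, not something taken from the page or the library.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup quoted in the question. The wrapping <table> tag
# and its class value are assumptions; the real attributes are not shown.
SAMPLE_HTML = """
<table class="wikitable">
<tbody><tr>
<th>Racial composition</th>
<th>2010<sup class="reference"><a href="#cite_note-QuickFacts2010-226">[224]</a></sup></th>
<th>1990<sup class="reference"><a href="#cite_note-pop-228">[226]</a></sup></th>
</tr>
<tr>
<td><a href="/wiki/White_American" class="mw-redirect" title="White American">White</a></td>
<td>44.0%</td>
<td>52.3%
</td></tr>
</tbody></table>
"""


def find_table_by_first_header(soup, header_text):
    """Return the first <table> whose first <th> starts with header_text, else None."""
    for table in soup.find_all("table"):
        first_th = table.find("th")
        if first_th is not None and first_th.get_text(strip=True).startswith(header_text):
            return table
    return None


def cell_text(cell):
    """Return the cell's text with footnote markers such as [224] removed."""
    for sup in cell.find_all("sup"):
        sup.decompose()
    return cell.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table_by_first_header(soup, "Racial composition")
if table is None:
    # Fail with a clear message instead of an AttributeError on None
    raise SystemExit("No table with that header was found")

headers = [cell_text(th) for th in table.find_all("th")]
rows = [[cell_text(td) for td in tr.find_all("td")] for tr in table.find_all("tr")]
rows = [row for row in rows if row]  # the header row has no <td> cells

print(headers)
print(rows)
```

Against the live page, the same helper could be applied to a soup built from the downloaded HTML. I have not verified which classes the real table carries, so matching on the header text is meant to sidestep that question.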
Thanks in advance for your help.
read_html
stackoverflow.com/questions/43344580/… – sushanth