Scrape Standard Deviation from Yahoo Finance using Beautiful Soup

Question

I'm trying to scrape some numbers from the Risk Statistics table on a Yahoo Finance page using BeautifulSoup and Python 2.7: https://finance.yahoo.com/quote/SHSAX/risk

So far, I've looked at the HTML using https://codebeautify.org and loaded the page into BeautifulSoup:

#!/usr/bin/python
from bs4 import BeautifulSoup, Comment
import urllib

riskURL = "https://finance.yahoo.com/quote/SHSAX/risk"
page = urllib.urlopen(riskURL)
content = page.read().decode('utf-8')
soup = BeautifulSoup(content, 'html.parser')

My trouble is actually getting the numbers using soup.find. For example, standard deviation:

    # std should be 13.44
    stdevValue = float(soup.find("span",{"data-reactid":"124","class":"W(39%) Fl(start)"}).text)
    # std of category should be 0.18
    stdevCat = float(soup.find("span",{"data-reactid":"125","class":"W(57%) Mend(5px) Fl(end)"}).text)

Both of these calls to soup.find return None. What am I missing?

FedOpp · Accepted Answer · 2018-09-21T18:36:29

From what I've read, "data-reactid" is a custom attribute that the React framework uses to reference components (see the question "what's data-reactid attribute in html?" for more). After a couple of tries, I noticed that the data-reactid values are different on every reload of the page, as if randomly generated, so they can't be relied on as selectors.

I think you should try to find another approach.

Maybe you can find a stable element, like the "Standard Deviation" label in that row, and then loop over the elements that follow it to gather the data.

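from bs4.element import Tag

# Find the label span by its visible text, then walk the siblings of its parent
std_span = next(x for x in soup.find_all('span') if x.text == "Standard Deviation")
parent_div = std_span.parent
for sibling in parent_div.next_siblings:
    if not isinstance(sibling, Tag):
        continue  # skip bare text nodes, which have no .children
    for child in sibling.children:
        if isinstance(child, Tag):
            # do something with each value
            print(child.text)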
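To make the idea concrete, here is a minimal sketch of the same label-based lookup run against a small inline HTML sample. The markup in SAMPLE_HTML and the helper name row_values are assumptions made for illustration; the real Yahoo Finance page will be structured differently, so the traversal would likely need adjusting.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: a label cell followed by sibling cells holding values.
SAMPLE_HTML = """
<div class="row">
  <div><span>Standard Deviation</span></div>
  <div><span>13.44</span><span>0.18</span></div>
  <div><span>13.01</span><span>0.15</span></div>
</div>
"""


def row_values(soup, label):
    """Return the text of every span that follows the label's parent element."""
    label_span = next(
        (s for s in soup.find_all("span") if s.get_text(strip=True) == label),
        None,
    )
    if label_span is None:
        return []  # label not present in this document
    values = []
    for sibling in label_span.parent.next_siblings:
        if not isinstance(sibling, Tag):
            continue  # whitespace between tags shows up as text nodes
        for span in sibling.find_all("span"):
            values.append(span.get_text(strip=True))
    return values


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
values = row_values(soup, "Standard Deviation")
stdev_value = float(values[0])
stdev_category = float(values[1])
```

Matching on the visible label rather than on data-reactid means the lookup does not depend on attribute values that change between page loads, and returning an empty list when the label is missing avoids calling .text on None.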

Hope it helps.
