I am trying to scrape Yahoo Finance (https://finance.yahoo.com/quote/AAPL), but the attribute value associated with the data point I want keeps changing. As picture 1 shows, the "span" tag has the attribute data-reactid="52" (highlighted in blue in picture 1).

My code to get this data point is the following:

import requests
from bs4 import BeautifulSoup

home_page = "https://finance.yahoo.com/quote/AAPL"
response = requests.get(home_page)
print(response.status_code)
soup = BeautifulSoup(response.content, 'lxml')

header = soup.find("div", attrs={'id': 'quote-header-info'})
company_name = header.find("h1", attrs={'data-reactid': '7'}).text
price = soup.find("span", attrs={'data-reactid': '52'})

Unfortunately, price comes back as None. I also tried a different parser (html5lib) and got the same result.

After inspecting the soup, I noticed that the attribute value associated with this data point had changed. See picture 2 (It is hard to see, but the tag is slightly highlighted in gray, upper section of the image).

Is there any way to prevent the values from changing? If not, what is a workaround for this issue?

2 Answers

0
votes

Try using XPath. BeautifulSoup does not support XPath, but lxml does:

import requests
from lxml import html

home_page = "https://finance.yahoo.com/quote/AAPL"
response = requests.get(home_page)
tree = html.fromstring(response.content)
# The id must match the page's id exactly: "quote-header-info", with hyphens
price = str(tree.xpath('//*[@id="quote-header-info"]/div[3]/div[1]//span[1]//text()')[0])

Selenium would also be great for this. But I hope this helps! Let me know if you have any questions.
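
To illustrate the XPath idea without hitting the network, here is a minimal sketch that anchors on the container id and ignores data-reactid entirely. The SAMPLE markup and the first_text helper are made up for this example and are not Yahoo's actual page structure, so the paths would need adjusting against the real HTML. The helper also avoids an IndexError when nothing matches.

```python
from lxml import html

# Hypothetical markup for illustration only; not Yahoo's real page.
SAMPLE = """
<div id="quote-header-info">
  <div><h1 data-reactid="7">Apple Inc. (AAPL)</h1></div>
  <div><span class="price" data-reactid="34">150.00</span></div>
</div>
"""


def first_text(tree, xpath):
    """Return the first text node matched by xpath, stripped, or None."""
    matches = tree.xpath(xpath)
    return str(matches[0]).strip() if matches else None


tree = html.fromstring(SAMPLE)
# Anchor on the stable id rather than on data-reactid values.
name = first_text(tree, '//div[@id="quote-header-info"]//h1/text()')
price = first_text(tree, '//div[@id="quote-header-info"]//span/text()')
print(name, price)
```

Because neither path mentions data-reactid, it does not matter whether the served HTML numbers the span 34, 52, or anything else.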

0
votes

You can select on one of the classes instead. This one seems to be stable over time (at least it has been for quite a while now).

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://finance.yahoo.com/quote/AAPL/')
soup = bs(r.content, 'lxml')
# Raw string so Python leaves the backslashes for the CSS selector parser
print(soup.select_one(r'.Mb\(-4px\)').text)
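
As a rough sketch of why the class lookup survives when data-reactid does not, the snippet below parses two made-up fragments that differ only in their data-reactid value. The markup, the constant names, and the extract_price helper are all assumptions for illustration, not Yahoo's real HTML. It uses find with class_ rather than a CSS selector, which sidesteps the need to escape the parentheses, and returns None instead of raising if the class ever disappears.

```python
from bs4 import BeautifulSoup

# Hypothetical fragments for illustration only; not copied from Yahoo Finance.
# They differ only in the data-reactid value.
SERVER_HTML = '<div id="quote-header-info"><span class="Mb(-4px)" data-reactid="34">150.00</span></div>'
BROWSER_HTML = '<div id="quote-header-info"><span class="Mb(-4px)" data-reactid="52">150.00</span></div>'


def extract_price(markup):
    """Return the text of the first span with class Mb(-4px), or None."""
    soup = BeautifulSoup(markup, "html.parser")
    # class_ is compared as a plain string, so no CSS escaping is needed.
    node = soup.find("span", class_="Mb(-4px)")
    return node.get_text(strip=True) if node is not None else None


print(extract_price(SERVER_HTML))
print(extract_price(BROWSER_HTML))
```

Both calls find the same span, since the lookup never touches data-reactid. If the site renames the class, the helper returns None and the caller can handle that case explicitly.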