attribute value in HTML tag changes after using requests.get and parsing with BeautifulSoup

Question

I am trying to scrape Yahoo Finance (https://finance.yahoo.com/quote/AAPL), but the attribute value associated with the data point I want changes. As picture 1 shows, the "span" tag has the attribute data-reactid="52" (highlighted in blue in picture 1).

My code to get this data point is the following:

import requests
from bs4 import BeautifulSoup

home_page = "https://finance.yahoo.com/quote/AAPL"
response = requests.get(home_page)
print(response.status_code)
soup = BeautifulSoup(response.content, 'lxml')

header = soup.find("div", attrs={'id': 'quote-header-info'})
company_name = header.find("h1", attrs={'data-reactid': '7'}).text
price = soup.find("span", attrs={'data-reactid': '52'})

Unfortunately, this returns None. (I also tried a different parser, html5lib, and got the same result.)

After inspecting the soup, I noticed that the attribute value associated with this data point had changed. See picture 2 (It is hard to see, but the tag is slightly highlighted in gray, upper section of the image).

Is there any way to prevent the values from changing? Or is there a workaround for this issue?

SpaceTristan · Accepted Answer · 2019-10-25T02:59:11

Try using XPath. BeautifulSoup does not support XPath, but lxml does.

import requests
from lxml import html

home_page = "https://finance.yahoo.com/quote/AAPL"
response = requests.get(home_page)
tree = html.fromstring(response.content)
# Locate the price by its position under the header div, not by data-reactid
price = str(tree.xpath('//*[@id="quote-header-info"]/div[3]/div[1]//span[1]//text()')[0])

Selenium would also be great for this. But I hope this helps! Let me know if you have any questions.
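
To illustrate why a positional path is less fragile than matching on data-reactid, here is a minimal offline sketch. It is not the real Yahoo Finance markup: BROWSER_HTML and SERVER_HTML are hypothetical, simplified snippets, "123.45" is a placeholder price, and the helper names by_reactid and by_structure are made up for this example. It also swaps in the standard library's xml.etree.ElementTree (which supports only a subset of XPath) for lxml so that it runs without a network connection or extra packages. With lxml the equivalent call would be tree.xpath(...), as above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified header markup; "123.45" is a placeholder price.
BROWSER_HTML = """<html><body>
<div id="quote-header-info">
  <div><h1 data-reactid="7">Apple Inc. (AAPL)</h1></div>
  <div></div>
  <div>
    <div><span data-reactid="52">123.45</span></div>
  </div>
</div>
</body></html>"""

# The same markup as the server might send it: only the data-reactid differs.
SERVER_HTML = BROWSER_HTML.replace('data-reactid="52"', 'data-reactid="34"')


def by_reactid(markup, reactid):
    """Look the span up by its generated data-reactid (fragile)."""
    root = ET.fromstring(markup)
    span = root.find(f".//span[@data-reactid='{reactid}']")
    return None if span is None else span.text


def by_structure(markup):
    """Look the span up by its position under the header div."""
    root = ET.fromstring(markup)
    span = root.find(".//*[@id='quote-header-info']/div[3]/div[1]//span")
    return None if span is None else span.text


print(by_reactid(BROWSER_HTML, "52"))
print(by_reactid(SERVER_HTML, "52"))
print(by_structure(BROWSER_HTML))
print(by_structure(SERVER_HTML))
```

In this sketch the data-reactid lookup finds the span only in the first snippet and returns None for the second, while the positional path finds it in both. The positional path will still break if the page layout itself changes, so it is worth checking for an empty result before indexing.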
