0
votes

I am trying to follow the guide posted here: https://medium.freecodecamp.org/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe

I am at the step where I am supposed to get the name (presumably of the stock).

# Take out the div of name and get its value
name_box = soup.find('h1', attrs={'class': 'name'})

I suspect I will also have trouble when querying the price. Do I have to replace 'price' with 'priceText__1853e8a5', as found in the HTML?

# get the index price
price_box = soup.find('div', attrs={'class': 'price'})

Thanks, this would be a massive help.

2 Answers

0
votes

If you replace price with priceText__1853e8a5, you will get your result, but I suspect the class name is dynamically generated (note the suffix at the end) and may change. To get your result reliably, you need something more robust.

You can target tags in BeautifulSoup with CSS selectors, using the select() and select_one() methods. This example selects the first <span> tag whose class attribute begins with priceText (the ^= operator).

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.bloomberg.com/quote/SPX:IND')
soup = BeautifulSoup(r.text, 'lxml')

print(soup.select_one('span[class^="priceText"]').text)

This prints:

2,813.36
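
The live page may change or block automated requests, so here is a minimal offline sketch of the same selector idea. The HTML snippet and the second class name are made up for illustration; only priceText__1853e8a5 comes from the question. It also shows that ^= compares against the start of the whole class attribute, so *= (substring match) may be safer if priceText is not the first class listed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the quote page; the second span is invented
# to show what happens when priceText is not the first class in the attribute.
SAMPLE_HTML = """
<section>
  <span class="priceText__1853e8a5">2,813.36</span>
  <span class="small priceText__0aa1bb2c">2,800.00</span>
</section>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# ^= tests the start of the full class attribute string, so the span whose
# class attribute starts with "small" is not matched.
prefix_matches = [tag.text for tag in soup.select('span[class^="priceText"]')]

# *= matches the substring anywhere in the attribute value.
substring_matches = [tag.text for tag in soup.select('span[class*="priceText"]')]

# select_one() returns only the first match (or None if nothing matches).
first_price = soup.select_one('span[class^="priceText"]').text

print(prefix_matches)
print(substring_matches)
print(first_price)
```

If select_one() finds nothing it returns None, so in a real scraper it is worth checking the result before reading .text.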
0
votes

You have several options to do that.

  1. Get the value with an appropriate XPath expression.

//span[contains(@class, 'priceText__')]

  2. Write a regex to find the element by its class prefix.

import re

price_tags = soup.find_all('span', {'class': re.compile(r'^priceText__')})

I am not sure about the regex pattern, as I am not good with regular expressions. Edits are welcome.
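
Both options above can be sketched on a small inline snippet. This is only an illustration under a few assumptions: the HTML is made up (apart from the priceText__1853e8a5 class from the question), and since BeautifulSoup itself does not evaluate XPath, the XPath option is shown with lxml instead.

```python
import re

from bs4 import BeautifulSoup
from lxml import html

# Hypothetical markup; only the priceText__1853e8a5 class comes from the question.
SAMPLE_HTML = (
    '<div>'
    '<span class="priceText__1853e8a5">2,813.36</span>'
    '<span class="other">n/a</span>'
    '</div>'
)

# Option 1: XPath, evaluated by lxml (BeautifulSoup has no XPath support).
tree = html.fromstring(SAMPLE_HTML)
xpath_prices = [
    el.text for el in tree.xpath("//span[contains(@class, 'priceText__')]")
]

# Option 2: regex on the class attribute with BeautifulSoup.
# The pattern is tested against each individual class name of a tag.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
regex_prices = [
    tag.text for tag in soup.find_all("span", class_=re.compile(r"^priceText__"))
]

print(xpath_prices)
print(regex_prices)
```

Note that find_all() returns a list, so pick the element you need (for example the first one) before reading its text.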