I'm trying to extract the value of "Next 5 Years (per annum)" for the stock BABA from the Yahoo Finance "Analysis" tab: https://finance.yahoo.com/quote/BABA/analysis?p=BABA. (It's 2.85%, in the second row from the bottom.)
I have tried to follow these questions:
Scrape Yahoo Finance Financial Ratios
Scrape Yahoo Finance Income Statement with Python
But I can't even extract the data from the page.
I tried this website as well:
https://hackernoon.com/scraping-yahoo-finance-data-using-python-ayu3zyl
This is the code I wrote to get the web page data.
First, I import the packages:
import requests
from bs4 import BeautifulSoup
Then I fetch the page and parse it:
url = "https://finance.yahoo.com/quote/BABA/analysis?p=BABA"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, features="lxml")
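For comparison, the fetch step can also be done with only the standard library's urllib.request.urlopen, no requests needed. This is a sketch under assumptions: sending a browser-like User-Agent is a guess (some sites serve a stripped-down page or refuse default client User-Agents), the header string is illustrative, and the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: an illustrative browser-like User-Agent, not a value Yahoo documents.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_analysis_request(ticker):
    """Build a GET request for the Analysis tab of `ticker` (hypothetical helper)."""
    url = (
        f"https://finance.yahoo.com/quote/{quote(ticker)}/analysis?"
        + urlencode({"p": ticker})
    )
    return Request(url, headers={"User-Agent": BROWSER_UA})


def fetch_analysis_html(ticker, timeout=10.0):
    """Download the page and decode it to str (needs network access)."""
    with urlopen(build_analysis_request(ticker), timeout=timeout) as resp:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

With this, data = fetch_analysis_html("BABA") gives the same kind of str that r.text returns.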
When I look at the types of the "data" and "soup" objects, I see that
type(data)
<class 'str'>
From this string I can somehow extract the data for the ">Next 5 Years" row using regular expressions.
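A minimal sketch of that regex route, assuming the label is followed (after some closing tags) by a <td> whose text is the value. The sample markup in the usage below is hypothetical, not copied from Yahoo, and any change in the real markup would break the pattern, which is the usual argument for parsing the HTML instead.

```python
import re

# Assumed layout: ">Next 5 Years (per annum)<" ... "<td ...>VALUE</td>".
# DOTALL lets ".*?" cross line breaks; the lazy match stops at the first <td>.
ROW_PATTERN = re.compile(
    r">Next 5 Years \(per annum\)<.*?<td[^>]*>\s*(-?[\d.]+%|N/A)\s*</td>",
    re.DOTALL,
)


def next_5_years_regex(html):
    """Return the first value cell after the label, or None if not found."""
    match = ROW_PATTERN.search(html)
    return match.group(1) if match else None
```

For example, next_5_years_regex('<tr><td><span>Next 5 Years (per annum)</span></td><td class="Ta(end)">2.85%</td></tr>') returns "2.85%".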
But when I look at
type(soup)
<class 'bs4.BeautifulSoup'>
and for some reason the data in it does not look relevant to the page.
It looks like this (I copied only a small part of the soup object):
soup
<!DOCTYPE html>
<html class="NoJs featurephone" id="atomic" lang="en-US"><head prefix="og: http://ogp.me/ns#"><script>window.performance && window.performance.mark && window.performance.mark('PageStart');</script><meta charset="utf-8"/>
<title>Alibaba Group Holding Limited (BABA) Analyst Ratings, Estimates & Forecasts - Yahoo Finance</title><meta content="recommendation,analyst,analyst rating,strong buy,strong sell,hold,buy,sell,overweight,underweight,upgrade,downgrade,price target,EPS estimate,revenue estimate,growth estimate,p/e estimate,recommendation,analyst,analyst rating,strong buy,strong sell,hold,buy,sell,overweight,underweight,upgrade,downgrade,price target,EPS estimate,revenue estimate,growth estimate,p/e estimate" name="keywords"/>
<meta content="on" http-equiv="x-dns-prefetch-control"/><meta content="on" property="twitter:dnt"/><meta content="90376669494" property="fb:app_id"/>
<meta content="#400090" name="theme-color"/><meta content="width=device-width,
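Printing a BeautifulSoup object re-serializes the whole document starting from <!DOCTYPE html> and the <head>, so any table in the body only appears after all of that metadata. A small check like this one (a sketch; the function name and sample markup are made up for illustration) can confirm whether the row label actually made it into the parsed tree:

```python
from bs4 import BeautifulSoup


def summarize_page(html, label="Next 5 Years (per annum)"):
    """Hypothetical diagnostic: title, table count, and where `label` appears."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": soup.title.get_text(strip=True) if soup.title else None,
        "tables": len(soup.find_all("table")),
        "label_in_raw_html": label in html,
        # get_text() concatenates every text node, so tags don't interfere.
        "label_in_parsed_text": label in soup.get_text(),
    }
```

If both label flags are True, the soup does contain the row; it is just far below the part that gets printed first.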
- Is there any way other than regular expressions to extract the needed data from the "data" string?
- How does the soup object help me extract the data? (I see it used a lot, but I'm not sure how to make it useful.)
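One way to use the soup object instead of regular expressions is to find the text node holding the row label, climb to its enclosing <tr>, and read the <td> cells of that row. This is a sketch assuming the growth estimates are an ordinary HTML table with the label in the first cell; the function name and the sample markup are hypothetical, and "html.parser" is used only so the example does not depend on lxml.

```python
from typing import List

from bs4 import BeautifulSoup


def growth_estimate(html: str, label: str = "Next 5 Years (per annum)") -> List[str]:
    """Return the value cells of the row whose first cell reads `label`."""
    soup = BeautifulSoup(html, "html.parser")
    # Match the text node itself, ignoring surrounding whitespace.
    label_node = soup.find(string=lambda s: s is not None and s.strip() == label)
    if label_node is None:
        return []
    # Walk up from the text node (e.g. inside a <span>) to its table row.
    row = label_node.find_parent("tr")
    if row is None:
        return []
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    return cells[1:]  # drop the label cell, keep the value columns
```

If the real page follows this layout, growth_estimate(data)[0] would be the first value column (the BABA one on the page), though the column order is an assumption worth checking against the live table.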
Thanks in advance.