0
votes

How can I retrieve the EPS stock information from this website via Google Sheets? http://seekingalpha.com/symbol/KO

What I would like to output in Google Sheets is this value:

1.66

On the site I believe that the relevant XML Code to pull that information from is this:

<span class="right" id="qp_eps">1.66</span>

This Google Sheet formula does not return anything than an error:

=ImportXML("http://seekingalpha.com/symbol/KO/focus", "//span[@id='qp_eps'")
2
Shouldn't your xpath look like : //span[@id='qp_eps']/text() also what if there are thousands of span elements with attributes id="qp_eps" - SomeDude
I haven't been in touch with xPath before and basically have no idea of it. I tried your proposal but this is also not retrieving anything, unfortunately - stefan
do you know if there is only one span with id = 'qp_eps' in that link ? - SomeDude
Well, in the DOM Explorer in Microsoft Edge I can only find it once - stefan
Have you tried to verify that the HTML/DOM you see in a browser after the page is loaded, is the same as the original HTML fetched from the URL you gave to ImportXML()? It looks to me like the original HTML has no span with id='qp_eps'. It must be added later by Javascript or something. Therefore it will not be loaded by ImportXML(). - LarsH

2 Answers

1
votes

The HTML/DOM that you see in the browser developer tools is not the original HTML received from the web server; it has been augmented dynamically by Javascript running in the browser.

You can check this going to the URL you're importing (http://seekingalpha.com/symbol/KO/focus) in a browser and doing a "Save" (HTML only). Then examine the saved file. It doesn't contain any element with id="qp_eps", does it?

If you have curl or wget, you can use them instead to fetch the original HTML from the web server and verify what it contains. This is the HTML that ImportXML() sees, so you'll have to write your XPath with this HTML in mind.

1
votes

I Have a workaround for you:

Using IMPORTDATA instead of IMPORTXML, you can use Index which oddly, even with different tickers, is quite consistent:

There are two ways to go about it - First is using Index twice:

=INDEX(SORT(INDEX(IMPORTDATA("http://seekingalpha.com/symbol/KO"),,55)),3)

And if you want to clean it up to only return the value you can substitute the "eps:" portion:

=SUBSTITUTE(INDEX(SORT(INDEX(IMPORTDATA("http://seekingalpha.com/symbol/KO"),,55)),3),"eps:","")

Second is to use index again but instead of Indexing twice, you can just join the values and regex out what you want:

=REGEXEXTRACT(CONCATENATE(SORT(INDEX(IMPORTDATA("http://seekingalpha.com/symbol/AAPL"),,55))),"eps:(.*)NOW_P")

Here are two screenshot of the two types:

enter image description here

enter image description here