I have limited HTML and XML knowledge, and I'm trying to scrape some URLs to obtain a block of text using =ImportXML() in Google Sheets.

Here is the link: http://www.worldbank.org/projects/P082167/agricultural-transition?lang=en&tab=overview

The relevant part of the page:

<div id="abstractmore" style="">

        <h2>ABSTRACT*</h2>
        <p>

            The project aims to...be responsible for the general management of the project.<a href="javascript:;" id="rdless" class="more">&nbsp;Read Less»</a>

        </p>

    </div>

I am trying to extract the complete abstract. Using Chrome's Inspect Element tool and various tutorials, I came up with these XPaths:

//div[@id='abstractmore']/p/text()
//*[@id="abstractmore"]/p/text()

Both return the error "Imported content is empty." I am completely lost as to how to work out the right XPath.
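A minimal Python sketch, using the standard library's `xml.etree.ElementTree` rather than ImportXML, suggests the first XPath is well-formed and does select the abstract when run against the snippet exactly as quoted. `abstract_text` is a hypothetical helper; ElementTree supports only a subset of XPath, so `p.text` stands in for `text()`, and `&nbsp;` is swapped for `&#160;` because a plain XML parser does not know HTML entities. If a check like this works locally while ImportXML still returns nothing, the likely culprit is the markup ImportXML receives rather than the XPath syntax.

```python
import xml.etree.ElementTree as ET

# The markup quoted in the question. "&nbsp;" is an HTML entity that a plain
# XML parser rejects, so it is replaced here with its numeric form "&#160;".
SNIPPET = """
<div id="abstractmore" style="">
    <h2>ABSTRACT*</h2>
    <p>
        The project aims to...be responsible for the general management of the project.<a href="javascript:;" id="rdless" class="more">&#160;Read Less»</a>
    </p>
</div>
"""


def abstract_text(markup, div_id):
    """Hypothetical helper: text directly inside <div id=div_id>/<p>, or None.

    Rough stand-in for //div[@id='...']/p/text(): ElementTree has no text()
    step, so p.text (the text before the first child element) is used instead.
    """
    # Wrap in a dummy root so the fragment parses as a single document.
    root = ET.fromstring("<root>" + markup + "</root>")
    p = root.find(f".//div[@id='{div_id}']/p")
    if p is None:
        return None
    return p.text.strip()


print(abstract_text(SNIPPET, "abstractmore"))
# The project aims to...be responsible for the general management of the project.
print(abstract_text(SNIPPET, "abstract"))  # None: no div carries that id here
```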

1 Answer

There is no element with @id='abstractmore' in the page; instead there are:

id="abstract"
and
<span class="more"><a href="javascript:;" id="rdmore" class="more">&nbsp;Read More»</a></span>
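The check behind this answer, listing which ids actually exist before writing an XPath around one of them, can be sketched in Python with the standard library. The `SERVED` markup is hypothetical: only `id="abstract"` and the "Read More" span come from the answer, while the nesting around them is assumed for illustration, and `ids_in` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical served markup assembled from the fragments quoted in the answer.
# id="abstract" and the "Read More" span are from the answer; the nesting and
# placement of the paragraph are assumptions. "&nbsp;" is written as "&#160;"
# so a plain XML parser accepts it.
SERVED = """
<div id="abstract">
    <h2>ABSTRACT*</h2>
    <p>The project aims to...
        <span class="more"><a href="javascript:;" id="rdmore" class="more">&#160;Read More»</a></span>
    </p>
</div>
"""


def ids_in(markup):
    """Hypothetical helper: every id attribute value, in document order."""
    # Wrap in a dummy root so the fragment parses as a single document.
    root = ET.fromstring("<root>" + markup + "</root>")
    return [el.get("id") for el in root.iter() if el.get("id") is not None]


ids = ids_in(SERVED)
print(ids)                    # ['abstract', 'rdmore']
print("abstractmore" in ids)  # False: an XPath keyed on that id can only match nothing
```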

Nevertheless, that does not help: it is not clear why the Google Spreadsheet function does not extract the H2 with this XPath:

//*[@id="dataSections"]/*[@id="leftSection"]/*[@id="box2"]/*[@id="box2Inner"]/*[@id="tabContent"]/h2 

Probably the <p> content is not extracted for the same reason.