I need to scrape image source URLs from the web pages linked in a directory and put them into columns of a Google Sheet.

I think the IMPORTXML function would be the easiest solution, but I get the #N/A error "Imported content is empty." every time.

I have also tried using this extension to define the XPath, but I still get the same error.

The part of the page's source code that contains the image source URL is:

<div class="centerer" id="rbt-gallery-img-1">
  <i class="spinner">
    <span></span>
  </i>
  <img data-lazy="//i.example.com/01.jpg" border="0"/>
</div>

So I want to get the value "i.example.com/01.jpg" into B2, followed by the URLs of further images in the adjacent cells.

The function I used is:

=IMPORTXML(A2,"//img[@class='centerer']/@data-lazy")

I tried using spinner instead of centerer, with the same result.

1 Answer

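In your markup the class centerer is on the div, not on the img, so //img[@class='centerer'] matches nothing. You can get the string i.example.com/01.jpg with the following XPath 1.0 expression: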

substring-after(//div[@class='centerer']/img/@data-lazy,'//')

If you don't need to remove the leading //, you can simply use

//div[@class='centerer']/img/@data-lazy

So, in the first case, the Google Sheets formula could be

=IMPORTXML(A2,"substring-after(//div[@class='centerer']/img/@data-lazy,'//')")

and in the second it could be

=IMPORTXML(A2,"//div[@class='centerer']/img/@data-lazy")
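To check the path outside of Sheets, here is a minimal Python sketch using only the standard library. It assumes the fragment from the question wrapped in a hypothetical body element, plus a made-up second gallery entry (02.jpg) to show several matches. ElementTree supports only a subset of XPath, so the @data-lazy step and the removal of the leading // are done in Python rather than in the path itself.

```python
import xml.etree.ElementTree as ET

# Fragment from the question, wrapped in a hypothetical <body> root.
# The second gallery entry (02.jpg) is invented for illustration only.
PAGE = """
<body>
  <div class="centerer" id="rbt-gallery-img-1">
    <i class="spinner"><span></span></i>
    <img data-lazy="//i.example.com/01.jpg" border="0"/>
  </div>
  <div class="centerer" id="rbt-gallery-img-2">
    <i class="spinner"><span></span></i>
    <img data-lazy="//i.example.com/02.jpg" border="0"/>
  </div>
</body>
""".strip()

root = ET.fromstring(PAGE)


def lazy_urls(path):
    """Return the data-lazy value of every element matched by `path`,
    with a leading '//' removed if present."""
    urls = []
    for img in root.findall(path):
        value = img.get("data-lazy", "")
        urls.append(value[2:] if value.startswith("//") else value)
    return urls


# Path from the question: the class is on the div, so no img matches.
original = lazy_urls(".//img[@class='centerer']")
# Path from the answer: select the img inside the div with that class.
corrected = lazy_urls(".//div[@class='centerer']/img")

print(original)   # []
print(corrected)  # ['i.example.com/01.jpg', 'i.example.com/02.jpg']
```

Note that in XPath 1.0, substring-after() converts a node-set to the string value of its first node only, so the first formula should be expected to return a single URL, while the second can return one value per matching image.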