0
votes

I'm trying to scrape some background image urls into a google sheet. Here is an example of the container-

<div class="_rs9 _1xcn">
<div class="_1ue-">
<section class="_4gsw _7of _1ue_" style="background-image: url(https://scontent.x.com/v/t64.5771-25/38974906_464042117451453_1752137156853235712_n.png?_nc_cat=100&amp;_nc_ht=scontent.x.com&amp;oh=c19f15536205be2e1eedb7f7fc7cb61b&amp;oe=5C4442FD)">
<div class="_7p2">
</div>
</section>

I need to get from the https to the question mark after png. I know there's a way to use substring-before/-after but I am having a tough time, particular with escaping quotes.

Here is my attempt. This just gets me an "#N/A":

=IMPORTXML(B2,"substring-before(substring-after(//section[@class='_4gsw _7of _1ue_']/@style, """"background-image: url(""""), """")"""")")

Could anyone help with the full importxml statement? Much appreciated, thanks.

1

1 Answers

0
votes

Your approach was close. Try the following XPath expression:

substring-before(substring-after(//section[@class='_4gsw _7of _1ue_']/@style, 'background-image: url('),'?')

The whole expression could look like this:

=IMPORTXML(B2,"substring-before(substring-after(//section[@class='_4gsw _7of _1ue_']/@style, 'background-image: url('),'?')")