0
votes

I read an article that explained how in Google Sheets you can use the importXML() function to access a webpage, scrape data from it, and place the results in a cell. For example, if I place a website URL in cell A1 and the following formula in cell A2:

=importxml(A1,"//meta[@name='description']/@content")

then A2 will show the contents of the meta description tag on the webpage specified in A1.

I would like to use a similar approach but placing a Google query in cell A1.

I copied the following formula (which I found in an article in a UK journal) into cell A1:

=ArrayFormula(regexreplace(regexextract(importxml("https://www.google.co.uk/search?q=trade+balance&pws=0&gl=UK&num=50","//h3[@class='r']/a/@href"), "http.*&sa\b"), "&sa", ""))

It filled 50 rows of data.

Then I changed the num parameter to "num=20" and the response was "Could not fetch URL ..."

Then I entered the same formula into another cell, replacing the URL with "https://www.google.com" and the gl parameter with "gl=US" and the response again was "Could not fetch URL ..."

So it looks like Google in the UK might let you do this periodically, by Google in the US does not.

Can anyone suggest an alternative? Do you need a key to have permission?

BTW - I read on a forum that you can use Google's Custom Search API and UrlFetch() to gather search results and populate a Google Sheet. But before trying that I thought I first ask about the possibility of using importXML.

Thank you.

1
so what website you wanna scrape specifically? because it depends on each site if its possible or notplayer0
Thanks for replying. I want to scrape a Google search results page. In my example I was using this function call: importxml("google.co.uk/…CurtisD

1 Answers

1
votes

Google has lately been blocking search requests made from both Apps Scripts and Google Sheets though the requests are made from one of their own IPs. They probably do this to prevent scraping. There are occasions where it works and you are able to scrape Google search, however, it's time limited so results will vanish after a few minutes/hours. You can try to scrape Bing, DuckDuckGo or other search engines.