1
votes

I am looking for a solution in google spreadsheet to check if a domain has any content on its site or shows a blank page/brings a 404 error.

I am looking for a way to get two different outcomes (1 and 0) by these requirements:

  • 1 if domain is existing and there is content on this domain
  • 0 if domain is not existing
  • 0 if domain is existing but can't be reached (404)
  • 0 if domain is existing and can be reached but has a blank page

Examples are found in the following spreadsheet: https://docs.google.com/spreadsheets/d/1gcdF_NdhYX4vBJgwP-cAVsTmeO2WgrynND2f63Zi3Lk/edit#gid=0

I was trying to get some date from the domains with IMPORTDATA, IMPORTHTML and IMPORTXML (as a next step I would add another column that gives me 1 if content is not cell is not empty, 0 else:

=if(isna(IMPORTDATA(A1))=FALSE;1;0)

=if(isna(importhtml(A9;"list";1));"";transpose(IMPORThtml(A9;"list";1)))

=IMPORTXML(A13;"//h:h1")

But these formulas are not reliable enough to handle the task. I'd be a big fan of IMPORTXML, but as I read so far it is currently not working in new google spreadsheets (not even the official examples are working for me...).

Is there any way to solve this problem for about 1000 domains? Thanks in advance!

1

1 Answers

1
votes

You can try creating a script in the script editor:

function SOverflowChecker( uri )
{
var response_code ;
try {
response_code = UrlFetchApp .fetch( uri ) .getResponseCode() .toString() ;
}
catch( error ) {
response_code = error .toString() .match( / returned code (ddd)./ )[1] ;
}
finally {
return response_code ;
}
}

Save it then in the cell use this code:

=SOverflowChecker(RowValueOfURL)

RowValueOfURL being the row for whatever your URL is on.

So for example if the URL is in B2:

=SOverflowChecker(B2)

It will check the website and return a status code, based on the status code you can evaluate the URL is dead or alive.

Reference: http://www.tinkeredge.com/blog/web-usability/check-on-page-for-broken-links-with-google-docs/

Hope this helps