0
votes

I am trying to scrape some data from the website Sporcle (specifically the Date Earned from one of the Badges) but the XPath that I got from [F12-->right-clicking the element-->Copy-->Copy XPath] does not seem to work with the google sheets command IMPORTXML; all I ever get is #N/A.

=IMPORTXML("https://www.sporcle.com/user/Jimmy/badges/earned/","//*[@id='badge-container']/div[1]/div[3]")

1

1 Answers

0
votes

Website uses dynamic rendering. So, classic methods don't work. I see 3 ways to do it :

  • With IMPORTXML : we retrieve the JSON data from a script element and we parse it with formulas.

  • With IMPORTXML+ImportJSON script : we retrieve the JSON data from a script element and we parse it with the script (cleaner).

  • With IMPORTFROMWEB addon (number of requests are limited in the "free" plan).

Solution 1 :

Output :enter image description here

First, we extract the JSON data in A1 with IMPORTXML and the following formula :

=IMPORTXML(B1;"substring-before(substring-after(//*[contains(text(),'badge_limiter')],'var badgeList = [{'),'}]')")

Then we parse the data with a combination of multiple formulas. In J2 we write :

=QUERY(ARRAYFORMULA(SPLIT(TRANSPOSE(SPLIT(SUBSTITUE(SUBSTITUE(SUBSTITUE(REGEXREPLACE(M1;"(""\w+?_\w+?"":)";"");""",";""";");"""";"");"},";"");"{"));";"));"select Col1,Col6")

Solution 2 :

Output : IJS

First, we extract the JSON data in A1 with IMPORTXML and the following formula :

=IMPORTXML(B1;"substring-before(substring-after(//*[contains(text(),'badge_limiter')],'var badgeList = '),'}]')")&"}]"

Then we parse the data with the script. Formula used in F1 is :

=ImportJSONFromSheet("Feuille 15";"/badge_name,/earned_date")

Where Feuille 15 is the name of the sheet I'm working with. The rest is to select the columns of interest.

Solution 3 :

Output : IFW

XPath used for badges names and dates : :

//td[@class='left-align link-col col-width-1']
//td[@class="col-width-3"]

Then we pass the formula in B5:

=IMPORTFROMWEB(C1;C2:D2;B3:C3)

Note : be sure to set jsRendering to TRUE.

Side note : I'm based in Europe, so you'll probably need to replace ; with , in the formulas.