
I'm trying to use IMPORTXML to pull data into my Google Sheet from this site:

https://pick224.com/

I used the 'Copy XPath' option to get the XPath query, but that didn't work:

=IMPORTXML("https://pick224.com/","/*[@id='DataRow31623_e9a5c9026c6106']/td[11]")

Later I also tried the 'Copy full XPath' option, but that didn't work either; I get the same #N/A error.

Here is the specific element I want to grab from the site:

Is there anything preventing me from collecting data from this site, or am I just not doing it properly?

I've been trying to solve this for the last few days, so I'd be grateful for any kind of help. Here is my Google Sheet:

https://docs.google.com/spreadsheets/d/17i5Q5_8c_LqY7S361TPI4NRZZUOdlJ_0aa3vWiVGd7c/edit?usp=sharing

Is that id dynamic? – user13338210
I believe it isn't, because each row belongs to one unique player. In 'DataRow31623_e9a5c9026c6106', the '31623' identifies the player. – Juliusz Dąbrowski
Could you try the XPath Finder extension? It gives the full path. – user13338210
I already tried pasting in the full path, but that didn't work: /html/body/div[2]/div[2]/div/div/div[1]/form/div/div/div/table[1]/tbody/tr[2]/td[11] – Juliusz Dąbrowski
Looking at the source code, I found that this url is called by the JavaScript. Can you query this url? In your case the url might be different, especially the last part, i.e. 9b0d7000447c1af4abb545f3b8ae. Change it accordingly, try it, and tell me your results :) – user13338210

1 Answer


The url for downloading the whole .csv data seems fixed (the appsession part is not needed):

https://c0dcn566.caspio.com/dp/9b0d7000447c1af4abb545f3b8ae?RecordID=&PageID=2&PrevPageID=&cpipage=3&download=1&CPIorderBy=P1GP&CPISortType=desc&downloadFormat=csv
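To see what that download url carries, here is a quick sketch in plain JavaScript that decomposes its query string. The parameter meanings in the comments are inferred from the url itself, not from any Caspio documentation:

```javascript
// Decompose the download URL's query string to inspect each parameter.
// Meanings are inferred from the URL, not taken from Caspio docs.
const csvUrl = new URL(
  "https://c0dcn566.caspio.com/dp/9b0d7000447c1af4abb545f3b8ae" +
  "?RecordID=&PageID=2&PrevPageID=&cpipage=3&download=1" +
  "&CPIorderBy=P1GP&CPISortType=desc&downloadFormat=csv"
);
const params = Object.fromEntries(csvUrl.searchParams);
console.log(params.download);       // "1"  -> ask the server for a file download
console.log(params.downloadFormat); // "csv"
console.log(params.CPIorderBy);     // "P1GP" -> column used for sorting
```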

However, you can't import it directly with IMPORTDATA since the file is too large, so you'll have to write a Google Apps Script.

If you think the url of the .csv file could change, you can use IMPORTFROMWEB with JavaScript rendering activated to retrieve the address. The add-on is not entirely free; check the limits of the free plan.

[Screenshot: the 'Hockey' sheet setup]

A1: website of interest; A2: XPath expression; A3:B3: options to activate JavaScript rendering (TRUE)

Formula in A4 (concatenation + regex):

="https://c0dcn566.caspio.com/dp/"&REGEXEXTRACT(IMPORTFROMWEB(A1;A2;A3:B3);"""appKey"":""(.+?)""")&"?RecordID=&PageID=2&PrevPageID=&cpipage=3&download=1&CPIorderBy=P1GP&CPISortType=desc&downloadFormat=csv"

This way, you always have the last updated url for downloading the .csv.
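To make the REGEXEXTRACT step concrete, here is the same extraction in plain JavaScript. The pageSource string below is a made-up stand-in for what IMPORTFROMWEB returns after rendering; the real page embeds an "appKey" entry like this somewhere in its source:

```javascript
// Stand-in for the rendered page source returned by IMPORTFROMWEB;
// the sample string is invented, only the "appKey" pattern matters.
const pageSource = 'var cfg = {"appKey":"9b0d7000447c1af4abb545f3b8ae","other":1};';

// Same pattern as the REGEXEXTRACT in the sheet formula: lazy match up to the next quote.
const match = pageSource.match(/"appKey":"(.+?)"/);

// Rebuild the download URL around the extracted key, as the A4 formula does.
const csvUrl = "https://c0dcn566.caspio.com/dp/" + match[1] +
  "?RecordID=&PageID=2&PrevPageID=&cpipage=3&download=1" +
  "&CPIorderBy=P1GP&CPISortType=desc&downloadFormat=csv";
console.log(match[1]); // "9b0d7000447c1af4abb545f3b8ae"
```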

You could also use IMPORTFROMWEB to retrieve the data directly, but only for the 1st page (100 results). Incrementing the cpipage parameter in the url is not enough to load and fetch the rest of the data in Google Sheets. Example with the first 5 columns:

[Screenshot: IMPORTFROMWEB output, first 5 columns]

The Google Apps Script part. In the script editor, you can now use this kind of script (source):

function importCSVFromWeb() {
  // Option 1: hard-code the full URL of the CSV file.
  // var csvUrl = "https://c0dcn566.caspio.com/dp/9b0d7000447c1af4abb545f3b8ae?RecordID=&PageID=2&PrevPageID=&cpipage=3&download=1&CPIorderBy=P1GP&CPISortType=desc&downloadFormat=csv";

  // Option 2: read the URL built with IMPORTFROMWEB from cell A4 (row 4, column 1).
  var sheet = SpreadsheetApp.getActiveSheet();
  var csvUrl = sheet.getRange(4, 1).getValue();

  // Download and parse the CSV, then write it to the active sheet.
  var csvContent = UrlFetchApp.fetch(csvUrl).getContentText();
  var csvData = Utilities.parseCsv(csvContent);
  sheet.getRange(1, 1, csvData.length, csvData[0].length).setValues(csvData);
}

You can declare the csvUrl variable directly with the fixed url of the csv, or with the url you've built with IMPORTFROMWEB. In the latter case, you'll have to specify the position of the cell which contains the address (row 4, column 1 in the example).

Once you run the script, the active sheet will be populated with the data. Warning: it could stress your CPU. When the data is loaded, you can create a new sheet and use QUERY, FILTER, SORT, ... to transform the table previously created. If you want something automatic (the script launched without your intervention), you should probably use triggers.

Alternatively, you could also write a custom function to load the csv. But it's a bit trickier, since you need to filter the data before importing it; otherwise you'll get a "too large" error. For example, to keep only the players of the "AHL" league, you can write something like this (credits to @Cooper for the base script):

function importCSV(url) {
  var csvContent = UrlFetchApp.fetch(url).getContentText();
  var csvData = Utilities.parseCsv(csvContent);
  var league = 'AHL';
  // Keep only the rows whose first column matches the league.
  // (The header row is dropped too, since its first cell isn't 'AHL' --
  //  same behaviour as the original splice-based loop.)
  return csvData.filter(function(row) {
    return row[0] === league;
  });
}

So now, you can use importCSV as a custom formula, =importCSV(url), where url is either the address retrieved with IMPORTFROMWEB or the fixed one, to load the data:
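To check the filtering logic without hitting the network, you can run the same pass over a small in-memory table. The rows below are invented sample data; only the first column matters for the filter:

```javascript
// Keep only the rows whose first column equals the target league,
// mirroring the filtering inside importCSV (minus the network fetch).
function filterByLeague(csvData, league) {
  return csvData.filter(function (row) { return row[0] === league; });
}

// Invented sample rows; the real CSV has many more columns.
var sample = [
  ["League", "Player", "GP"],   // header row: dropped, like in importCSV
  ["AHL", "Player A", "42"],
  ["NHL", "Player B", "60"],
  ["AHL", "Player C", "35"]
];
var ahlOnly = filterByLeague(sample, "AHL");
console.log(ahlOnly.length); // 2
```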

[Screenshot: the importCSV custom formula in use]

That's it for now.