1
votes

I have no experience coding!

I am having trouble scraping data from a website into my Google spreadsheet. I want to get the Observation number into my spreadsheet form this page

I have tried this but honestly have no idea what I'm doing:

=IMPORTXML(A3,"//*[@id="obsstatcol"]/div/div[1]")

With A3 being the above page URL, and the rest is a mash of some tutorial I found with the XPath copies from the observation value I'm trying to scrape off the page.

Can anyone help me make sense of what the hell I'm trying to do and offer some advice?

Thanks in advance

1

1 Answers

0
votes

Good attempt! However, unfortunately the obvervation number is not determined until after the page loads. That means that your formula:

=IMPORTXML(A3,"//*[@id=""obsstatcol""]/div/div[1]")

yields

{{ shared.numberWithCommas( totalObservations ) }}

so you cannot just use ImportXML() in this case.

However, all is not lost. I opened the network monitor with F12, and saw that the page was making a web request to this url:

https://api.inaturalist.org/v1/observations/observers?verifiable=any&quality_grade=needs_id&user_id=ericthuranira&locale=en-US

to get the observation data, which appears to be in JSON format. E.g. (formatted for readability)

{
  "total_results": 1,
  "page": 1,
  "per_page": 500,
  "results": [
    {
      "user_id": 1265521,
      "observation_count": 121,
      "species_count": 42,
      "user": {
        "id": 1265521,
        "login": "ericthuranira",
        "spam": false,
        "suspended": false,
        "created_at": "2018-10-09T11:43:22+00:00",
        "login_autocomplete": "ericthuranira",
        "login_exact": "ericthuranira",
        "name": "Eric Thuranira",
        "name_autocomplete": "Eric Thuranira",
        "orcid": null,
        "icon": "https://static.inaturalist.org/attachments/users/icons/1265521/thumb.jpeg?1580369132",
        "observations_count": 237,
        "identifications_count": 203,
        "journal_posts_count": 0,
        "activity_count": 440,
        "species_count": 150,
        "universal_search_rank": 237,
        "roles": [],
        "site_id": 1,
        "icon_url": "https://static.inaturalist.org/attachments/users/icons/1265521/medium.jpeg?1580369132"
      }
    }
  ]
}

This is not in XML format, so you'll have to use a JSON parser to do that. Fortunately, somebody has made one for Google Sheets! You can easily get this for yourself by doing the following:

  1. Paste the code from here into your script editor (Tools > Script Editor), and save it as ImportJSON. This gives you your JSON parser.

  2. Taking the "api" URL I mentioned above for the observers, use this formula (assuming URL is in A3)

    =ImportJSON(A3,"/results/observation_count","noHeaders")
    

And this will get you the number you want.