I'm scraping data from a large online database (GBIF), which requires three steps: (1) match a GBIF "key" identifier to a species name, (2) send a query to the database, getting a download key ("res") in return, and (3) download, import, and filter the data associated with that species. I've written a function for each of these (not including the actual code here, since it's unfortunately very long and requires login credentials):

get_gbif_key <- function(species) {}
get_gbif_res <- function(gbifkey) {} 
get_gbif_dat <- function(gbifres) {}

I have a list of several hundred species to which I want to apply these three functions in order. I know they work individually, but I can't figure out how to feed them into each other (probably using purrr?) and reference the correct inputs from the nested outputs of the previous function.

So, for example:

> testlist <- c('Gadus morhua','Caretta caretta')
> testkey <- map(testlist, get_gbif_key)
> testkey
[[1]]
[1] 8084280

[[2]]
[1] 8894817

Here's where I'm stuck. I want to feed the keys in this list structure into the next function, but I don't know how to properly reference them using map or other functions. I can do it by manually creating a new list for the next function:

> testlist2 <- c('8084280','8894817')
> testres <- map(testlist2, get_gbif_res)
> testres
[[1]]
<<gbif download>>
  Username: XXXX
  E-mail: [email protected]
  Download key: 0001342-180412121330197

[[2]]
<<gbif download>>
  Username: XXXX
  E-mail: [email protected]
  Download key: 0001343-180412121330197

EDIT: the structure of this output may be posing a problem here. When I run listviewer::jsonedit(testres), it just looks like a normal nested list with entries 0 and 1 holding the two download keys. However, when I run str(testres), I get the following:

> str(testres)
List of 2
 $ :Class 'occ_download'  atomic [1:1] 0001342-180412121330197
  .. ..- attr(*, "user")= chr "XXXX"
  .. ..- attr(*, "email")= chr "[email protected]"
 $ :Class 'occ_download'  atomic [1:1] 0001343-180412121330197
  .. ..- attr(*, "user")= chr "XXXX"
  .. ..- attr(*, "email")= chr "[email protected]"

And, again, for the third one:

> testlist3 <- c('0001342-180412121330197','0001343-180412121330197')
> testdat <- map(testlist3, get_gbif_dat)

Which successfully loads a list object with the desired data into R (it has two unnamed elements, 0 and 1, each of which is a list of 28 requested variables for each species). Any advice for scripting this get_gbif_key %>% get_gbif_res %>% get_gbif_dat workflow in a way that unpacks the preceding list structures correctly?

Without a reproducible example, it is difficult to test. Try map(testlist, ~get_gbif_key(.x) %>% get_gbif_res %>% .[3] %>% get_gbif_dat). The structure of testres is not clear enough to say how to extract the third element. - akrun
When I run that line of code, I get an error related to the middle function, get_gbif_res. I think I need a reference in ` %>% get_gbif_res %>%` to the output of the previous function. I'll edit the example to show what the structure of testres is. - AFH
As I mentioned earlier, I cannot test it. - akrun
What I would recommend, if you want to use a map pipeline, would be to alter the output of your second function to be just the occ_download key needed for the third function (i.e. remove the other attributes if they are not needed). Then you should be able to run map(testlist, get_gbif_key) %>% map(get_gbif_res) %>% map(get_gbif_dat) - Jake Kaupp

1 Answer

Here's what you should try based on the evidence provided so far. Basically, the results suggest you should be able to succeed with nested map-ping:

      yourData <- map(
          unlist(                              # to make same class as your single func version
              map(
                  map(testlist, get_gbif_key), # returns gbifkeys
                  get_gbif_res)),              # returns gbif_res's
          get_gbif_dat)                        # returns data items

The last item whose structure you showed is just a list of atomic character vectors with some extra attributes, and your functions seem to handle that without difficulty, so mapping should succeed.
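
The same chain can also be written as a flat pipeline, along the lines suggested in the comments. The sketch below is only an illustration: the three function bodies are hypothetical stand-ins (the real ones need GBIF credentials and are not shown in the question), and the "download-" keys are made up. It assumes, as the str() output suggests, that get_gbif_res returns a classed character vector with attributes, and it uses as.character() to reduce that to a plain string before the third step.

```r
library(purrr)

# Hypothetical stand-ins for the real functions, which are not shown in the question.
get_gbif_key <- function(species) {
  keys <- c("Gadus morhua" = 8084280, "Caretta caretta" = 8894817)
  keys[[species]]
}

get_gbif_res <- function(gbifkey) {
  # Mimics the shape reported by str(): a classed character vector with attributes.
  structure(paste0("download-", gbifkey), class = "occ_download", user = "XXXX")
}

get_gbif_dat <- function(gbifres) {
  # The real function downloads and filters data; this one just echoes its input.
  list(download_key = gbifres)
}

testlist <- c("Gadus morhua", "Caretta caretta")

testdat <- testlist %>%
  set_names() %>%           # name each element by species so results stay labelled
  map(get_gbif_key) %>%     # step 1: species name -> GBIF key
  map(get_gbif_res) %>%     # step 2: GBIF key -> download object
  map(as.character) %>%     # strip class and attributes, leaving the bare download key
  map(get_gbif_dat)         # step 3: download key -> data
```

Because each map() call passes one list element at a time to the function, no explicit indexing into the previous output is needed. Naming the input with set_names() is optional, but it lets you look up a result by species, e.g. testdat[["Gadus morhua"]], instead of by position.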