I'm scraping data from a large online database (GBIF), which requires three steps: (1) match a GBIF "key" identifier to a species name, (2) send a query to the database, getting a download key ("res") in return, and (3) download, import, and filter the data associated with that species. I've written a function for each of these (not including the actual code here, since it's unfortunately very long and requires login credentials):
get_gbif_key <- function(species) {}
get_gbif_res <- function(gbifkey) {}
get_gbif_dat <- function(gbifres) {}
I have a list of several hundred species to which I want to apply these three functions in order. I know they work individually, but I can't figure out how to feed them into each other (probably using purrr?) and reference the correct inputs from the nested outputs of the previous function.
So, for example:
> testlist <- c('Gadus morhua','Caretta caretta')
> testkey <- map(testlist, get_gbif_key)
> testkey
[[1]]
[1] 8084280
[[2]]
[1] 8894817
Here's where I'm stuck. I want to feed the keys in this list structure into the next function, but I don't know how to properly reference them using map or other functions. I can do it by manually creating a new list for the next function:
> testlist2 <- c('8084280','8894817')
> testres <- map(testlist2, get_gbif_res)
> testres
[[1]]
<<gbif download>>
Username: XXXX
E-mail: [email protected]
Download key: 0001342-180412121330197
[[2]]
<<gbif download>>
Username: XXXX
E-mail: [email protected]
Download key: 0001343-180412121330197
EDIT: the structure of this output may be posing a problem here. When I run listviewer::jsonedit(testres), it just looks like a normal nested list with entries 0 and 1 holding the two download keys. However, when I run str(testres), I get the following:
> str(testres)
List of 2
$ :Class 'occ_download' atomic [1:1] 0001342-180412121330197
.. ..- attr(*, "user")= chr "XXXX"
.. ..- attr(*, "email")= chr "[email protected]"
$ :Class 'occ_download' atomic [1:1] 0001343-180412121330197
.. ..- attr(*, "user")= chr "XXXX"
.. ..- attr(*, "email")= chr "[email protected]"
And, again, for the third one:
> testlist3 <- c('0001342-180412121330197','0001343-180412121330197')
> testdat <- map(testlist3, get_gbif_dat)
Which successfully loads a list object with the desired data into R (it has two unnamed elements, 0 and 1, each of which is a list of 28 requested variables for each species). Any advice for scripting this get_gbif_key %>% get_gbif_res %>% get_gbif_dat workflow in a way that unpacks the preceding list structures correctly?
map(testlist, ~get_gbif_key(.x) %>% get_gbif_res %>% .[3] %>% get_gbif_dat)In thetesres, it is not clear about the structure to comment how to extract the third element. - akrunget_gbif_res. I think I need a reference in ` %>% get_gbif_res %>%` to the output of the previous function. I'll edit the example to show what the structure of testres is. - AFHocc_downloadkey needed for the 3rd function (i.e. remove the other attributes, if they are not needed), then you should be able tomap(testlist, get_gbif_key) %>% map(get_gbif_res) %>% map(get_gbif_dat)- Jake Kaupp