
I used the Google Geocoding API to request location data for thousands of addresses. The content for each request was parsed as a list. The resulting list was added under the column "get_response".

I'm having major difficulties extracting individual attributes from these lists using the purrr package, and was hoping you wonderful folks could help.

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.5.3

l1 <- list(results = list(list(geometry = list(location = list(lat = 41.9, lng = -87.6)))), status = "OK")
l2 <- list(results = list(list(geometry = list(location = list(lat = 35.1, lng = -70.6)))), status = "OK")

starting_df <- tribble(~name, ~get_response,
                     "first_location", l1,
                     "second_location", l2)
print(starting_df)
#> # A tibble: 2 x 2
#>   name            get_response    
#>   <chr>           <list>          
#> 1 first_location  <named list [2]>
#> 2 second_location <named list [2]>

Below I demonstrate how I am able to extract the attribute one at a time:

pluck(starting_df[1,]$get_response, 1, "results", 1, "geometry", "location", "lat")
#> [1] 41.9
pluck(starting_df[2,]$get_response, 1, "results", 1, "geometry", "location", "lat")
#> [1] 35.1

This is my desired output:

desired_output <- tribble(~name, ~get_response, ~lat,
                                  "first_location", l1, 41.9,
                                  "second_location", l2, 35.1)
print(desired_output)
#> # A tibble: 2 x 3
#>   name            get_response       lat
#>   <chr>           <list>           <dbl>
#> 1 first_location  <named list [2]>  41.9
#> 2 second_location <named list [2]>  35.1

This is my attempt at using purrr::map:

new_df <- mutate(starting_df, lat = map(get_response, pluck(1, "results", 1, "geometry", "location", "lat")))
#> Error: Can't convert NULL to function

Created on 2020-04-18 by the reprex package (v0.3.0)

Does anyone know a good way to do this?

2 Answers

2
votes

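Your attempt fails because pluck(1, "results", ...) is evaluated immediately: it plucks from the number 1, returns NULL, and map() then receives NULL instead of a function. Wrap the pluck call in a formula so it runs once per element, and use map_dbl from purrr so the result is a numeric column rather than a list: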

starting_df %>%
  mutate(lat = map_dbl(get_response,
                       ~ pluck(.x, "results", 1, "geometry", "location", "lat")))

# A tibble: 2 x 3
  name            get_response       lat
  <chr>           <list>           <dbl>
1 first_location  <named list [2]>  41.9
2 second_location <named list [2]>  35.1
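
As a rough sketch of why the original call failed and of one more way to write the fix: purrr's map functions also accept a list of accessors in place of a function and turn it into a pluck-style extractor. The names `responses` and `lat_path` are made up for this illustration, and `l3` is a hypothetical response with no "lat".

```r
library(purrr)

l1 <- list(results = list(list(geometry = list(location = list(lat = 41.9, lng = -87.6)))), status = "OK")
# Hypothetical response that has no "lat" element
l3 <- list(results = list(list(geometry = list(location = list(lng = -70.6)))), status = "OK")
responses <- list(l1, l3)

# The original attempt evaluated this call straight away: it plucks from
# the number 1, finds nothing, and hands NULL to map() as the "function".
failed_arg <- pluck(1, "results", 1, "geometry", "location", "lat")
is.null(failed_arg)

# A list of names and positions is converted to an extractor by map_dbl();
# .default fills in responses where the path does not exist.
lat_path <- list("results", 1, "geometry", "location", "lat")
lats <- map_dbl(responses, lat_path, .default = NA_real_)
lats
```

Inside mutate this would read `mutate(lat = map_dbl(get_response, lat_path, .default = NA_real_))`.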
1
votes

We can use map_dbl from purrr together with pluck, supplying a .default for missing values:

library(dplyr)
library(purrr)
starting_df %>%
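    # 1, 1 = first element of the response ("results"), then its first result;
    # .default belongs to pluck(), so a missing 'lat' becomes NA_real_
    mutate(lat = map_dbl(get_response, ~ pluck(.x, 1, 1,
             'geometry', 'location', 'lat', .default = NA_real_)))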
# A tibble: 2 x 3
#  name            get_response       lat
#  <chr>           <list>           <dbl>
#1 first_location  <named list [2]>  41.9
#2 second_location <named list [2]>  35.1

This also works when some elements don't have 'lat':

l3 <-  list(results = list(list(geometry = 
         list(location = list( lng = -70.6)))), status = "OK")

starting_df <- tribble(~name, ~get_response,
                      "first_location", l1,
                      "second_location", l2,  
                       "third_location", l3)
starting_df %>%
     mutate(lat = map_dbl(get_response, ~ pluck(.x, 1, 1,
              'geometry', 'location', 'lat', .default = NA_real_)))
# A tibble: 3 x 3
#  name            get_response       lat
#  <chr>           <list>           <dbl>
#1 first_location  <named list [2]>  41.9
#2 second_location <named list [2]>  35.1
#3 third_location  <named list [2]>  NA  

Another option is rowwise from dplyr (the output below is for the original two-row starting_df):

starting_df %>% 
     rowwise %>%
     mutate(lat = pluck(get_response, 1, 1, 'geometry', 'location', 'lat'))
# A tibble: 2 x 3
# Rowwise: 
#  name            get_response       lat
#  <chr>           <list>           <dbl>
#1 first_location  <named list [2]>  41.9
#2 second_location <named list [2]>  35.1
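
The rowwise version can probably be made tolerant of a missing 'lat' in the same way, by giving pluck a .default. This is only a sketch, assuming dplyr 1.0 or later (where a list-column is unwrapped to its element inside rowwise()); `df_with_missing` and `res` are names invented for the example, and `l3` is the response without "lat".

```r
library(dplyr)
library(purrr)
library(tibble)

l1 <- list(results = list(list(geometry = list(location = list(lat = 41.9, lng = -87.6)))), status = "OK")
l3 <- list(results = list(list(geometry = list(location = list(lng = -70.6)))), status = "OK")

df_with_missing <- tribble(~name, ~get_response,
                           "first_location", l1,
                           "third_location", l3)

res <- df_with_missing %>%
  rowwise() %>%
  # get_response is the response list for the current row
  mutate(lat = pluck(get_response, "results", 1, "geometry", "location", "lat",
                     .default = NA_real_)) %>%
  ungroup()  # drop the rowwise grouping again

res$lat
```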