2
votes

I have to work with some data that is in recursive lists like this (simplified reproducible example below):

groups
#> $group1
#> $group1$countries
#> [1] "USA" "JPN"
#> 
#> 
#> $group2
#> $group2$countries
#> [1] "AUS" "GBR"

Code for data input below:

chars <- c("USA", "JPN")
chars2 <- c("AUS", "GBR")

group1 <- list(countries = chars)
group2 <- list(countries = chars2)

groups <- list(group1 = group1, group2 = group2)
groups

I'm trying to work out how to extract the vectors inside the lists without writing a line of code for each group. My real data has a large number of groups, and that number will change, so I need an approach that scales. The brute-force version below works:

countries1 <- groups$group1$countries
countries2 <- groups$group2$countries

In the example, the bottom level vector I'm trying to extract is always called countries, but the lists they're contained in change name, varying only by numbering.

Would there be an easy purrr solution? Or tidyverse solution? Or other solution?

4
Will "countries" always be at the same depth within the list? Or could there be a varying number of levels between "groups" and "countries"? - camille
It's always the same depth. Thank you. - Jeremy K.
I'm finding a few SO posts on extracting from nested lists: this one uses purrr with more deeply nested data, or this, or with regex, or this. I'm not totally satisfied yours is directly a duplicate of any of those, but they should help. - camille
@camille very helpful! I'll go through some of the links you've put above and keep searching deeper, and if my question is a duplicate, I'll close this up. - Jeremy K.
What if countries was NOT at the same depth? What then? - Ben

4 Answers

2
votes

Add some additional cases to your list

groups[["group3"]] <- list()
groups[["group4"]] <- list(foo = letters[1:2])
groups[["group5"]] <- list(foo = letters[1:2], countries = LETTERS[1:2])

Here's a function that maps any list to just its element named "countries"; it returns NULL if the list has no such element

fun = function(x)
    x[["countries"]]

Map your original list to contain just the elements you're interested in

interesting <- Map(fun, groups)

Then transform these into a data.frame using a combination of unlist() and rep()

df <- data.frame(
    country = unlist(interesting, use.names = FALSE),
    name = rep(names(interesting), lengths(interesting))
)
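
To see why groups without a countries element drop out of the result, here is a self-contained sketch of the steps above, run on the extended list. It uses only base R and the names from this answer; the key point is that lengths() reports 0 for NULL entries, so rep() repeats those group names zero times.

```r
# Rebuild the example list, including the extra cases added above
groups <- list(
    group1 = list(countries = c("USA", "JPN")),
    group2 = list(countries = c("AUS", "GBR")),
    group3 = list(),
    group4 = list(foo = letters[1:2]),
    group5 = list(foo = letters[1:2], countries = LETTERS[1:2])
)

# Pull out the "countries" element of each group (NULL when absent)
fun <- function(x) x[["countries"]]
interesting <- Map(fun, groups)

# NULL entries have length 0, so they contribute no rows below
lengths(interesting)

df <- data.frame(
    country = unlist(interesting, use.names = FALSE),
    name = rep(names(interesting), lengths(interesting))
)
df
```

group3 and group4 are still present in interesting (as NULL), so they can be inspected separately if an empty group should be treated as an error.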

Alternatively, use tidy syntax, e.g.,

library(dplyr)
library(tidyr)
interesting %>% 
    tibble(group = names(.), value = .) %>% 
    unnest("value")

The output is

# A tibble: 6 x 2
  group  value
  <chr>  <chr>
1 group1 USA
2 group1 JPN
3 group2 AUS
4 group2 GBR
5 group5 A
6 group5 B

If there are additional problems parsing individual elements of groups, then modify fun, e.g.,

fun = function(x)
    as.character(x[["countries"]])

1
vote

This puts the output in a single list and works for any number of groups

countries <- unlist(groups, recursive = FALSE)
# \D+ rather than \w+, so a multi-digit group number such as group12 is captured whole
names(countries) <- sub("^\\D+(\\d+)\\.(\\w+)$", "\\2\\1", names(countries), perl = TRUE)

> countries
$countries1
[1] "USA" "JPN"

$countries2
[1] "AUS" "GBR"
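
The renaming step depends on the names that unlist() builds by joining each outer and inner name with a dot. The short sketch below, using the two-group example from the question, shows those intermediate names. The lapply() variant at the end is an assumed alternative that extracts by name instead of flattening; it is not part of the answer above.

```r
groups <- list(
    group1 = list(countries = c("USA", "JPN")),
    group2 = list(countries = c("AUS", "GBR"))
)

# Flatten one level: outer and inner names are joined with "."
flat <- unlist(groups, recursive = FALSE)
names(flat)

# Assumed alternative: extract "countries" by name, then rename directly
by_name <- lapply(groups, `[[`, "countries")
names(by_name) <- sub("^group", "countries", names(by_name))
by_name
```

Extracting by name ignores any other elements a group might contain, whereas flattening with unlist() keeps every inner element.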

1
vote

You can simply transform your nested list to a data.frame and then unnest the country column.

library(dplyr)
library(tidyr)
groups %>% 
  tibble(group = names(groups),
         country = .) %>% 
  unnest(country) %>% 
  unnest(country)
#> # A tibble: 4 x 2
#>   group  country
#>   <chr>  <chr>  
#> 1 group1 USA    
#> 2 group1 JPN    
#> 3 group2 AUS    
#> 4 group2 GBR

Created on 2020-01-15 by the reprex package (v0.3.0)

Since the countries are hidden 2 layers deep, you have to run unnest twice. Otherwise I think this is straightforward.
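
To make the two layers visible, here is a sketch that stores the intermediate results. It assumes tibble and tidyr (version 1.0 or later) are installed; the names nested, once, and twice are only illustrative.

```r
library(tibble)
library(tidyr)

groups <- list(
    group1 = list(countries = c("USA", "JPN")),
    group2 = list(countries = c("AUS", "GBR"))
)

# One row per group; each cell of `country` is list(countries = <chr>)
nested <- tibble(group = names(groups), country = groups)

# First pass strips the inner list, leaving a list column of character vectors
once <- unnest(nested, country)

# Second pass expands those vectors into one row per country
twice <- unnest(once, country)
twice
```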

0
votes

If you actually want each vector as an object in your global environment, a combination of purrr::map2/walk and list2env will work. To make this work, we first have to give the country entries in the list individual names; otherwise list2env just overwrites the same object over and over again.

library(purrr)
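# Rename each inner element by appending the group's position: countries1, countries2, ...
groups <- 
  map2(groups, seq_along(groups), ~setNames(.x, paste0(names(.x), .y)))
# Copy every renamed element into the global environment
walk(groups, ~list2env(.x, envir = .GlobalEnv))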

This creates exactly the objects you describe in your question (countries1, countries2, and so on). I'm not sure it is the best solution for a smooth workflow, though, since I don't know where you are going with this.
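
If separate top-level objects turn out not to be necessary, a possible variation is to keep the vectors together in one named list, or in a dedicated environment rather than .GlobalEnv. This is a base R sketch under that assumption; country_list and env are hypothetical names, and the numbering follows each group's position in the list, as in the code above.

```r
groups <- list(
    group1 = list(countries = c("USA", "JPN")),
    group2 = list(countries = c("AUS", "GBR"))
)

# Build a flat named list: countries1, countries2, ...
country_list <- lapply(groups, `[[`, "countries")
names(country_list) <- paste0("countries", seq_along(country_list))

# Copy into a fresh environment so nothing in .GlobalEnv is overwritten
env <- list2env(country_list, envir = new.env())
ls(env)
get("countries1", envir = env)
```

Keeping the vectors in a list means later steps can loop over them with lapply() or purrr::map() instead of referring to each object by name.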