I often encounter lists nested in data.frame columns, and I don't see a generic method for flattening them when that is possible, i.e. when each nested element can be coerced into a data.frame with the same number of rows as the parent. Consider these examples of such nesting:

require(dplyr)
data_frame(a=1:3, b = c('a','b','c'), c = list('cats','dogs','birds'))
#> # A tibble: 3 x 3
#>       a     b         c
#>   <int> <chr>    <list>
#> 1     1     a <chr [1]>
#> 2     2     b <chr [1]>
#> 3     3     c <chr [1]>
data_frame(a=1:3, b = c('a','b','c'), c = list(iris[1:3,]))
#> # A tibble: 3 x 3
#>       a     b                    c
#>   <int> <chr>               <list>
#> 1     1     a <data.frame [3 x 5]>
#> 2     2     b <data.frame [3 x 5]>
#> 3     3     c <data.frame [3 x 5]>
data_frame(a=1:3, b = c('a','b','c'), c = list(iris[1,], iris[2,], iris[3,]))
#> # A tibble: 3 x 3
#>       a     b                    c
#>   <int> <chr>               <list>
#> 1     1     a <data.frame [1 x 5]>
#> 2     2     b <data.frame [1 x 5]>
#> 3     3     c <data.frame [1 x 5]>

Is there an elegant generic way to flatten these? The closest I've found is jsonlite::flatten, which claims to "flatten nested data frames", but it seems unable to handle nested lists such as in these examples.

1 Answer

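One option is tidyr::unnest(), which expands each element of a list-column into regular rows and columns next to its parent row: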

library(dplyr)  # provides data_frame(), as loaded in the question
library(tidyr)
data_frame(a=1:3, b = c('a','b','c'), c = list('cats','dogs','birds')) %>%
    unnest()
# A tibble: 3 x 3
#     a     b     c
#  <int> <chr> <chr>
#1     1     a  cats 
#2     2     b  dogs
#3     3     c birds


data_frame(a=1:3, b = c('a','b','c'), c = list(iris[1:3,])) %>%
    unnest()
# A tibble: 9 x 7
#     a     b Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#  <int> <chr>        <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
#1     1     a          5.1         3.5          1.4         0.2  setosa
#2     1     a          4.9         3.0          1.4         0.2  setosa
#3     1     a          4.7         3.2          1.3         0.2  setosa
#4     2     b          5.1         3.5          1.4         0.2  setosa
#5     2     b          4.9         3.0          1.4         0.2  setosa
#6     2     b          4.7         3.2          1.3         0.2  setosa
#7     3     c          5.1         3.5          1.4         0.2  setosa
#8     3     c          4.9         3.0          1.4         0.2  setosa
#9     3     c          4.7         3.2          1.3         0.2  setosa

data_frame(a=1:3, b = c('a','b','c'), c = list(iris[1,], iris[2,], iris[3,])) %>%
    unnest()
# A tibble: 3 x 7
#      a     b Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#   <int> <chr>        <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
#1     1     a          5.1         3.5          1.4         0.2  setosa
#2     2     b          4.9         3.0          1.4         0.2  setosa
#3     3     c          4.7         3.2          1.3         0.2  setosa
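
On more recent package versions the same idea may need slightly different spelling: data_frame() is deprecated in favor of tibble(), and tidyr 1.0.0 and later warns when unnest() is called without naming the column through cols. A minimal sketch of the third example, assuming such a newer tibble/tidyr installation (the object names nested and flat are only illustrative):

```r
library(tibble)
library(tidyr)

# Same data as the third example, built with tibble() instead of data_frame()
nested <- tibble(
  a = 1:3,
  b = c("a", "b", "c"),
  c = list(iris[1, ], iris[2, ], iris[3, ])
)

# Name the list-column explicitly; each one-row data frame is spread
# into ordinary columns next to its parent row
flat <- unnest(nested, cols = "c")

dim(flat)
```

Here flat should have three rows and seven columns (a, b and the five iris columns), matching the output above.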