1
votes

I am trying to tidy a data frame and I'm having some difficulties. I can reshape the data for a single case, but I want to apply the same reshaping to every case in the data frame.

My data looks like this:

library(tidyverse)

df <- tibble(
  case = c("a","a","b","b","c","c"),
  year = c(1990,2000,1990,2000,1990,2000),
  var1 = round(runif(6,0,1), 2),
  var2 = round(runif(6,10,20), 2)
)

I can do what I want for a single case using tidyr:

df %>%
  filter(case == "a") %>%
  gather(var, value, -c(1:2)) %>%
  spread(year, value)

Output:

#      case  var  `1990` `2000`
#     <chr> <chr>  <dbl>  <dbl>
#    1 a     var1   0.850  0.540
#    2 a     var2  14.4   16.7  

How can I use purrr or another functional programming tool to vectorize this operation and perform the same action with all of my cases and bind them into one data frame? Some combination of "nest" and "map"?

Thank you!

2
Maybe df %>% nest(-case) %>% mutate(output = map(data, ~gather(.x, var, val, -year) %>% spread(year, val))) %>% unnest(), but it seems like there's a more elegant way to write that - alistaire
Ah: df %>% gather(var, val, var1:var2) %>% spread(year, val) - alistaire
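For reference, the nest-and-map idea from the first comment can be sketched as below. This is only an illustration under some assumptions: it uses the `nest(data = -case)` and `unnest(output)` spellings from tidyr 1.0 or later rather than the shorter forms in the comment, and the names `val`, `output`, and `by_case` are arbitrary.

```r
library(tidyverse)

set.seed(1234)
df <- tibble(
  case = c("a","a","b","b","c","c"),
  year = c(1990,2000,1990,2000,1990,2000),
  var1 = round(runif(6,0,1), 2),
  var2 = round(runif(6,10,20), 2)
)

by_case <- df %>%
  # one row per case, with that case's rows stored in the list-column `data`
  nest(data = -case) %>%
  # reshape each per-case tibble separately
  mutate(output = map(data, ~ gather(.x, var, val, -year) %>% spread(year, val))) %>%
  # drop the raw nested data and bind the reshaped pieces back together
  select(-data) %>%
  unnest(output)

by_case
```

As the second comment notes, the per-case iteration is not actually needed here, since a single gather and spread over the whole data frame gives the same shape.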

2 Answers

3
votes

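Do not gather the case column. If both case and year are excluded from gather, case is carried along as an identifier column, so the whole data frame can be reshaped in one pass without filtering or iterating.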

library(tidyverse)

set.seed(1234)

df <- tibble(
  case = c("a","a","b","b","c","c"),
  year = c(1990,2000,1990,2000,1990,2000),
  var1 = round(runif(6,0,1), 2),
  var2 = round(runif(6,10,20), 2)
)

df %>% 
  gather(var, value, -c(1:2)) %>%
  spread(year, value)
# # A tibble: 6 x 4
#   case  var   `1990` `2000`
#   <chr> <chr>  <dbl>  <dbl>
# 1 a     var1   0.110  0.620
# 2 a     var2  10.1   12.3  
# 3 b     var1   0.610  0.620
# 4 b     var2  16.7   15.1  
# 5 c     var1   0.860  0.640
# 6 c     var2  16.9   15.4  
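In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. A rough equivalent of the pipeline above, assuming a recent tidyr is installed, might look like this (the name `wide` is just for illustration):

```r
library(tidyverse)

set.seed(1234)
df <- tibble(
  case = c("a","a","b","b","c","c"),
  year = c(1990,2000,1990,2000,1990,2000),
  var1 = round(runif(6,0,1), 2),
  var2 = round(runif(6,10,20), 2)
)

wide <- df %>%
  # stack var1 and var2 into name/value pairs; case and year stay as id columns
  pivot_longer(cols = var1:var2, names_to = "var", values_to = "value") %>%
  # spread the years back out into one column per year
  pivot_wider(names_from = year, values_from = value)

wide
```

Selecting the columns by name (var1:var2) instead of by position avoids surprises if the column order changes.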
0
votes

Another option is dcast from the reshape2 package. But first we need to gather the var1 and var2 columns using gather.

    library(tidyverse)
    library(reshape2)
    set.seed(1234)
    df <- tibble(
      case = c("a","a","b","b","c","c"),
      year = c(1990,2000,1990,2000,1990,2000),
      var1 = round(runif(6,0,1), 2),
      var2 = round(runif(6,10,20), 2)
    )
    # Use gather to combine var1 and var2, then apply dcast
    gather(df, var, val, var1:var2) %>% dcast(case+var ~ year, value.var = "val")
  # Result
  #    case  var  1990  2000
  #  1    a var1  0.11  0.62
  #  2    a var2 10.09 12.33
  #  3    b var1  0.61  0.62
  #  4    b var2 16.66 15.14
  #  5    c var1  0.86  0.64
  #  6    c var2 16.94 15.45
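If you would rather stay within reshape2 for both steps, melt can take the place of gather. The following is a sketch rather than a recommendation; it converts the tibble to a plain data frame first as a precaution, and the names `long` and `wide` are illustrative.

```r
library(tibble)
library(reshape2)

set.seed(1234)
df <- tibble(
  case = c("a","a","b","b","c","c"),
  year = c(1990,2000,1990,2000,1990,2000),
  var1 = round(runif(6,0,1), 2),
  var2 = round(runif(6,10,20), 2)
)

# melt keeps case and year as id columns and stacks var1 and var2
long <- melt(as.data.frame(df), id.vars = c("case", "year"),
             variable.name = "var", value.name = "val")

# dcast then spreads the years into columns, one row per case/var pair
wide <- dcast(long, case + var ~ year, value.var = "val")

wide
```

Note that melt stores var as a factor rather than a character column, which may matter if you join or compare it later.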