I would like to perform calculations on a nested data frame (stored as a list-column), and add the calculated variable back to each dataframe using purrr functions. I'll use this result to join to other data, and keeping it compact helps me to organize and examine it better. I can do this in a couple of steps, but it seems like there may be a solution I haven't come across. If there is a solution out there, I haven't been able to find it easily.
Load libraries. example requires the following packages (available on CRAN):
library(dplyr)
library(purrr)
library(RcppRoll) # to calculate rolling mean
Example data with 3 subjects, and repeated measurements over time:
test <- data_frame(
id= rep(1:3, each=20),
time = rep(1:20, 3),
var1 = rnorm(60, mean=10, sd=3),
var2 = rnorm(60, mean=95, sd=5)
)
Store the data as nested dataframe:
t_nest <- test %>% nest(-id)
id data
<int> <list>
1 1 <tibble [20 x 3]>
2 2 <tibble [20 x 3]>
3 3 <tibble [20 x 3]>
Perform calculations. I will calculate multiple new variables based on the data, although a solution for just one could be expanded later. The result of each calculation will be a numeric vector, same length as the input (n=20):
t1 <- t_nest %>%
mutate(var1_rollmean4 = map(data, ~RcppRoll::roll_mean(.$var1, n=4, align="right", fill=NA)),
var2_delta4 = map(data, ~(.$var2 - lag(.$var2, 3))*0.095),
var3 = map2(var1_rollmean4, var2_delta4, ~.x -.y))
id data var1_rollmean4 var2_delta4 var3
<int> <list> <list> <list> <list>
1 1 <tibble [20 x 3]> <dbl [20]> <dbl [20]> <dbl [20]>
2 2 <tibble [20 x 3]> <dbl [20]> <dbl [20]> <dbl [20]>
3 3 <tibble [20 x 3]> <dbl [20]> <dbl [20]> <dbl [20]>
my solution is to unnest
this data, and then nest
again. There doesn't seem to be anything wrong with this, but seems like a better solution may exist.
t1 %>% unnest %>%
nest(-id)
id data
<int> <list>
1 1 <tibble [20 x 6]>
2 2 <tibble [20 x 6]>
3 3 <tibble [20 x 6]>
This other solution (from SO 42028710) is close, but not quite because it is a list rather than nested dataframes:
map_df(t_nest$data, ~ mutate(.x, var1calc = .$var1*100))
I've found quite a bit of helpful information using the purrr Cheatsheet but can't quite find the answer.