I'd like to iterate over a series of dataframes and apply the same function to them all.

I'm trying this using tidyr::nest and purrr::map_df. Here's a reprex of the sort of thing I'm trying to achieve.

data(iris)
library(dplyr)  # needed for group_by()
library(purrr)
library(tidyr)

iris_df <- as.data.frame(iris)
my_var <- 2

my_fun <- function(df) {
  # return the sum visibly instead of assigning it to an unused local variable
  sum(df) + my_var
}

iris_df %>% group_by(Species) %>% nest() %>% map_df(.$data, my_fun)
# Error: Index 1 must have length 1

What am I doing wrong? Is there a different approach?

EDIT: To clarify my desired output, I'm aiming for a new column containing the function's output, e.g.:

|Species|Data|my_function_output|
|:------|:---|:-----------------|
|setosa |<tibble>|509.1         |
Could you give us an example of your desired output? - Pasqui
When you nest(), it actually creates a list column in your 'parent' data.frame (i.e., iris). To do what you want, you need to combine mutate with map like so: %>% mutate(data = map(data, ~my_fun)) - CPak
@CPak I ran iris_df %>% group_by(Species) %>% nest() %>% mutate(my_col = map_df(data, ~my_fun)). It returns: Error in mutate_impl(.data, dots) : Evaluation error: Argument 1 must be a data frame or a named atomic vector, not a function. - mark
Apologies @CPak, I used map_df in error, but map doesn't give the correct output. iris_df %>% group_by(Species) %>% nest() %>% mutate(my_col = map_dbl(data, my_fun)), as per @Renu, gives the output needed. - mark
@mark - You're right - I didn't look at the return value of your function - assumed it was a data.frame you were returning. map_dbl works as you've pointed out because you're returning a numeric value - CPak

1 Answer

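The problem is that nest() gives you a data.frame with a list column, data, whose elements are data.frames. You need to map or sapply over that data column inside mutate(), not over the entire nest() output. In your pipeline, %>% passes the whole nested data.frame to map_df as its first argument, so .$data ends up in the function slot, which is most likely where the index error comes from. I use sapply, but you could also use map_dbl. If you use map you will end up with list output, and map_df will not work because it row-binds the results and so needs each one to be a data frame or a named vector, while my_fun returns a single unnamed number.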

iris_df %>% 
  group_by(Species) %>% 
  nest() %>% 
  mutate(my_fun_out = sapply(data, my_fun))

# A tibble: 3 x 3
  Species    data              my_fun_out
  <fct>      <list>                 <dbl>
1 setosa     <tibble [50 x 4]>        509
2 versicolor <tibble [50 x 4]>        717
3 virginica  <tibble [50 x 4]>        859
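
For comparison, here is a minimal sketch of the purrr alternatives mentioned above, assuming dplyr, tidyr and purrr are installed. The names nested, as_list and as_dbl are only illustrative and not from the question; my_fun is the question's helper written with an explicit return value.

```r
library(dplyr)
library(tidyr)
library(purrr)

my_var <- 2

# Same helper as in the question: sum every value in df, then add my_var
my_fun <- function(df) {
  sum(df) + my_var
}

# One row per Species, with the remaining columns packed into the list column `data`
nested <- iris %>%
  group_by(Species) %>%
  nest()

# map() always returns a list, so my_fun_out becomes a list column
as_list <- nested %>% mutate(my_fun_out = map(data, my_fun))

# map_dbl() expects one number per element and returns a plain double vector
as_dbl <- nested %>% mutate(my_fun_out = map_dbl(data, my_fun))

as_dbl
```

The map_dbl version should give the same my_fun_out column as the sapply call, but it fails with an error instead of silently changing type if my_fun ever returns something other than a single number.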