
I need to summarise a bunch of variables after grouping. A couple of the variables each need a different function, and around 25 variables all need the same function. There should be some way to automate this, so that I do not have to write out the name of each new variable and the operation that produces it 25 times by hand.

I have tried two approaches with the dplyr package, but neither was successful. Here are my attempts on a toy example:

library('dplyr')
df <- data.frame(letter = c('A', 'A', 'B', 'C', 'A', 'B'), 
                 group = c('group1', 'group1', 'group2', 'group3', 'group1', 'group2'),
                 var1= c(1,2,3,4,5,6), 
                 var2=c(6,6,6,6,6,6),
                 var3=c(2,2,2,2,2,2), 
                 var4=c(5,5,5,5,5,5))
var_names <-c('var2', 'var3', 'var4')
groupped <-df%>%
      group_by(letter, group)%>%
      summarise(var1_mean = mean(var1), 
                freq = n(),
                for (varp in var_names) {
                  nam <- paste(varp, "_sum", sep = "")
                  assign(nam, eval(parse(text=paste0("sum(", varp,")"))))
                }
                )

I got an error saying that assign() does not work with dplyr, so I tried this instead:

groupped <- df%>%
  group_by(letter, group)%>%
  summarise(var1_mean = mean(var1), 
            freq=n(),
            for(i in var_names){
            nam <- paste0("sum", i) 
            !!sym(nam) := sum(i)
             })

I have an idea of how to do this with arrange() in a loop, binding the results to a data frame column by column with cbind(), but that is so inefficient that writing the 25 variables out by hand seems easier :) Any ideas on how to automate this?

What is your expected output in new_col_name or nam? – iago
It does not matter. But I have edited the question to keep the names consistent. – plyusha
My question is: what is your expected output, independent of the name? – iago



You can use summarise_at for this purpose:

df %>% summarise_at(vars(var_names), list(mean = mean, sum = sum))
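A self-contained sketch of the same call applied per group, as in the question. It assumes a dplyr version that still ships the (superseded but not removed) `summarise_at()`, and it wraps the character vector in `all_of()`, which newer tidyselect versions recommend for external vectors. With a named list of functions, the output columns are named `<column>_<function>`, e.g. `var2_sum` and `var2_mean`:

```r
library(dplyr)

df <- data.frame(letter = c('A', 'A', 'B', 'C', 'A', 'B'),
                 group  = c('group1', 'group1', 'group2', 'group3', 'group1', 'group2'),
                 var1 = c(1, 2, 3, 4, 5, 6),
                 var2 = c(6, 6, 6, 6, 6, 6),
                 var3 = c(2, 2, 2, 2, 2, 2),
                 var4 = c(5, 5, 5, 5, 5, 5))
var_names <- c('var2', 'var3', 'var4')

# One row per (letter, group) combination; every column listed in var_names
# gets both a <col>_mean and a <col>_sum column.
res <- df %>%
  group_by(letter, group) %>%
  summarise_at(vars(all_of(var_names)), list(mean = mean, sum = sum))

print(res)
```

Adding a column to `var_names` adds its `_mean` and `_sum` columns without touching the `summarise_at()` call.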

If there are additional functions that you do not want to apply to all the columns, you need to compute them separately and left_join the results:

df %>% 
 group_by(letter, group) %>%
 summarise(freq = n()) %>%
 left_join(df %>% 
             group_by(letter, group)%>%
             summarise_at(vars(var_names), list(mean = mean, sum = sum)),
           by = c("letter", "group")
           )
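For reference, dplyr 1.0.0 and later supersede the scoped `*_at()` verbs with `across()`, which can be mixed with ordinary expressions inside one `summarise()` call, so the separate `left_join()` step is not required there. A minimal sketch, assuming dplyr >= 1.0.0. The `.names = "{.col}_sum"` template produces the `var2_sum`-style names the question was building by hand:

```r
library(dplyr)

df <- data.frame(letter = c('A', 'A', 'B', 'C', 'A', 'B'),
                 group  = c('group1', 'group1', 'group2', 'group3', 'group1', 'group2'),
                 var1 = c(1, 2, 3, 4, 5, 6),
                 var2 = c(6, 6, 6, 6, 6, 6),
                 var3 = c(2, 2, 2, 2, 2, 2),
                 var4 = c(5, 5, 5, 5, 5, 5))
var_names <- c('var2', 'var3', 'var4')

res <- df %>%
  group_by(letter, group) %>%
  summarise(var1_mean = mean(var1),           # columns that need their own function
            freq = n(),
            across(all_of(var_names), sum,     # the same function for every column in var_names
                   .names = "{.col}_sum"),
            .groups = "drop")                  # return an ungrouped result

print(res)
```

Extending `var_names` to all 25 columns leaves the `summarise()` call unchanged.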