I need to summarise a number of variables after grouping. A couple of variables each need a different function, and around 25 variables need the same function. I would like to automate this so that I do not have to type out, 25 times, the name of each new variable and the operation that produces it.
I have tried two approaches with the dplyr package, but neither was successful. My attempts on a toy example are shown below:
library('dplyr')
df <- data.frame(letter = c('A', 'A', 'B', 'C', 'A', 'B'),
                 group = c('group1', 'group1', 'group2', 'group3', 'group1', 'group2'),
                 var1 = c(1, 2, 3, 4, 5, 6),
                 var2 = c(6, 6, 6, 6, 6, 6),
                 var3 = c(2, 2, 2, 2, 2, 2),
                 var4 = c(5, 5, 5, 5, 5, 5))
var_names <- c('var2', 'var3', 'var4')
groupped <- df %>%
  group_by(letter, group) %>%
  summarise(var1_mean = mean(var1),
            freq = n(),
            for (varp in var_names) {
              nam <- paste(varp, "_sum", sep = "")
              assign(nam, eval(parse(text = paste0("sum(", varp, ")"))))
            }
  )
I got an error saying that assign() does not work with dplyr, so I then tried this:
groupped <- df %>%
  group_by(letter, group) %>%
  summarise(var1_mean = mean(var1),
            freq = n(),
            for (i in vars) {
              nam <- paste0("sum", i)
              !!sym(nam) := sum(i)
            })
I have an idea of how to accomplish this with arrange() in a loop, binding the result column by column to a df via cbind(), but this is so inefficient that creating the 25 variables manually seems more efficient :) Any ideas on how to automate this process?
new_col_name or nam? – iago
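One possible way to avoid the loop, sketched here on the toy data from the question: dplyr's across() can apply the same function to a whole vector of column names inside summarise(), alongside the individually written summaries. This assumes dplyr 1.0.0 or later (where across() is available); the result name grouped and the "_sum" suffix are only illustrative choices.

```r
library(dplyr)

# Toy data from the question
df <- data.frame(
  letter = c("A", "A", "B", "C", "A", "B"),
  group  = c("group1", "group1", "group2", "group3", "group1", "group2"),
  var1   = c(1, 2, 3, 4, 5, 6),
  var2   = c(6, 6, 6, 6, 6, 6),
  var3   = c(2, 2, 2, 2, 2, 2),
  var4   = c(5, 5, 5, 5, 5, 5)
)

# Columns that all get the same summary function
var_names <- c("var2", "var3", "var4")

grouped <- df %>%
  group_by(letter, group) %>%
  summarise(
    var1_mean = mean(var1),   # individually specified summaries
    freq = n(),
    # sum every column listed in var_names; name the results <column>_sum
    across(all_of(var_names), sum, .names = "{.col}_sum"),
    .groups = "drop"          # return an ungrouped tibble
  )

print(grouped)
```

With 25 columns, only var_names needs to change. If the columns share a naming pattern, a tidyselect helper such as starts_with() could replace all_of(var_names), though whether that fits depends on the real column names.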