When using dplyr's "group_by" and "mutate", if I understand correctly, the data frame is split into sub-data frames according to the group_by argument. For example, with the following code:
library(dplyr)
set.seed(7)
df <- data.frame(x = runif(10), let = rep(letters[1:5], each = 2))
df %>% group_by(let) %>% mutate(mean.by.letter = mean(x))
mean() is applied successively to the column x of the 5 sub-dfs, one for each letter from a to e.
So you can manipulate the columns of the sub-dfs, but can you access the sub-dfs themselves? To my surprise, if I try:
set.seed(7)
data <- data.frame(x=runif(10),let=rep(letters[1:5],each=2))
data %>% group_by(let) %>% mutate(mean.by.letter = mean(.$x))
the result is different. From this result, one can infer that the "." df does not represent each sub-df in turn but the whole "data" data frame (the group_by function doesn't change anything).
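A quick check of that inference, as a minimal sketch: with the magrittr pipe, "." inside mutate() stands for the whole left-hand side of the pipe, so mean(.$x) should be the overall mean of x repeated on every row. The name res is only introduced here for illustration.

```r
library(dplyr)

set.seed(7)
data <- data.frame(x = runif(10), let = rep(letters[1:5], each = 2))

# `.` is the pipe placeholder: the full grouped data frame (all 10 rows),
# not the sub-data frame of the current group.
res <- data %>% group_by(let) %>% mutate(mean.by.letter = mean(.$x))

# Compare the new column with the overall mean of x, repeated on every row.
all.equal(res$mean.by.letter, rep(mean(data$x), nrow(data)))
```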
The reason I ask is that I want to apply a stat function that takes a data frame as an argument to each of these sub-dfs.
Thanks!
?do – akrun
do.call(rbind, lapply(split(df, df$let), myfun)) – Frank
data %>% group_by(let) %>% mutate(mean.by.letter = mean(x)) (unless I'm missing something) but will likely be slower because of the extra do-call – talat
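The two suggestions in the comments can be sketched roughly as follows. Here myfun is a hypothetical stand-in for the stat function that takes a data frame (it is not defined anywhere in the thread), and by.split and by.do are names made up for the example. Inside do(), unlike inside mutate(), "." refers to the sub-data frame of the current group.

```r
library(dplyr)

set.seed(7)
df <- data.frame(x = runif(10), let = rep(letters[1:5], each = 2))

# Hypothetical function taking a whole data frame and returning a one-row summary.
myfun <- function(d) data.frame(n = nrow(d), mean.x = mean(d$x))

# Base R: split into sub-data frames, apply myfun to each, bind the results.
by.split <- do.call(rbind, lapply(split(df, df$let), myfun))

# dplyr: do() calls myfun once per group, with `.` bound to that group's rows.
by.do <- df %>% group_by(let) %>% do(myfun(.))
```

Both versions call myfun once per letter; the base R result keeps the letters as row names, while the do() result carries them in the let column.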