
I'm trying to use ddply with summarise (e.g. to compute mean()) inside a custom function. However, instead of returning the mean for each group, it returns a data frame in which every group shows the mean of all observations.

Many thanks in advance for your help!

library(plyr)
library(dplyr)
df <- data.frame(Titanic)
colnames(df)

# ddply-summarise - Outside of function
df.OutsideOfFunction <- ddply(df, c("Class","Sex"), summarise,
                          Mean=mean(Freq))

# new function
newFunction <- function(data, GroupVariables, ColA){ 
  mean(data[[ColA]])
  plyr::ddply(data, GroupVariables, summarise,
                       Mean=mean(data[[ColA]]))
}

#ddply-summarise - InsideOfFunction
df.InsideOfFunction <- newFunction(data=df,
                                   GroupVariables=c("Class","Sex"),
                                   ColA ="Freq")
I get errors trying to execute your code. Does it work in your environment? - Pawel Stradowski
It works for me. A colleague just tried it and got an error message at first; after closing and reopening RStudio, it also worked for her. - Anja

1 Answer


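It should work if you first convert the ColA string to a symbol and then unquote it, so that summarise evaluates it as a column of each group. Note that this relies on summarise being dplyr's version, which is the case when dplyr is loaded after plyr, as in your code: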

# new function
newFunction <- function(data, GroupVariables, ColA){ 
  #mean(data[[ColA]])
  # !! unquotes the symbol built from the ColA string (UQ() is the older spelling)
  plyr::ddply(data, GroupVariables, summarise, Mean=mean(!!sym(ColA)))
}
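If you would rather not depend on how summarise evaluates its arguments, one possible sketch uses only plyr and passes ddply an anonymous function that receives each group's subset. The names newFunctionPlyr and piece are made up for this illustration and are not part of plyr:

```r
library(plyr)

df <- data.frame(Titanic)

# Hypothetical variant: ddply hands each group's subset to the
# anonymous function as `piece`, so the mean is taken per group.
newFunctionPlyr <- function(data, GroupVariables, ColA) {
  plyr::ddply(data, GroupVariables, function(piece) {
    data.frame(Mean = mean(piece[[ColA]]))
  })
}

res <- newFunctionPlyr(df, c("Class", "Sex"), "Freq")
```

Because the column is looked up with piece[[ColA]], no quoting or unquoting is involved and it does not matter which summarise is on the search path.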

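Please also take a look at this post for why this happens. It's the first time I've seen it myself, so I am not the best person to explain it. It looks like it depends on how summarise and other plyr or dplyr functions accept their arguments (quoted or unquoted) and how these are evaluated. In your function, data[[ColA]] appears to be evaluated against the full data frame passed to the function rather than against the current group, which would explain why every group gets the overall mean.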
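To make the difference concrete, here is a small base R sketch. It only assumes that ddply conceptually splits the data into one piece per group; the names pieces, wrong and right are made up for the illustration:

```r
df <- data.frame(Titanic)

# One piece per Class/Sex combination, roughly what ddply works on.
pieces <- split(df, df[c("Class", "Sex")])

# Indexing the full data frame ignores the piece:
# every group gets the same overall mean.
wrong <- sapply(pieces, function(piece) mean(df[["Freq"]]))

# Indexing the piece gives the mean of that group only.
right <- sapply(pieces, function(piece) mean(piece[["Freq"]]))

length(unique(wrong))  # a single distinct value
```

The wrong vector mirrors what the original function returns, while right matches the result of calling ddply outside the function.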

Also, since you are loading dplyr as well, you can stick to one package if you like and write your function like this:

newFunction <- function(data, GroupVariables, ColA){
  data %>%
    group_by(.dots=GroupVariables) %>%
    summarise(Mean=mean(!!sym(ColA)))
}
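With a recent dplyr (this sketch assumes version 1.0 or later), the same idea can be written without .dots or manual unquoting, using across(all_of()) for the grouping columns and the .data pronoun for the summarised column. The name newFunctionDplyr is made up for the illustration:

```r
library(dplyr)

df <- data.frame(Titanic)

# Hypothetical variant: both the grouping columns and the
# summarised column are given as character strings.
newFunctionDplyr <- function(data, GroupVariables, ColA) {
  data %>%
    group_by(across(all_of(GroupVariables))) %>%
    summarise(Mean = mean(.data[[ColA]]), .groups = "drop")
}

res <- newFunctionDplyr(df, c("Class", "Sex"), "Freq")
```

Here .data[[ColA]] refers to the column within the current group, so each row of the result holds that group's mean.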

Hope this helps