4
votes

I am hoping to use ddply within a function to summarise groups based on a user-determined summary statistic (e.g. the mean, median, min or max), by passing the name of the summary function as an argument in the function call. However, I'm not sure how to pass this to ddply.

A simple example:

library(plyr)
test.df<-data.frame(group=c("a","a","b","b"),value=c(1,5,5,15))
ddply(test.df,.(group),summarise, mean=mean(value, na.rm=TRUE))

How could I set this up along the lines of the code below, with the relevant function passed to ddply (and eventually wrapped in a function, although that should be straightforward once the first problem is solved)? Note that each summary measure (mean etc.) will require na.rm=TRUE. I could do this by writing my own replacement function for each summary statistic, but this seems overly complex.

Desired:

#fn<-"mean"     
#ddply(test.df,.(group),summarise, fn=fn(value, na.rm=TRUE))

Thanks for any help people can provide.

EDIT! Thanks all for these responses. I initially thought leaving out the quotes was working; however, neither that approach nor the use of getFunction or match.fun works once fn is specified as an argument in a function call. What I'm actually hoping to get working is something along the lines of the code below (which returns an error). Apologies for not providing a more thorough example in the first instance...

test.df<-data.frame(group=c("a","a","b","b"),value=c(1,5,5,15))
my.fun <- function(df, fn="mean") {
  summary <- ddply(df, .(group), summarise, summary=match.fun(fn)(value, na.rm=TRUE))
  return(summary)
}
my.fun(test.df, fn="mean")
3
I think if you remove the quotes from the function name it will work: fn <- mean. – nograpes
A quick follow-up. I ended up circumventing this problem by avoiding ddply and using aggregate instead. With that change in place the correct function is called whether quoted or unquoted in my function call, and with or without using getFunction or match.fun. I'd still be interested in knowing how to make this work with ddply, but for now I suppose it highlights the utility of being able to draw on the base functions as well as some of the great contributed packages such as plyr. – nickb
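
The comment above does not show the aggregate call itself. A minimal sketch of what such a wrapper might look like, assuming the formula interface of aggregate and a hypothetical wrapper name my_agg (neither appears in the original comment):

```r
test.df <- data.frame(group = c("a", "a", "b", "b"), value = c(1, 5, 5, 15))

# my_agg is a hypothetical name; fn may be a function or its name as a string
my_agg <- function(df, fn = "mean") {
  # Arguments after FUN (here na.rm) are forwarded to the summary function
  aggregate(value ~ group, data = df, FUN = match.fun(fn), na.rm = TRUE)
}

res_quoted   <- my_agg(test.df, fn = "mean")  # function given by name
res_unquoted <- my_agg(test.df, fn = mean)    # function given directly
```

In this sketch match.fun(fn) is evaluated inside the wrapper before aggregate does any work, so nothing has to look fn up later from a different environment.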

3 Answers

4
votes

The function that you provided in the question looks like it should work. (And indeed it took me a few moments to remember why it wouldn't.) Here it is again, slightly rewritten for clarity (Iwastemptedtoansweryourquestionwithoutanyspacesiniteither;)

df <- data.frame(
  group = c("a", "a" ,"b" ,"b" ), 
  value = c(1, 5, 5, 15)
)

my_fun <- function(df, fn = "mean") {
  fn <- match.fun(fn)
  ddply(df, .(group), summarise, summary = fn(value, na.rm = TRUE))
}

The reason it doesn't work is a little subtle, but it comes down to how scoping (the process of looking up the values of variables from their names) works. summarise() uses non-standard evaluation to look up values first in the data frame and then in the environment from which it was called. That works for value, which is a column of the data frame, but not for fn: summarise() is called from inside ddply(), not from my_fun(), so fn is not visible there.
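
The lookup problem can be reproduced in base R alone. The sketch below is only an illustration: toy_summarise() and toy_apply() are hypothetical stand-ins that imitate the relevant behaviour of summarise() and ddply(), not plyr's actual code.

```r
df <- data.frame(group = c("a", "a", "b", "b"), value = c(1, 5, 5, 15))

# Hypothetical stand-in for summarise(): evaluate the unevaluated expression
# in the data frame, falling back to the frame toy_summarise() was called from.
toy_summarise <- function(.data, expr) {
  caller <- parent.frame()
  eval(substitute(expr), .data, caller)
}

# Hypothetical stand-in for ddply(): call .fun on each piece and forward
# any extra arguments untouched.
toy_apply <- function(pieces, .fun, ...) {
  lapply(pieces, function(piece) .fun(piece, ...))
}

pieces <- split(df, df$group)

# Fails: toy_summarise() is called from inside toy_apply(), where fn is unknown.
my_fun_broken <- function(pieces, fn = "mean") {
  fn <- match.fun(fn)
  toy_apply(pieces, toy_summarise, fn(value, na.rm = TRUE))
}

# Works: the anonymous function is created inside my_fun_fixed(), so fn is
# reachable from the frame that calls toy_summarise().
my_fun_fixed <- function(pieces, fn = "mean") {
  fn <- match.fun(fn)
  toy_apply(pieces, function(piece) toy_summarise(piece, fn(value, na.rm = TRUE)))
}

# Capture the error message instead of stopping the script
broken_msg <- tryCatch(my_fun_broken(pieces), error = function(e) conditionMessage(e))
fixed <- my_fun_fixed(pieces)
```

Here broken_msg holds the error raised when fn cannot be found, while fixed holds one summary value per group.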

There are two solutions:

  1. Use the here() function, which was added to plyr to work around this problem:

    my_fun <- function(df, fn = "mean") {
      fn <- match.fun(fn)
      ddply(df, .(group), here(summarise), summary = fn(value, na.rm = TRUE))
    }
    my_fun(df, "mean")
    
  2. Be slightly less concise and use an explicit function:

    my_fun <- function(df, fn = "mean") {
      fn <- match.fun(fn)
      ddply(df, .(group), function(df) {
        summarise(df, summary = fn(value, na.rm = TRUE))
      })
    }
    my_fun(df, "mean")
    

I now understand how I could have avoided this problem in the first place in the design of plyr, but it requires some custom C/C++ code. It's fixed in dplyr but is unlikely to be ported back to plyr because it might break existing code.
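
For comparison, a sketch of the same wrapper written against dplyr, assuming the dplyr package is installed. The name my_fun_dplyr is illustrative, and the functions are called with the dplyr:: prefix so that the sketch does not clash with plyr's summarise() if both packages are loaded.

```r
df <- data.frame(group = c("a", "a", "b", "b"), value = c(1, 5, 5, 15))

# my_fun_dplyr is a hypothetical name for the dplyr version of my_fun
my_fun_dplyr <- function(df, fn = "mean") {
  fn <- match.fun(fn)
  grouped <- dplyr::group_by(df, group)
  # fn is looked up in this function's environment, value in the data
  dplyr::summarise(grouped, summary = fn(value, na.rm = TRUE))
}

res <- my_fun_dplyr(df, "median")
```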

2
votes

You can use getFunction:

fn<-"mean"     
ddply(test.df,.(group),summarise, fn=getFunction(fn)(value, na.rm=TRUE))
#  group fn
#1     a  3
#2     b 10

However, if you put this into a wrapper function you could get lost in the jungle of environments.

1
votes

It works with match.fun:

fn <- "mean"

ddply(test.df, .(group), summarise, fn = match.fun(fn) (value, na.rm = TRUE))
#  group fn
# 1     a  3
# 2     b 10
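
A small sketch of why this accepts either a name or a function: match.fun() returns a function whether it is given the function itself or its name as a string. The variable names below are illustrative.

```r
f_from_string <- match.fun("mean")  # looked up by name
f_from_object <- match.fun(mean)    # already a function, returned as is

# Both refer to the same function
same_function <- identical(f_from_string, f_from_object)

# The result can be called like any other function
result <- f_from_string(c(1, 5, NA), na.rm = TRUE)
```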