0
votes

I am trying to call an aggregate function in R "on the fly", by being able to create and pass strings as arguments to the aggregate() function. I have been able to do that for almost all parameters except for the subset parameter.

for example: I want to run the following piece of code: gg1 <- aggregate(metric ~ grouping_var1 + grouping_var2, data=data.set, FUN= "mean", subset=exclude.filter==0)

I can create string variables that I can pass to aggregate function:

    current.metric <- "metric"
    rhs <- c("grouping_var1","grouping_var2")
    func1 <- "mean"
    filter <- "exclude.filter == 0"
    gg1 <- aggregate(as.formula(paste(current.metric, paste(rhs, collapse="+"), sep="~")), data=data.set, FUN= "mean", subset=exclude.filter==0)

The above piece of code works as long as subset=exclude.filter==0 is provided as the argument. I am unable to figure out how I can use something like subset = filter. I have tried expression(filter), substitute(filter), as.formula(filter) as the arguments in the subset parameter, but I am not parsing the expression in the right environment I think.

An alternative is to construct the whole string and pass it to eval(). However, I am wondering if it can be done within the aggregate function itself. Can somebody point me in the right direction to solve this please?

2
Can you add tags explaining what platform of coding language you are using. Be more explicit. - Joshua Duxbury
I apologise, I am new to this forum. I had added "R" tag earlier, but haven't been able to add "R-Windows". I am using R on windows 8 (within RStudio, if that helps) - Ravi Chandra
Why do you want to do this? There's a call to model.frame() evaluated in the parent.frame() going on here, which is what uses the subset argument. I think it's going to be hard to hack that. - Thomas
Let me give an example: I have 2 metrics - Revenue & Profit. I only want to aggregate the dataset for valid values of Revenue and Profit. I might have 2 different filters for these metrics (exclude.revenue.filter, exclude.profit.filter). Therefore, depending on what my current.metric parameter is set to, the "filter" parameter might change, which is what I want to account for. - Ravi Chandra
Wouldn't it be easier to use the "normal" aggregate method (I mean, not the formula interface)? - talat

2 Answers

0
votes

A better answer is "find a better way" but you can still pass strings like this:

data.set <- mtcars
current.metric <- "mpg"
rhs <- c("vs","gear")
func1 <- "mean"
filt <- "am == 0"

aggregate(as.formula(paste(current.metric, paste(rhs, collapse="+"), sep="~")),
          data=data.set, FUN= func1, subset = eval(parse(text = filt)))
#   vs gear      mpg
# 1  0    3 15.05000
# 2  1    3 20.33333
# 3  1    4 21.05000

aggregate(mpg ~ vs + gear, data = mtcars[mtcars$am == 0, ], FUN = mean)
#   vs gear      mpg
# 1  0    3 15.05000
# 2  1    3 20.33333
# 3  1    4 21.05000

## or
aggregate(mpg ~ vs + gear, data = mtcars, FUN = mean, subset = am == 0)

EDIT

Not such a good example since the task is a one-liner already, but you can create a wrapper function to set defaults and make typing easier, eliminate mistakes, etc

agg <- function(data = mtcars, vars = 'mpg', by_vars = c('vs','gear'), FUN = mean, ...) {
  aggregate(x = data[, vars], by = data[, by_vars], FUN = FUN, ...)
}


agg()
#   vs gear      mpg
# 1  0    3 15.05000
# 2  1    3 20.33333
# 3  0    4 21.00000
# 4  1    4 25.24000
# 5  0    5 19.12500
# 6  1    5 30.40000

agg(vars = c('mpg','disp'))
#   vs gear      mpg     disp
# 1  0    3 15.05000 357.6167
# 2  1    3 20.33333 201.0333
# 3  0    4 21.00000 160.0000
# 4  1    4 25.24000 115.6200
# 5  0    5 19.12500 229.3250
# 6  1    5 30.40000  95.1000
agg(FUN = sum)
#   vs gear   mpg
# 1  0    3 180.6
# 2  1    3  61.0
# 3  0    4  42.0
# 4  1    4 252.4
# 5  0    5  76.5
# 6  1    5  30.4

agg(mtcars[mtcars$am == 0, ])
#   vs gear      mpg
# 1  0    3 15.05000
# 2  1    3 20.33333
# 3  1    4 21.05000

agg(by_vars = c('cyl','gear'))
#   cyl gear    mpg
# 1   4    3 21.500
# 2   6    3 19.750
# 3   8    3 15.050
# 4   4    4 26.925
# 5   6    4 19.750
# 6   4    5 28.200
# 7   6    5 19.700
# 8   8    5 15.400
0
votes

Change to this:

filt <- quote(exclude.filter == 0)

OR if you need to pass a value in then use

Then try:

varname = "exclude.filter"
val = 0
filt <- substitute(var == value, list(var=as.name(varname),value=val))

gg1 <- aggregate(as.formula(paste(current.metric, paste(rhs, collapse="+"), sep="~")), 
                  data=data.set, FUN= "mean", subset= eval(filt) )