20
votes

If you run:

mod <- lm(mpg ~ factor(cyl), data=mtcars)

It runs, because lm knows to look in mtcars to find both mpg and cyl.

Yet mean(mpg) fails because it can't find mpg, so you have to write mean(mtcars$mpg).

How do you code a function so that it knows to look in 'data' for the variables?

myfun <- function (a,b,data){
    return(a+b)
}

This will work with:

myfun(mtcars$mpg, mtcars$hp)

but will fail with:

myfun(mpg,hp, data=mtcars )

Cheers

3
I don't think I understand this enough to post a concise answer, but @Hadley has put together a quite thorough walk-through explaining this here: github.com/hadley/devtools/wiki/Evaluation
- Chase

Cheers Chase, I knew Hadley would be one to ask, given ggplot sprung to mind as a set of functions that work this way. Will have a read now.
- nzcoops

3 Answers

20
votes

Here's how I would code myfun():

myfun <- function(a, b, data) {
    eval(substitute(a + b), envir=data, enclos=parent.frame())
}

myfun(mpg, hp, mtcars)
#  [1] 131.0 131.0 115.8 131.4 193.7 123.1 259.3  86.4 117.8 142.2 140.8 196.4
# [13] 197.3 195.2 215.4 225.4 244.7  98.4  82.4  98.9 118.5 165.5 165.2 258.3
# [25] 194.2  93.3 117.0 143.4 279.8 194.7 350.0 130.4

If you're familiar with with(), it's interesting to see that it works in almost exactly the same way:

> with.default
# function (data, expr, ...) 
# eval(substitute(expr), data, enclos = parent.frame())
# <bytecode: 0x016c3914>
# <environment: namespace:base>

In both cases, the key idea is to first create an expression from the symbols passed in as arguments and then evaluate that expression using data as the 'environment' of the evaluation.

The first part (e.g. turning a + b into the expression mpg + hp) is possible thanks to substitute(). The second part is possible because eval() was beautifully designed, such that it can take a data.frame as its evaluation environment.
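To make the two steps visible, they can be written out separately. This is only a sketch: myfun2 and offset are hypothetical names introduced for illustration, not part of the answer above. It also shows what enclos is for: a name that is not a column of data is looked up in the caller's environment instead.

```r
myfun2 <- function(a, b, data) {
  # Step 1: capture the unevaluated expression, with a and b replaced
  # by whatever the caller typed (e.g. mpg + hp).
  expr <- substitute(a + b)
  # Step 2: evaluate it, looking in `data` first and falling back to
  # the caller's environment for names that `data` does not contain.
  eval(expr, envir = data, enclos = parent.frame())
}

offset <- 100                       # hypothetical variable, not a column of mtcars
res <- myfun2(mpg, offset, mtcars)  # mpg comes from mtcars, offset from the caller
head(res, 3)
```

Without enclos=parent.frame(), the fallback lookup would start from the function's own frame rather than the caller's, which matters when myfun2 is called from inside another function.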

5
votes

lm "knows" to look in its data argument because it actually constructs a call to model.frame using its own call as the base. If you look at the code for lm, you'll see the necessary machinery in the first dozen lines or so.
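A stripped-down sketch of that machinery might look like the following. myfun3 is a hypothetical name, and the real lm code does more (it filters the arguments it forwards, for instance), so treat this as an illustration of the idea rather than a copy of lm.

```r
myfun3 <- function(formula, data) {
  mf <- match.call()                     # the call exactly as the user wrote it
  mf[[1L]] <- quote(stats::model.frame)  # swap in model.frame as the function to call
  # Evaluate the rewritten call in the caller's environment; model.frame
  # then resolves the formula's variables inside `data`.
  eval(mf, parent.frame())
}

frame <- myfun3(mpg ~ hp, data = mtcars)
str(frame)
```

The result is a data frame holding just the variables named in the formula, which the function can then work with directly.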

You could replicate this for your own ends, but if your needs are simpler, you don't have to go to the same extent. For example:

myfun <- function(..., data)
    # evaluate the first unevaluated ... argument inside data
    eval(match.call(expand.dots=FALSE)$...[[1]], data)

# usage: myfun(mpg + hp, data=mtcars)

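Or just look at evalq(), which quotes its expression argument for you and then evaluates it in the environment (or data frame) you supply.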
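As a small illustration of the evalq() suggestion, the two calls below should give the same result, since evalq(expr, envir) is documented as equivalent to eval(quote(expr), envir). The names total1 and total2 are just placeholders.

```r
# evalq() quotes its first argument, so mpg + hp is not evaluated
# until it is looked up inside mtcars.
total1 <- evalq(mpg + hp, mtcars)

# The same thing spelled out with eval() and quote().
total2 <- eval(quote(mpg + hp), mtcars)

identical(total1, total2)
```

Because evalq() quotes its argument literally, it is most convenient at the top level; inside a wrapper function you would still need substitute() to recover what the caller typed.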

3
votes

This is not exactly like what you asked for, but if you don't know about with() this might be an option:

 myfun <- function (a,b){
    return(a+b)
 }
 with(mtcars, myfun(mpg, hp))

You can remove the data argument to myfun for this.