UPDATE (eddi): As of version 1.8.11 this has been fixed, and .SD is not needed in cases where the expression can be evaluated in place, as in the OP. Since the presence of .SD currently triggers construction of the full .SD, leaving it out will be much faster in some cases.
What's going on is that calls to eval() are treated differently than you likely imagine in the code that implements [.data.table(). Specifically, [.data.table() contains special evaluation branches for i and j expressions that begin with the symbol eval. When you wrap the call to eval inside of a call to sum(), eval is no longer the first element of the parsed/substituted expression, and the special evaluation branch is skipped.
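To see what "first element" means here, you can inspect the two unevaluated calls in plain R. This is only a small sketch; the variable names (outer_eval, wrapped_eval, and so on) are mine, and quote() stands in for what substitute(j) would capture.

```r
# quote() captures each call unevaluated, roughly as substitute(j) would see it.
outer_eval   <- quote(eval(quoted_a))
wrapped_eval <- quote(sum(eval(quoted_a)))

# The first element of a call object is the function being called.
head_outer   <- as.list(outer_eval)[[1L]]
head_wrapped <- as.list(wrapped_eval)[[1L]]

print(identical(head_outer, quote(eval)))    # TRUE
print(identical(head_wrapped, quote(eval)))  # FALSE: the head is `sum`
```

Only the first form has eval as its head, so only that form can reach a branch that tests for it.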
Here is the bit of code in the monster function displayed by typing getAnywhere("[.data.table") that makes a special allowance for calls to eval() passed in via [.data.table()'s j-argument:
jsub = substitute(j)
...
# Skipping some lines
...
jsubl = as.list.default(jsub)
if (identical(jsubl[[1L]], quote(eval))) {  # The test for eval 'on the outside'
    jsub = eval(jsubl[[2L]], parent.frame(), parent.frame())
    if (is.expression(jsub))
        jsub = jsub[[1L]]
}
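The effect of that branch can be imitated in base R without data.table. The function below is a hypothetical stand-in I wrote for illustration (unwrap_eval is not part of data.table), and quoted_a is assumed to be quote(a); it only mirrors the excerpt above, not the rest of [.data.table().

```r
# Hypothetical stand-in for the excerpted branch; not a data.table function.
unwrap_eval <- function(j) {
  jsub  <- substitute(j)            # the unevaluated j expression
  jsubl <- as.list(jsub)
  if (identical(jsubl[[1L]], quote(eval))) {
    # eval is outermost: replace jsub with whatever its argument evaluates to
    jsub <- eval(jsubl[[2L]], parent.frame(), parent.frame())
    if (is.expression(jsub)) jsub <- jsub[[1L]]
  }
  jsub
}

quoted_a <- quote(a)                # assumed definition of quoted_a
expr_sum <- expression(sum(a))      # an expression vector rather than a call

unwrapped <- unwrap_eval(eval(quoted_a))       # the symbol a
from_expr <- unwrap_eval(eval(expr_sum))       # the call sum(a)
untouched <- unwrap_eval(sum(eval(quoted_a)))  # still sum(eval(quoted_a))
```

With eval on the outside, j is rewritten to the underlying expression before anything is evaluated. Wrapped in sum(), the call passes through unchanged, and eval() then runs later as an ordinary function call.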
As a workaround, either follow the example in data.table FAQ 1.6 (pdf here), or explicitly point eval() towards .SD, the local variable that holds columns of whatever data.table you are operating on (here d). (For some more explanation of .SD's role, see the first few paragraphs of this answer).
d[, sum(eval(quoted_a, envir=.SD))]
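For a self-contained run, here is a minimal sketch. The contents of d and the definition of quoted_a are assumptions made up for illustration, and the second form is my reading of the eval-on-the-outside pattern: build the whole expression, including sum(), before handing it to eval().

```r
library(data.table)

d <- data.table(a = 1:3, b = 4:6)  # hypothetical data
quoted_a   <- quote(a)             # assumed definition of quoted_a
quoted_sum <- quote(sum(a))        # whole expression quoted up front

# Workaround 1: point eval() at .SD explicitly.
res_sd <- d[, sum(eval(quoted_a, envir = .SD))]

# Workaround 2: keep eval outermost so the special branch applies.
res_outer <- d[, eval(quoted_sum)]

print(res_sd)     # 6
print(res_outer)  # 6
```

Per the update at the top, on 1.8.11 or later the plain d[, sum(eval(quoted_a))] should give the same result without .SD.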
quote(sum(a)) instead of expression(sum(a)). No idea why it mattered. – rbatt