13
votes

I'm trying to have an anonymous function return multiple columns in the j argument of a data.table. Here's an example:

## sample data
library(data.table)
tmpdt <- data.table(a = c(rep("a", 5), rep("b", 5)),
                    b = c(rep("f", 3), rep("r", 7)),
                    c = 1:10,
                    d = 21:30)
tmpdt[c %in% c(2,4), c := NA]

## this works fine
tmpdt[ , list(testout =
                (function(x) {
                    model <- lm(c ~ d, x)
                    residuals(model)
                })(.SD)),
      by = a]

## but I want to return a data.frame from the
## anonymous function

tmpdt[ , list(testout =
                (function(x) {
                    model <- lm(c ~ d, x)
                    tmpresid <- residuals(model)
                    tmpvalue <- x$b[as.numeric(names(tmpresid))]
                    data.frame(tmpvalue, tmpresid)
                })(.SD)),
      by = a]

The second version doesn't work because the function returns a data.frame instead of just a vector. Is there any way to make this work without writing the function call outside of the data.table j argument?

2 Answers

18
votes

You don't need an anonymous function - j accepts any expression you want, wrapped in { } (an anonymous body).

tmpdt[, {
          model <- lm(c ~ d, .SD)
          tmpresid <- residuals(model)
          tmpvalue <- b[as.numeric(names(tmpresid))]
          list(tmpvalue, tmpresid) # every element of the list becomes a column in result
        }
      , by = a]
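As a minimal, self-contained sketch of the same idea, the version below rebuilds the sample data from the question and names the list elements so the result columns are not the default V1 and V2. The object name `res` and the choice to reuse `tmpvalue` and `tmpresid` as column names are only illustrative assumptions.

```r
library(data.table)

# Same sample data as in the question
tmpdt <- data.table(a = c(rep("a", 5), rep("b", 5)),
                    b = c(rep("f", 3), rep("r", 7)),
                    c = 1:10,
                    d = 21:30)
tmpdt[c %in% c(2, 4), c := NA]

# `res` is a hypothetical name for the result
res <- tmpdt[, {
  model <- lm(c ~ d, .SD)
  tmpresid <- residuals(model)
  # residual names are row positions within the group;
  # rows where c is NA are dropped by lm()
  tmpvalue <- b[as.numeric(names(tmpresid))]
  # named list elements become named columns in the result
  list(tmpvalue = tmpvalue, tmpresid = tmpresid)
}, by = a]

names(res)
```

Naming the elements inside list() is optional, but it saves a later setnames() call.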

Some documentation on the use of anonymous body { } in j:

  1. Comment in Examples in ?data.table:

anonymous lambda in j: j accepts any valid expression. TO REMEMBER: every element of the list becomes a column in result.

  2. data.table FAQ 2.8 What are the scoping rules for j expressions?

No anonymous function is passed to j. Instead, an anonymous body [{ }] is passed to j [...] Some programming languages call this a lambda.

  3. Blog post by Andrew Brooks on the use of { } in j: Suppressing intermediate output with {}

0
votes

Just realized the issue right after posting. There's no need to wrap the function call in list():

tmpdt[, (function(x) {
            model <- lm(c ~ d, x)
            tmpresid <- residuals(model)
            tmpvalue <- x$b[as.numeric(names(tmpresid))]
            data.frame(tmpvalue, tmpresid)
        })(.SD),
      by = a]