3
votes

I have a data.table, and I'd like to apply a function over its columns. Usually this is done like so:

dt[, lapply(.SD, func), .SDcols = c("col1", "col2")]

And this would apply the function func over those two columns. What if, however, I'd like to apply it over the sum of those two columns? Something like

dt[, lapply(.SD, func), .SDcols = "col1 + col2"]

obviously doesn't work.

You could generalise this to applying func to the result of another function (in this case, sum) that takes in columns as arguments. I know I can create another column containing the results of the first function, but is there a way around that?

1
I usually "keep it simple" by creating that intermediate variable and removing it later if I really have to get rid of it. Honestly, how much time (or extra memory) does it take to add that extra column with data.table? :) - daroczig
@daroczig True, I'm just interested in saving a couple of lines of code and columns if the intermediate columns can be skipped. - Samuel Tan

1 Answer

2
votes

To add the columns, try

dt[, func(Reduce(`+`,.SD)), .SDcols = c("col1","col2")]

This works with more than two columns as well: Reduce adds all the columns listed in .SDcols together element-wise, and func is then applied to the resulting vector.
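As a rough illustration, here is a small sketch on made-up data. The table `dt`, the extra column `col3`, and the body of `func` (squaring) are placeholders I've assumed for the example; they are not from the question. Because `j` evaluates to a plain vector here rather than a list, the result should come back as a vector, not a data.table.

```r
library(data.table)

# Hypothetical toy data and a stand-in for func (both assumed for illustration)
dt <- data.table(col1 = 1:3, col2 = 4:6, col3 = 7:9)
func <- function(x) x^2

# Element-wise sum of the two selected columns, then func on that sum
res_two <- dt[, func(Reduce(`+`, .SD)), .SDcols = c("col1", "col2")]

# The same pattern with three columns: all are added before func is applied
res_three <- dt[, func(Reduce(`+`, .SD)), .SDcols = c("col1", "col2", "col3")]

# When the column names are known up front, this gives the same result without .SD
res_direct <- dt[, func(col1 + col2)]

print(res_two)
print(res_three)
```

The Reduce form is mainly useful when the column names are held in a character vector; if they are fixed, writing `func(col1 + col2)` directly in `j` is probably the shorter option.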