
In a previous question, "Return a list in dplyr mutate()", it was clarified that although dplyr (as of release 0.2) cannot create new variables from a vector returned by a function, data.table can, with the syntax:

it[, c(paste0("V", 4:5)) := myfun(V2, V3)]

If the function myfun from that question is altered to:

myfun = function(arg1,arg2) {
  if (arg1 > arg2) {
    temp1 = arg1 + arg2
    temp2 = arg1 - arg2
  } else {
    temp1 = arg1 * arg2
    temp2 = arg1 / arg2
  }
  list(temp1,temp2)
}

the solution posted above returns a warning:

it = data.table(c("a","a","b","b","c"),c(1,2,3,4,5), c(2,3,4,2,2))
it[, c(paste0("V", 4:5)) := myfun(V2, V3)]

Warning message:
In if (arg1 > arg2) { :
  the condition has length > 1 and only the first element will be used

This implies that somehow data.table() is passing more than a single row to the function. Why is this occurring?

That warning is coming from your function. Just calling myfun(it$V2, it$V3) gives the same warning. It's because arg1 > arg2 compares two vectors of length > 1, while if() expects a single TRUE/FALSE, so it uses just the first value (and issues the warning). - Arun

1 Answer


Ron, this is expected behavior. data.table always passes the full columns to the function (unless you use by, in which case the function gets the part of each column that corresponds to each subgroup). To get around this, you need to vectorize your function:

myfun2 = function(arg1,arg2) {
  temp1 <- ifelse(arg1 > arg2, arg1 + arg2, arg1 * arg2)
  temp2 <- ifelse(arg1 > arg2, arg1 - arg2, arg1 / arg2)
  list(temp1,temp2)
}

I do this here using ifelse instead of if/else. Then it works:

it = data.table(c("a","a","b","b","c"),c(1,2,3,4,5), c(2,3,4,2,2))
it[, c(paste0("V", 4:5)) := myfun2(V2, V3)]
it
#    V1 V2 V3 V4        V5
# 1:  a  1  2  2 0.5000000
# 2:  a  2  3  6 0.6666667
# 3:  b  3  4 12 0.7500000
# 4:  b  4  2  6 2.0000000
# 5:  c  5  2  7 3.0000000
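To see for yourself what gets passed in each case, here is a small sketch that just reports the length of V2 as seen inside j. The names n_all, n_grp and n_row are only illustrative, and the sketch assumes the data.table package is installed:

```r
library(data.table)

# Same table as in the question; unnamed columns default to V1, V2, V3
it <- data.table(c("a","a","b","b","c"), c(1,2,3,4,5), c(2,3,4,2,2))

# Without by, j sees each column as one full vector
n_all <- it[, length(V2)]                 # 5

# With by = V1, j sees only the rows of the current group
n_grp <- it[, .(n = length(V2)), by = V1] # n is 2, 2, 1

# With one group per row, j sees length-1 vectors
n_row <- it[, .(n = length(V2)), by = seq_len(nrow(it))]  # n is 1 for every row
```

So a function written for scalars only receives scalars when every row is its own group.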

Another alternative, if you don't want to modify your function, is to break up the data.table into one-row groups. We do this by passing by a vector that has a distinct value for each row of the data.table (so that each row is its own group):

it[, c(paste0("V", 4:5)) := myfun(V2, V3), by=1:nrow(it)]

Notice the by argument. This also works, but it is slower, since myfun is called once per row instead of once for the whole table. Generally, if you can vectorize, you should.
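As a sanity check, the sketch below runs both approaches on copies of the same table and compares the new columns. The names vec, rowwise and same are just illustrative, and seq_len(nrow(...)) is used in place of 1:nrow(...) as a matter of habit. Only the grouped call uses the scalar myfun, because recent R versions (4.2.0 and later) turn the length > 1 condition in if() into an error rather than a warning:

```r
library(data.table)

# Scalar version from the question: only safe on length-1 inputs
myfun <- function(arg1, arg2) {
  if (arg1 > arg2) {
    temp1 <- arg1 + arg2
    temp2 <- arg1 - arg2
  } else {
    temp1 <- arg1 * arg2
    temp2 <- arg1 / arg2
  }
  list(temp1, temp2)
}

# Vectorized version from this answer
myfun2 <- function(arg1, arg2) {
  temp1 <- ifelse(arg1 > arg2, arg1 + arg2, arg1 * arg2)
  temp2 <- ifelse(arg1 > arg2, arg1 - arg2, arg1 / arg2)
  list(temp1, temp2)
}

vec <- data.table(c("a","a","b","b","c"), c(1,2,3,4,5), c(2,3,4,2,2))
rowwise <- copy(vec)  # copy() so the two tables are modified independently

# Vectorized: one call on the full columns
vec[, c("V4", "V5") := myfun2(V2, V3)]

# Row by row: one call per single-row group
rowwise[, c("V4", "V5") := myfun(V2, V3), by = seq_len(nrow(rowwise))]

# Both approaches should produce the same new columns
same <- isTRUE(all.equal(vec$V4, rowwise$V4)) &&
  isTRUE(all.equal(vec$V5, rowwise$V5))
```

On a table this small the speed difference is not noticeable; it only starts to matter as the number of rows grows.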