3
votes

In R data.table, it is possible to reshape on multiple columns by passing a list of column names (value.var=) along with a list of aggregation functions (fun.aggregate=). This works well when those lists are written out explicitly in the function call. However, it results in an error when the same lists are passed as variables.

For example, let's create a data table dt as follows:

library(data.table)
dt = data.table(x=sample(5,20,TRUE), y=sample(2,20,TRUE),
                z=sample(letters[1:2], 20,TRUE), d1 = runif(20), d2=1L)

The reshape operation on two columns, d1 and d2, works with arguments passed as follows:

dcast(dt, x + y ~ z, fun=list(sum, mean), value.var=list("d1", "d2"))

However, the same operation fails when the arguments are passed as named variables.

funs = list(sum, mean)
vars = list("d1", "d2")
dcast(dt, x + y ~ z, fun=funs, value.var=vars)

The error message is:

Error in aggregate_funs(fun.call, lvals, sep, ...) : 
  When 'fun.aggregate' and 'value.var' are both lists, 'value.var' must be either of length =1 or =length(fun.aggregate)

Is this a bug, or am I going about this the wrong way?

Update: Tried in R version 3.5.0 and data.table version 1.11.4 on Windows. In the actual scenario, my table has 171 columns and over 300,000 rows. The pivot operation involves 31 columns. I have unexpectedly encountered an error in trying to pass function arguments as variables instead of long "in situ" lists. I am looking for an explanation why this error condition occurs. Thank you!

1
I think it is very odd that this error is occurring. I'm pretty sure it has something to do with non-standard evaluation used in the aggregate_funs function in the data.table package. I would think about submitting an issue to the data.table GitHub repo, as this shouldn't be considered wanted behaviour. - Blaza
@42- I run R 3.5.0 and data.table 1.11.4. Those are I think the latest versions. The example the OP gives reproduces exactly. - Blaza
No, the code that OP says gives an error is not in the examples in the docs. In the examples, there is the line where the function is called with passing ...fun=list(sum, mean), value.var=list("d1", "d2") and that works, as the OP said already. When you instead pass variables which are lists, as ..., fun=funs, value.var=vars), the error appears. That is the problem that the OP is talking about. Do you not get an error even when running the last 3 lines of code in the question? - Blaza
Actually, it's not my question :) Does dcast(dt, x + y ~ z, fun=funs, value.var=c("d1","d2")) work for you? I get an error again that a function funs could not be found. That indicates that there is a bug in parsing the fun argument if it's passed as a variable. Passing everything with list(...), list(...) (or c(...)) like in the example isn't answering the OP's problem, but rather providing a workaround, which the OP explicitly said s/he wants to avoid, as s/he (I would presume) wants to avoid hard coding the fun and value.var arguments. - Blaza
@42- In my opinion, it's fine if in the form of "looks like maybe a bug. you might want to check out their guidance: github.com/Rdatatable/data.table/wiki/Support " Anyway, it was long ago filed and gets refiled every so often like github.com/Rdatatable/data.table/issues/2064 As far as "current version of R", it should be sufficient to test on any version that the package purports to support (... I don't really have strong opinions on this; just saying.) - Frank

1 Answer

-1
votes

It's expecting a character vector rather than a list:

 dcast(dt, x + y ~ z, fun=list(sum, mean), value.var=c("d1","d2"))
   x y  d1_sum_a  d1_sum_b d2_sum_a d2_sum_b d1_mean_a d1_mean_b d2_mean_a d2_mean_b
1: 1 1 0.3437415 1.7922195        1        3 0.3437415 0.5974065         1         1
2: 2 1 0.0000000 0.5831969        0        1       NaN 0.5831969       NaN         1
3: 2 2 0.6644480 0.5086218        1        2 0.6644480 0.2543109         1         1
4: 3 1 2.0642855 0.9072466        3        3 0.6880952 0.3024155         1         1
5: 3 2 0.0000000 0.7751363        0        1       NaN 0.7751363       NaN         1
6: 4 2 0.5024032 0.8132855        1        1 0.5024032 0.8132855         1         1
7: 5 2 0.1153944 0.8494716        1        2 0.1153944 0.4247358         1         1

Run on a Mac (El Capitan) running R 3.5.0 and data.table 1.14.1
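
If the goal is to keep the function list in a variable, one possible workaround is sketched below. It assumes, as the comments suggest, that dcast captures fun.aggregate unevaluated, so a variable name reaches it as a bare symbol. The idea is to store the functions as an unevaluated call and splice that call into dcast() with bquote(), which reproduces the literal form that is known to work. The names fun_names, funs_call, res_var and res_lit are made up for this illustration, and the sketch has not been checked against every data.table version.

```r
library(data.table)

set.seed(1)
dt <- data.table(x = sample(5, 20, TRUE), y = sample(2, 20, TRUE),
                 z = sample(letters[1:2], 20, TRUE), d1 = runif(20), d2 = 1L)

# Hypothetical helper variables: function names kept as strings,
# then turned into the unevaluated call list(sum, mean).
fun_names <- c("sum", "mean")
funs_call <- as.call(lapply(c("list", fun_names), as.name))
vars <- list("d1", "d2")

# bquote() splices the stored call in place of .(funs_call), so dcast()
# sees fun.aggregate = list(sum, mean) exactly as if it had been typed.
# value.var is left as an ordinary variable (assumed to be evaluated normally).
res_var <- eval(bquote(
  dcast(dt, x + y ~ z, fun.aggregate = .(funs_call), value.var = vars)
))

# Reference result using the literal form from the question.
res_lit <- dcast(dt, x + y ~ z, fun.aggregate = list(sum, mean),
                 value.var = list("d1", "d2"))
```

With 31 columns, fun_names and vars can be built programmatically and the dcast() call itself stays short. Whether this is preferable to waiting for a fix in data.table is a judgment call.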