I am trying to aggregate a numeric vector of values by a combination of factors with data.table in R. Take the following data table as an example:
require (data.table)
require (plyr)
dtb <- data.table (cbind (expand.grid (month = rep (month.abb[1:3], each = 3),
                                       fac = letters[1:3]),
                          value = rnorm (27)))
Notice that every unique combination of 'month' and 'fac' shows up three times. So, when I try to average values by both these factors, I should expect a data frame with 9 unique rows:
(agg1 <- ddply (dtb, c ("month", "fac"), function (dfr) mean (dfr$value)))
month fac V1
1 Jan a -0.36030953
2 Jan b -0.58444588
3 Jan c -0.15472876
4 Feb a -0.05674483
5 Feb b 0.26415972
6 Feb c -1.62346772
7 Mar a 0.24560510
8 Mar b 0.82548140
9 Mar c 0.18721114
However, when aggregating with data.table, I keep getting one row for every redundant combination of the two factors, i.e. 27 rows instead of 9:
(agg2 <- dtb[, value := mean (value), by = list (month, fac)])
month fac value
1: Jan a -0.36030953
2: Jan a -0.36030953
3: Jan a -0.36030953
4: Feb a -0.05674483
5: Feb a -0.05674483
6: Feb a -0.05674483
7: Mar a 0.24560510
8: Mar a 0.24560510
9: Mar a 0.24560510
10: Jan b -0.58444588
11: Jan b -0.58444588
12: Jan b -0.58444588
13: Feb b 0.26415972
14: Feb b 0.26415972
15: Feb b 0.26415972
16: Mar b 0.82548140
17: Mar b 0.82548140
18: Mar b 0.82548140
19: Jan c -0.15472876
20: Jan c -0.15472876
21: Jan c -0.15472876
22: Feb c -1.62346772
23: Feb c -1.62346772
24: Feb c -1.62346772
25: Mar c 0.18721114
26: Mar c 0.18721114
27: Mar c 0.18721114
month fac value
Is there an elegant way to collapse these results to one row per unique combination of factors with data.table?
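For reference, here is a minimal sketch of one way the collapse could be done. It assumes the extra rows come from `:=`, which adds or updates a column by reference and therefore keeps every original row. The seed and the name `agg` are only for illustration and are not part of the original example.

```r
library(data.table)

set.seed(1)  # assumed seed, only to make the sketch reproducible
dtb <- data.table(cbind(expand.grid(month = rep(month.abb[1:3], each = 3),
                                    fac = letters[1:3]),
                        value = rnorm(27)))

# `:=` would overwrite `value` in place and keep all 27 rows.
# Returning a list from j instead yields one row per (month, fac) group.
agg <- dtb[, .(value = mean(value)), by = .(month, fac)]

nrow(agg)  # 9
```

Because nothing is assigned by reference here, `dtb` keeps its 27 original values and `agg` holds only the 9 group means.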
data.frame, it is not necessary to use cbind when creating the data.table - Ricardo Saporta
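A short sketch of what this comment probably means, assuming it refers to the fact that `data.table()` accepts a data.frame together with additional named columns, much like `data.frame()` does:

```r
library(data.table)

# Pass the expand.grid() result and the extra column straight to data.table();
# no intermediate cbind() is needed.
dtb <- data.table(expand.grid(month = rep(month.abb[1:3], each = 3),
                              fac = letters[1:3]),
                  value = rnorm(27))

dim(dtb)  # 27 rows, 3 columns
```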