Group a data.table using a column which is list

Question

I have a really big problem and looping through the data.table to do what I want is too slow, so I am trying to get around looping. Let assume I have a data.table as follows:

a <- data.table(i = c(1,2,3), j = c(2,2,6), k = list(c("a","b"),c("a","c"),c("b")))

> a
  i j   k
1: 1 2 a,b
2: 2 2 a,c
3: 3 6   b

And I want to group based on the values in k. So something like this:

a[, sum(j), by = k]

right now I am getting the following error:

 Error in `[.data.table`(a, , sum(i), by = k) : 
 The items in the 'by' or 'keyby' list are length (2,2,1). Each must be same length as rows in x or number of rows returned by i (3).

The answer I am looking for is to group first all the rows having "a" in column k and calculate sum(j) and then all rows having "b" and so on. So the desired answer would be:

k V1 
a 4
b 8
c 2

Any hint how to do it efficiently? I cant melt the column K by repeating the rows since the size of the data.table would be too big for my case.

Does this answer your question? R data.table - group by column includes list. I think this question is a duplicate, however it also provides better and more answers. — andschar

Mike H. Mike H. · Accepted Answer · 2016-07-31T15:36:24

I think this might work:

a[, .(k = unlist(k)), by=.(i,j)][,sum(j),by=k]

   k V1
1: a  4
2: b  8
3: c  2

Group a data.table using a column which is list

3 Answers