data.table row comparison in a list column

Question

I have a data.table with a list column whose entries are vectors of any kind. For each row, I want to know whether any of its entries also appears in the vector of any other row, and in the end get a column with a grouping variable.

It works with a combination of lapply() and by = row.names(), but it gets very slow as the number of rows increases. The purpose of paste() is to build, for the current row, a string of all values from the rows it overlaps with, to group by later on.

Is there a more elegant (and faster!) solution?

library(data.table)

ex_dat <- data.table(
  ls_col = list(
    c(1,2,3),
    c(3,4),
    c(3,4,5,6,7,8),
    c(5)
  )
)

ex_dat[, grp_string := list(list(paste(unique(unlist(
  lapply(ex_dat[['ls_col']], function(x) {
    if (any(unlist(ls_col) %in% x)){
      x
    }
  }))), collapse = " | "))), 
  by = row.names(ex_dat)]

Current and desired output (the grouping variable may differ though):

> ex_dat
        ls_col                    grp_string
1:       1,2,3 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
2:         3,4 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
3: 3,4,5,6,7,8 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
4:           5         3 | 4 | 5 | 6 | 7 | 8
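
To make the per-row step easier to follow, here is a minimal base R sketch of what the lapply() call does for a single row (row 4 of the example). The names current and overlaps are illustrative only and do not appear in the code above.

```r
# Same vectors as in ex_dat$ls_col
ls_col <- list(c(1, 2, 3), c(3, 4), c(3, 4, 5, 6, 7, 8), c(5))

# The row being processed (row 4)
current <- ls_col[[4]]

# Which rows share at least one value with the current row?
overlaps <- vapply(ls_col, function(x) any(current %in% x), logical(1))

# Union of all values in the overlapping rows, collapsed into one string
grp_string <- paste(unique(unlist(ls_col[overlaps])), collapse = " | ")
print(grp_string)
# "3 | 4 | 5 | 6 | 7 | 8"
```

Only rows 3 and 4 contain the value 5, so the string matches the last row of the output above. Repeating this scan over all rows for every row is what makes the approach slow as the table grows.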

chinsoon12 · Accepted Answer · 2018-05-31T00:41:49

Not sure if this will help. You can convert to a long format first, then use union for each element:

ex_dat[, .(ls_col, elements=unlist(ls_col)), by=seq_len(ex_dat[,.N])][,
    .(members=Reduce(union, ls_col)), by=elements]

Result (which is probably in an easier format for your next step):

    elements members
 1:        1       1
 2:        1       2
 3:        1       3
 4:        2       1
 5:        2       2
 6:        2       3
 7:        3       1
 8:        3       2
 9:        3       3
10:        3       4
11:        3       5
12:        3       6
13:        3       7
14:        3       8
15:        4       3
16:        4       4
17:        4       5
18:        4       6
19:        4       7
20:        4       8
21:        5       3
22:        5       4
23:        5       5
24:        5       6
25:        5       7
26:        5       8
27:        6       3
28:        6       4
29:        6       5
30:        6       6
31:        6       7
32:        6       8
33:        7       3
34:        7       4
35:        7       5
36:        7       6
37:        7       7
38:        7       8
39:        8       3
40:        8       4
41:        8       5
42:        8       6
43:        8       7
44:        8       8
    elements members
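
One possible next step, sketched here as an assumption rather than as part of the answer, is to join this long result back to the original rows and collapse the members into one string per row. The names el_members, rows, row_id and grp are hypothetical helpers introduced for illustration.

```r
library(data.table)

ex_dat <- data.table(
  ls_col = list(c(1, 2, 3), c(3, 4), c(3, 4, 5, 6, 7, 8), c(5))
)

# Long result from the answer: one row per (element, member) pair
el_members <- ex_dat[, .(ls_col, elements = unlist(ls_col)),
                     by = seq_len(ex_dat[, .N])][,
  .(members = Reduce(union, ls_col)), by = elements]

# One row per (row_id, element) pair; row_id is a hypothetical helper column
rows <- ex_dat[, .(row_id = rep(seq_len(.N), lengths(ls_col)),
                   elements = unlist(ls_col))]

# Join on elements, then collapse each row's members into a single string.
# allow.cartesian = TRUE is needed because the join is many-to-many.
grp <- merge(rows, el_members, by = "elements", allow.cartesian = TRUE)[,
  .(grp_string = paste(sort(unique(members)), collapse = " | ")),
  keyby = row_id]

# keyby sorts by row_id, so the strings line up with the rows of ex_dat
ex_dat[, grp_string := grp$grp_string]
```

On the example data this should give the same four strings as the output in the question, with grp_string as a plain character column instead of a list column. Whether it is faster than the lapply() version on large tables is untested here; the many-to-many join can itself grow large when elements are shared by many rows.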
