Basically I have a data.table with a list column whose entries are vectors of any kind, and I want to know whether any element of one row's vector is also present in the vector of any other row. In the end I'd like to get a column with a grouping variable.
It works with a combination of lapply() and by = row.names(), but of course it gets awfully slow as soon as the number of rows increases, since every row is compared against every other row.
The paste() is there to build a string containing all values from the rows that overlap with the current row, so that I can group by it later on.
So is there any more elegant (and faster!) solution?
library(data.table)

ex_dat <- data.table(
  ls_col = list(
    c(1, 2, 3),
    c(3, 4),
    c(3, 4, 5, 6, 7, 8),
    c(5)
  )
)

ex_dat[, grp_string := list(list(paste(unique(unlist(
    lapply(ex_dat[['ls_col']], function(x) {
      if (any(unlist(ls_col) %in% x)) {
        x
      }
    }))), collapse = " | "))),
  by = row.names(ex_dat)]
Current and desired output (the grouping variable may differ though):
> ex_dat
        ls_col                    grp_string
1:       1,2,3 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
2:         3,4 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
3: 3,4,5,6,7,8 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
4:           5         3 | 4 | 5 | 6 | 7 | 8
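
For reference, here is a rough sketch of one direction that might avoid the per-row lapply(): unnest the list column into a long table and let joins find the overlapping rows. This is only an assumption about what could be faster, not a benchmarked solution. The helper names (id, long, pairs, others, grp) are made up for illustration, and the values are sorted before pasting, so the strings are canonical rather than in order of first appearance. The result is a plain character column instead of a list column.

```r
library(data.table)

ex_dat <- data.table(
  ls_col = list(c(1, 2, 3), c(3, 4), c(3, 4, 5, 6, 7, 8), c(5))
)

# Hypothetical helper column: a row id to join on
ex_dat[, id := .I]

# Long format: one row per (row id, value)
long <- data.table(
  id  = rep(ex_dat$id, lengths(ex_dat$ls_col)),
  val = unlist(ex_dat$ls_col)
)

# All pairs of rows that share at least one value (includes each row with itself)
pairs <- unique(
  merge(long, long, by = "val", allow.cartesian = TRUE)[, .(id = id.x, other = id.y)]
)

# Attach the values of every overlapping row to each row id
others <- setnames(copy(long), c("other", "other_val"))
vals <- merge(pairs, others, by = "other", allow.cartesian = TRUE)

# One sorted, de-duplicated string per row id
grp <- vals[, .(grp_string = paste(sort(unique(other_val)), collapse = " | ")), by = id]

# Update join back onto the original table (row order is preserved)
ex_dat[grp, on = "id", grp_string := i.grp_string]
print(ex_dat)
```

Like the original code, this only looks at direct overlaps, so row 4 still gets its own string even though it is linked to rows 1 and 2 through row 3. If the groups should be transitive, an extra connected-components step on pairs would be needed.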