I have implemented a simple group-by-operation with the ?stats::aggregate function. It collects elements per group in a vector. I would like to make it faster using the data.table package. However I'm not able to reproduce the wanted behaviour with data.table.
Sample dataset:
df <- data.frame(group = c("a","a","a","b","b","b","b","c","c"), val = c("A","B","C","A","B","C","D","A","B"))
Output to reproduce with data.table:
by_group_aggregate <- aggregate(x = df$val, by = list(df$group), FUN = c)
What I've tried:
data_t <- data.table(df)
# working, but not what I want
by_group_datatable <- data_t[,j = paste(val,collapse=","), by = group]
# no grouping done when using c or as.vector
by_group_datatable <- data_t[,j = c(val), by = group]
by_group_datatable <- data_t[,j = as.vector(val), by = group]
# grouping leads to error when using as.list
by_group_datatable <- data_t[,j = as.list(val), by = group]
Is it possible to have vectors of different size in a data.table column? If yes, how do I achieve it?