
I have a data.table and want to apply a function to each column within each subset of rows. Normally one would do this as follows: DT[, lapply(.SD, function), by = y]

But in my case the function does not return a single value but a vector of several values. Is there a way to do something like this?

library(data.table)
set.seed(9)
DT <- data.table(x1=letters[sample(x=2L,size=6,replace=TRUE)],
                 x2=letters[sample(x=2L,size=6,replace=TRUE)],
                 y=rep(1:2,3), key="y")
DT
#   x1 x2 y
#1:  a  a 1
#2:  a  b 1
#3:  a  a 1
#4:  a  a 2
#5:  a  b 2
#6:  a  a 2

DT[, lapply(.SD, table), by = y]
# Desired Result, something like this:
# x1_a x2_a x2_b
#    3    2    1
#    3    2    1

Thanks in advance. Also, I would not mind if the result of the function had to have a fixed length.

Do you want x1 & x2 to have the same levels? - Ricardo Saporta
@RicardoSaporta For simplicity, let's assume so. - jakob-r
It's simpler without it; otherwise the levels need to be modified. I was asking because of how you create your data.table (i.e., sampling from the same set, so that, due to the random sampling, not all values are represented). Should the missing values be counted as 0, or should they simply be ignored? - Ricardo Saporta

1 Answer


You simply need to unlist the table and then coerce back to a list:

> DTCounts <- DT[, as.list(unlist(lapply(.SD, table))), by=y]
> DTCounts

   y x1.a x2.a x2.b
1: 1    3    2    1
2: 2    3    2    1


If you do not like the dots in the names, you can sub them out:

> setnames(DTCounts, sub("\\.", "_", names(DTCounts)))
> DTCounts

   y x1_a x2_a x2_b
1: 1    3    2    1
2: 2    3    2    1

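Note that if not all values in a column are present in every group
(i.e., if x2=c("a", "b") when y=1, but x2=c("b", "b") when y=2)
then the above breaks, because the groups no longer produce the same set of count columns.

The solution is to make the columns factors before counting, so that table() reports a count of 0 for levels that are absent from a group.

# Convert every column to a factor (note: this converts y as well)
DT <- DT[, lapply(.SD, factor)]

## OR convert only selected columns
columnsToConvert <- c("x1", "x2")  # or .. <- setdiff(names(DT), "y") 
DT <- cbind(DT[, lapply(.SD, factor), .SDcols=columnsToConvert], y=DT[, y])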
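
As a rough illustration of the factor fix, here is a small sketch with made-up data in which x2 has no "a" in the second group. The names DT2, cols and res are hypothetical, and the in-place conversion with := is simply one alternative to the cbind approach above.

```r
library(data.table)

# Made-up data: x2 contains no "a" in group y = 2
DT2 <- data.table(x1 = c("a", "a", "a", "a"),
                  x2 = c("a", "b", "b", "b"),
                  y  = c(1L, 1L, 2L, 2L), key = "y")

# Convert only the counted columns to factors, by reference,
# so that every group carries the full set of levels
cols <- c("x1", "x2")
DT2[, (cols) := lapply(.SD, factor), .SDcols = cols]

# table() on a factor includes absent levels with a count of 0,
# so each group now returns the same columns
res <- DT2[, as.list(unlist(lapply(.SD, table))), by = y]
print(res)
```

Because the levels are fixed over the whole column before grouping, the group with no "a" in x2 still contributes an x2.a column, with a count of 0.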