I would like to use data.table to compute variables for each group specified. For the sake of simplicity, let's say the data is split according to groups in x1:
x1 x2
a 3
a 4
b 1
b 5
I want to create a variable for the mean of each group, but I don't know how to index each group:
DT[, list(
  mean_a = mean(x2), # for all rows containing "a"
  mean_b = mean(x2)  # for all rows containing "b"
), by = "x1"]
How can I rewrite the lines with comments? (i.e. find the mean for all rows with "a", same for "b")
I need the output as a data.table in separate columns, as it will be processed further:
mean_a mean_b
3.5 3
EDIT: after playing around with it, here is the solution I wanted.
> DT2=DT[,list(
+ mean_a=mean(DT[grep("a",x1),x2]),
+ mean_b=mean(DT[grep("b",x1),x2])),
+ by=NULL]
>
> DT2
mean_a mean_b
1: 3.5 3
It's not as efficient as Frank's but it's what I asked for originally, i.e. to rewrite the lines with comments.
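For reference, the same two columns can also be written by subsetting x2 directly inside j instead of calling grep on the table. This is only a minimal sketch, assuming DT holds the sample data shown at the top. Note that x1 == "a" matches exactly, whereas grep("a", x1) would also match any value that merely contains "a".

```r
library(data.table)

# Sample data from the question
DT <- data.table(x1 = c("a", "a", "b", "b"), x2 = c(3, 4, 1, 5))

# Subset x2 by a logical condition on x1 inside j; no grouping is needed
DT2 <- DT[, list(
  mean_a = mean(x2[x1 == "a"]),
  mean_b = mean(x2[x1 == "b"])
)]

print(DT2)
```

The result is still a one-row data.table, so it can be processed further in the same way.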
dt[, mean(x2), by = x1] ??? Can you update your question to clarify what you're actually trying to ask and, if necessary, also show a sample of the output you expect? – A5C1D2H2I1M1N2O1R2T1
dt[,mean(x2),by=x1][,{names(V1) <- paste("mean_",x1,sep=""); V1}], just adding an extra step onto Ananda's answer/comment. – Frank
1: at the beginning of the first line and so on). – Frank
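The grouped-mean idea from the comments can also be reshaped into the one-row data.table that the question asks for. The following is a rough sketch rather than anyone's posted answer: the names means, m and wide are made up for illustration, and DT is assumed to be the sample data from the question.

```r
library(data.table)

# Sample data from the question
DT <- data.table(x1 = c("a", "a", "b", "b"), x2 = c(3, 4, 1, 5))

# Step 1: one mean per group (long format, one row per value of x1)
means <- DT[, list(m = mean(x2)), by = x1]

# Step 2: name each mean after its group and spread them into one row
wide <- as.data.table(as.list(setNames(means$m, paste0("mean_", means$x1))))

print(wide)
```

Because the column names are built from the values of x1, this should scale to more than two groups without writing one line per group.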