1
votes

I would like to use data.table to compute variables for each group specified. For the sake of simplicity, let's say the data is split according to groups in x1:

x1  x2
a   3
a   4
b   1
b   5

And I want to create a variable for the mean of each group but I dont know how to index each group:

DT[,list(
    mean_a=mean(x2) #for all rows containing "a"
    mean_b=mean(x2) #for all rows containing "b"
  by="x1")]

How can I rewrite the lines with comments? (i.e. find the mean for all rows with "a", same for "b")

I need the output as a data.table in separate columns, as it will be processed further:

mean_a  mean_b
3.5     3

EDIT: after playing around with it, here is the solution I wanted.

> DT2=DT[,list(
+     mean_a=mean(x[grep("a",x1),x2]),
+     mean_b=mean(x[grep("b",x1),x2])),
+     by=NULL]
> 
> DT2
   mean_a mean_b
1:    3.5      3

It's not as efficient as Frank's but it's what I asked for originally, i.e. to rewrite the lines with comments.

2
dt[, mean(x2), by = x1]??? Can you update your question to clarify what you're actually trying to ask, and, if necessary, also show a sample of the output you expect.A5C1D2H2I1M1N2O1R2T1
Ananda's comment works but shows the result in rows and not in columns like in your example. Do you need it in columns?Dean MacGregor
In case that's the case, you can use dt[,mean(x2),by=x1][,{names(V1) <- paste("mean_",x1,sep=""); V1}], just adding an extra step onto Ananda's answer/comment.Frank
Yes, the goal is to get the output in a data.table with multiple columns. Frank's solution only provides the names of the variables. Furthermore, it would be good to have the flexibility of naming each variable separately as I did (has to do with the specifics of my data).AlexR
Okay, I've offered an answer. By the way, neither your example input nor the output actually look like a data.table (you know, with the 1: at the beginning of the first line and so on).Frank

2 Answers

4
votes

I don't think it's worth your while to set names separately for each value of x1, just choose an appropriate prefix:

dt[,mean(x2),by=x1][,{
    names(V1) <- paste("mean_",x1,sep="")
    do.call(data.table,as.list(V1))
}]

However, if you really want custom names, you can put them into a vector beforehand:

mynames <- c(
a = "mean_a",
b = "mean_b"
)

dt[,mean(x2),by=x1][,{
    names(V1) <- mynames[x1]
    do.call(data.table,as.list(V1))
}]

The result is

   mean_a mean_b
1:    3.5      3

EDIT: As @eddi pointed out, this is a better way of doing the same thing:

setnames(dt[,mean(x2),keyby=x1][, as.list(V1)], sort(mynames))

If you assign this somewhere, you will see that it is the desired data.table. You can also get it to print by appending [] at the end of the call or by putting the whole thing in parentheses.

0
votes

I'm fairly certain there's a more elegant solution, but this works:

x1 <- c('a','a','b','b')
x2 <- c(3,4,1,5)

df=data.frame(as.factor(x1),x2)
groupmeans = sapply(unique(x1), function(x) {mean(df[x1==x,]$x2)})

#   a    b
# 3.5  3.0

groupmeans_table = data.table(t(groupmeans))

#       a  b
# 1:  3.5  3