6
votes

I have the data.table shown below. I'm trying to calculate the weighted mean for subsets of the data, and I've tried two approaches with the MWE below:

    library(data.table)
    set.seed(12345)
    dt = data.table(a = c(10,20,25,10,10), b = rnorm(5), c = rnorm(5), d = rnorm(5), e = rnorm(5))
    dt$key = sample(toupper(letters[1:3]), 5, replace = TRUE)
    setkey(dt, key)

First, I tried subsetting .SD inside an lapply call, which doesn't work (and wasn't really expected to):

    dt[,lapply(.SD,function(x) weighted.mean(x,.SD[1])),by=key]

Second, I tried defining a function to apply to .SD, as I would if I were using ddply. This fails too:

    wmn=function(x){
      tmp = NULL
      for(i in 2:ncol(x)){
        tmp1 = weighted.mean(x[,i],x[,1])
        tmp = c(tmp,tmp1)
      }
      return(tmp)
    }

    dt[,wmn,by=key]

Any thoughts on how best to do this?

Thanks

EDIT

Fixed an error in the columns selected in the wmn function.

SECOND EDIT

Reverted the weighted mean formula and added set.seed.

1
.SD[1]? The first argument of weighted.mean is supposed to be the thing you're taking the mean of; the weights go into the second argument. Also, you are not subsetting the data above, as far as I can tell... Finally, please use set.seed before making simulated data so we're all looking at the same thing. - Frank
If column "a" is supposed to serve as your weights, while you take the weighted means of the other columns, there's this: dt[,lapply(.SD,weighted.mean,w=a),by=key,.SDcols=letters[1:5]] - Frank
Thanks @Frank, if you put that in as an answer I'll be happy to accept it. Also thanks for the gotcha on the weighted.mean columns. Greatly embarrassed by the error. Can you please explain your point about not subsetting? I didn't understand it, I'm afraid. I had hoped to understand how to use specific columns from .SD to calculate more complex functions (weighted.mean). I guess mnel's link suggests this isn't possible. - Tahnoon Pasha
Thanks @mnel. Appreciate the pointer. - Tahnoon Pasha

1 Answer

12
votes

If you want to take the weighted means of "b"..."e" using "a" as the weight, I think this does the trick:

    dt[,lapply(.SD,weighted.mean,w=a),by=key,.SDcols=letters[1:5]]
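
Note that `.SDcols=letters[1:5]` also includes "a", so the result carries a column for "a" weighted by itself. If only "b" to "e" are wanted, a variant along these lines may be closer to the intent. This is a sketch, assuming column "a" holds the weights and relying on data.table keeping columns that are named in `j` visible even when they are left out of `.SDcols`. The names `res`, `k` and `manual` are introduced here purely for illustration.

```r
library(data.table)

# Rebuild the example data from the question.
set.seed(12345)
dt <- data.table(a = c(10, 20, 25, 10, 10), b = rnorm(5), c = rnorm(5),
                 d = rnorm(5), e = rnorm(5))
dt[, key := sample(toupper(letters[1:3]), 5, replace = TRUE)]
setkey(dt, key)

# Restrict .SD to the value columns; the weight column "a" is passed as w.
res <- dt[, lapply(.SD, weighted.mean, w = a), by = key,
          .SDcols = c("b", "c", "d", "e")]

# Cross-check one group by hand: sum(w * x) / sum(w).
k <- dt$key[1]
manual <- dt[key == k, sum(a * b) / sum(a)]
all.equal(res[key == k, b], manual)
```

The hand computation for a single group is only a sanity check that the weights are being picked up per group rather than from the whole table.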