I want to compute deviations from group means with data.table in R. To do this efficiently, I would like to use data.table's optimised mean function, but I haven't found a way to use it within a composite call (i.e. x - mean(x)).
What I mean is that I can use x[, lapply(.SD, function(x) x - mean(x)), by=id], but I suspect that this approach does not use the optimised version of mean in data.table. Indeed, comparing the speed of:
1) x[, lapply(.SD, mean), by=id]
2) x[, lapply(.SD, function(x) mean(x)), by=id]
It turns out that in some cases 1) is 10 times faster than 2)! So how could I use a call like the one in 1), but this time for a composite function like x - mean(x)? I did not succeed using an anonymous call {...} within lapply.
Thanks!
Simulation showing how much faster mean is than function(x) mean(x):
library(data.table)
T = 50
N = 20000
set.seed(123)
data_sim <- data.table(A = rnorm(N * T),
B1 = sample(c(0,1), N * T, replace = TRUE),
B2 = rnorm(N * T),
individual = rep(1:N, each = T))
library(microbenchmark)
mean2 <- function(x) mean(x)
microbenchmark(sol1 = data_sim[, lapply(.SD, mean), by=individual],
sol2 = data_sim[, lapply(.SD, mean2), by=individual],
sol3 = data_sim[, lapply(.SD, function(x) mean(x)), by=individual],
dev_mean = data_sim[, lapply(.SD, function(x) x - mean(x)), by=individual],
times = 5)
Results:
|expr | min| mean| max| neval|
|:--------|---------:|---------:|---------:|-----:|
|sol1 | 17.67686| 18.68033| 21.04078| 5|
|sol2 | 369.69595| 378.91943| 400.77024| 5|
|sol3 | 149.57088| 154.76857| 159.93155| 5|
|dev_mean | 218.44641| 286.00977| 404.06092| 5|
See ?GForce and also switch on verbose=TRUE - chinsoon12
DT[, mu := mean(x), by=g][, v := x - mu] (except with lapply and Map to iterate over columns), but the mean is not yet optimized with :=. - Frank
Thanks for the verbose=TRUE argument, which explains the speed difference! But indeed, I am not sure how to apply it in my context. And I think, according to @Frank, there's little hope of using the optimized mean for my problem? The solution then seems to be to compute a table of group means (using GForce) and bind it back to the original table!? This explains why the solution in this similar post worked so well!? - Matifou
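The idea raised in the comments (compute the group means with the optimised mean, then bind them back to the original table) could look roughly like the sketch below. It is only an illustration under assumptions: the toy table `dt`, its columns `id`, `a`, `b`, and the `_mean` / `_dev` suffixes are hypothetical names, not part of the question's data, and whether the first step is actually GForce-optimised should be checked in the `verbose = TRUE` output on your own data.table version.

```r
library(data.table)

# Hypothetical toy data, much smaller than the benchmark data in the question
set.seed(1)
dt <- data.table(id = rep(1:3, each = 4),
                 a  = rnorm(12),
                 b  = rnorm(12))
cols      <- c("a", "b")
mean_cols <- paste0(cols, "_mean")   # assumed naming scheme
dev_cols  <- paste0(cols, "_dev")    # assumed naming scheme

# Step 1: group means with a plain lapply(.SD, mean), the form that was
# fast in the benchmark; verbose = TRUE prints what data.table did with j
means <- dt[, lapply(.SD, mean), by = id, .SDcols = cols, verbose = TRUE]
setnames(means, cols, mean_cols)

# Step 2: join the means back; means[dt, on = "id"] has one row per row
# of dt, in dt's row order
joined <- means[dt, on = "id"]

# Step 3: the deviations are now plain vectorised subtractions, no by= needed
for (k in seq_along(cols)) {
  set(joined, j = dev_cols[k],
      value = joined[[cols[k]]] - joined[[mean_cols[k]]])
}
```

After this, `joined$a_dev` and `joined$b_dev` hold the deviations from the group means. The design choice is to keep the grouped step down to a bare `mean` call and do the subtraction once over whole columns, instead of calling an anonymous function once per group.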