I often need to filter out columns with a low variance from a data.table. The column names are not known in advance.
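library(data.table)
dt = data.table(mtcars)
# TRUE for each column whose standard deviation exceeds an arbitrary threshold of 1:
mask = dt[,lapply(.SD, function(x) sd(x, na.rm = TRUE) > 1)]
# The columns with FALSE in row 1 need to be removed
mask.t = t(mask)         # one-row data.table -> one-column logical matrix
mask.t = which(mask.t)   # integer positions of the TRUE entries
dt[,mask.t,with=FALSE]   # keep only those columns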
The approach above is clunky. Is there a more elegant way to keep only the columns of a data.table for which the column statistic evaluates to TRUE?
dt[, names(mask)[unlist(mask)], with=FALSE] maybe? Or dt[, names(which(unlist(mask))), with=FALSE] - Arun
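Both one-liners in the comment work the same way: unlist() turns the one-row mask into a named logical vector, the expression picks out the names of the TRUE columns, and with=FALSE selects those columns by name. The sketch below is a self-contained check of that on mtcars, plus a small reusable wrapper. The helper name select_cols_by is hypothetical (it is not part of data.table); it uses vapply() over the columns so the mask is a plain named logical vector, and isTRUE() so that an NA statistic (for example from an all-NA column) counts as "drop" instead of producing an NA column name.

```r
library(data.table)

dt <- data.table(mtcars)

# One-row data.table of logicals, as in the question: TRUE where sd > 1
mask <- dt[, lapply(.SD, function(x) sd(x, na.rm = TRUE) > 1)]

# unlist() turns the one-row mask into a named logical vector;
# both expressions then yield the names of the TRUE columns
via_index <- dt[, names(mask)[unlist(mask)], with = FALSE]
via_which <- dt[, names(which(unlist(mask))), with = FALSE]

# Hypothetical reusable helper (name and signature are illustrative):
# keep the columns for which keep_if() returns TRUE; NA counts as FALSE
select_cols_by <- function(dt, keep_if) {
  keep <- vapply(dt, function(x) isTRUE(keep_if(x)), logical(1))
  dt[, names(keep)[keep], with = FALSE]
}

via_helper <- select_cols_by(dt, function(x) sd(x, na.rm = TRUE) > 1)
print(names(via_helper))
# expected: "mpg"  "cyl"  "disp" "hp"   "qsec" "carb"
```

On mtcars all three give the same six columns; wt is dropped because its standard deviation is about 0.98, just under the threshold.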