I converted the iris data set to a data.table. My aim is to take the mean of every column, grouped by Species, using data.table.
library(data.table)
DT <- as.data.table(iris)
Below is my desired output:
DT[,.(mean(Sepal.Length),mean(Sepal.Width),mean(Petal.Length),mean(Petal.Width)),by =.(Species)]
      Species    V1    V2    V3    V4
1:     setosa 5.006 3.428 1.462 0.246
2: versicolor 5.936 2.770 4.260 1.326
3:  virginica 6.588 2.974 5.552 2.026
But typing out every column name this way is time consuming when there are a large number of columns. I tried the command below, but the output is arranged in a slightly different manner:
DT[, .(vapply(DT[,!'Species',with=FALSE],mean,FUN.VALUE =1)),by = .(Species)]
       Species       V1
 1:     setosa 5.843333
 2:     setosa 3.057333
 3:     setosa 3.758000
 4:     setosa 1.199333
 5: versicolor 5.843333
 6: versicolor 3.057333
 7: versicolor 3.758000
 8: versicolor 1.199333
 9:  virginica 5.843333
10:  virginica 3.057333
11:  virginica 3.758000
12:  virginica 1.199333
Is there any way to do something like the above and avoid typing all the column names, just to compute the means grouped by Species using data.table?
Please do not suggest using 'with = FALSE'. I already know about that.
DT[, lapply(.SD, mean), by=Species, .SDcols=names(DT)[1:4]]
or similar should sort you out. – thelatemail
DT[, lapply(.SD, mean), by=Species]
if you just want to exclude the by= variable. – thelatemail
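To make the suggestions in the comments concrete, here is a minimal sketch, assuming the data.table package is installed. It also shows why the vapply attempt in the question repeated the same four numbers for every species: inside j, DT refers to the whole table, whereas .SD is the subset of rows for the current group. The names res, res2, res3 and num_cols are only illustrative.

```r
library(data.table)

DT <- as.data.table(iris)

# .SD holds the current group's rows (every column except the by= column),
# so lapply(.SD, mean) yields one mean per column for each Species.
res <- DT[, lapply(.SD, mean), by = Species]
print(res)

# Same idea with an explicit selection: restrict .SD to the numeric columns.
num_cols <- names(DT)[sapply(DT, is.numeric)]
res2 <- DT[, lapply(.SD, mean), by = Species, .SDcols = num_cols]

# The vapply approach also works once DT is replaced by .SD and the
# named vector is turned into a list (one list element per output column).
res3 <- DT[, as.list(vapply(.SD, mean, FUN.VALUE = 1)), by = Species]
```

All three forms should return one row per species, with the original column names (Sepal.Length, Sepal.Width, ...) kept instead of V1 to V4.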