calculate mean for multiple columns in data.frame

Question

I'm wondering whether it is possible to calculate the means of multiple columns using just the mean function.

e.g.

mean(iris[,1])

is possible but not

mean(iris[,1:4])

I also tried:

mean(iris[,c(1:4)])

and got this warning:

Warning message: In mean.default(iris[, 1:4]) : argument is not numeric or logical: returning NA

I know I can just use lapply(iris[,1:4], mean) or sapply(iris[,1:4], mean).

How about colMeans(iris[,1:4])? Taking the mean of a data.frame has been deprecated. I'm not sure why you expect that to work when you already know about the sapply/lapply solutions. - MrFlick
Use ?colMeans or an apply approach, but for a question like this a search on Stack Overflow may be the best option. - SabDeM

Pierre L · Accepted Answer · 2015-06-19T15:12:24

Try colMeans, but note that the columns must be numeric. For larger datasets you can add a test that keeps only the numeric columns:

colMeans(iris[sapply(iris, is.numeric)])
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#     5.843333     3.057333     3.758000     1.199333
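To see how the different routes relate, here is a small sketch on iris using base R only. The variable names (num_cols, via_colmeans and so on) are just for illustration. It also shows the failure from the question: a data frame is a list of columns, so mean() falls through to mean.default, which warns and returns NA (this assumes R 3.0.0 or later, where the old data frame method is gone).

```r
# iris has four numeric columns plus the factor column Species.
num_cols <- sapply(iris, is.numeric)   # logical vector, TRUE for numeric columns

# Three ways to get the per-column means of the numeric columns.
via_colmeans <- colMeans(iris[num_cols])
via_sapply   <- sapply(iris[num_cols], mean)
via_filter   <- colMeans(Filter(is.numeric, iris))

# Each is a named numeric vector; compare them up to numerical tolerance.
same_sapply <- isTRUE(all.equal(via_colmeans, via_sapply))
same_filter <- isTRUE(all.equal(via_colmeans, via_filter))
print(c(same_sapply, same_filter))

# mean() on a data frame warns and returns NA, as in the question.
df_mean <- suppressWarnings(mean(iris[, 1:4]))
print(df_mean)

# Without the numeric filter, colMeans() stops with an error because of Species.
err <- tryCatch(colMeans(iris), error = function(e) conditionMessage(e))
print(err)
```

If you want a list rather than a named vector, swap sapply for lapply as mentioned in the question.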

Benchmark

The timings for dplyr and data.table seem long. Perhaps someone can replicate the findings to verify them.

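library(microbenchmark)
library(data.table)
library(dplyr)

microbenchmark(
  plafort = colMeans(big.df[sapply(big.df, is.numeric)]),
  Carlos  = colMeans(Filter(is.numeric, big.df)),
  Cdtable = big.dt[, lapply(.SD, mean)],
  # summarise_each() and funs() are deprecated in later dplyr releases
  Cdplyr  = big.df %>% summarise_each(funs(mean))
)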
#Unit: milliseconds
#    expr       min        lq     mean    median       uq       max
# plafort  9.862934 10.506778 12.07027 10.699616 11.16404  31.23927
#  Carlos  9.215143  9.557987 11.30063  9.843197 10.21821  65.21379
# Cdtable 57.157250 64.866996 78.72452 67.633433 87.52451 264.60453
#  Cdplyr 62.933293 67.853312 81.77382 71.296555 91.44994 182.36578

Data

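# 1000 x 1000 integer matrix plus one character column
m <- matrix(1:1e6, 1000)
m2 <- matrix(rep('a', 1000), ncol=1)
# cbind() coerces everything to character, so convert the numeric columns back
big.df <- as.data.frame(cbind(m2, m), stringsAsFactors=FALSE)
big.df[,-1] <- lapply(big.df[,-1], as.numeric)
big.dt <- as.data.table(big.df)  # requires data.table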
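For anyone replicating this, a quick sanity check of the test data may help. The sketch below rebuilds big.df with base R only (so no data.table is needed) and inspects its shape; names such as num_means and v1_mean are mine, not part of the benchmark. One thing that may be worth checking: as written, the data.table and dplyr entries apply mean to every column, including the character column, whereas the two colMeans entries filter it out first. Whether that explains the timing gap is not tested here.

```r
# Rebuild the benchmark data frame with base R only.
m  <- matrix(1:1e6, 1000)                 # 1000 rows x 1000 columns
m2 <- matrix(rep("a", 1000), ncol = 1)    # one character column
big.df <- as.data.frame(cbind(m2, m), stringsAsFactors = FALSE)
big.df[, -1] <- lapply(big.df[, -1], as.numeric)

# Shape and column types: V1 is character, the rest are numeric.
print(dim(big.df))
print(table(sapply(big.df, class)))

# Means of the numeric columns only, as in the two colMeans entries.
num_means <- colMeans(Filter(is.numeric, big.df))
print(head(num_means, 3))

# mean() on the character column warns and returns NA.
v1_mean <- suppressWarnings(mean(big.df$V1))
print(v1_mean)
```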
