How to get summary statistics by group

Question

I'm trying to get multiple summary statistics in R/S-PLUS grouped by categorical column in one shot. I found couple of functions, but all of them do one statistic per call, like aggregate().

data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66, 
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4,6,6,8)))
df <- data.frame(group=grp, dt=data)
mg <- aggregate(df$dt, by=df$group, FUN=mean)    
mg <- aggregate(df$dt, by=df$group, FUN=sum)

What I'm looking for is to get multiple statistics for the same group like mean, min, max, std, ...etc in one call, is that doable?

This one is a pretty basic question with multiple answers. You may not be familiar with RSeek (LINK) and the sos library (LINK) Both are great resources to help you figure out the answers to questions. Ibet with those resources you'll be able to answer your own question in seconds. — Tyler Rinker
I just found a wonderful R package tables. You can tabulate data by as many categories as you desire and calculate multiple statistics for multiple variables - it truly is amazing! But wait, there's more! The package has functions to generate LaTeX code for your tables for easy import to your documents. — StatGrrl

BenBarnes BenBarnes · Accepted Answer · 2012-03-24T10:12:33

1. `tapply`

I'll put in my two cents for tapply().

tapply(df$dt, df$group, summary)

You could write a custom function with the specific statistics you want or format the results:

tapply(df$dt, df$group,
  function(x) format(summary(x), scientific = TRUE))
$A
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
"5.900e+01" "5.975e+01" "6.100e+01" "6.100e+01" "6.225e+01" "6.300e+01" 

$B
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
"6.300e+01" "6.425e+01" "6.550e+01" "6.600e+01" "6.675e+01" "7.100e+01" 

$C
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
"6.600e+01" "6.725e+01" "6.800e+01" "6.800e+01" "6.800e+01" "7.100e+01" 

$D
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
"5.600e+01" "5.975e+01" "6.150e+01" "6.100e+01" "6.300e+01" "6.400e+01"

2. `data.table`

The data.table package offers a lot of helpful and fast tools for these types of operation:

library(data.table)
setDT(df)
> df[, as.list(summary(dt)), by = group]
   group Min. 1st Qu. Median Mean 3rd Qu. Max.
1:     A   59   59.75   61.0   61   62.25   63
2:     B   63   64.25   65.5   66   66.75   71
3:     C   66   67.25   68.0   68   68.00   71
4:     D   56   59.75   61.5   61   63.00   64

How to get summary statistics by group

12 Answers

1. tapply

2. data.table

1. `tapply`

2. `data.table`