16
votes

Is it possible to include two functions within a single tapply or aggregate statement?

Below I use two tapply statements and two aggregate statements: one for mean and one for SD.
I would prefer to combine the statements.

my.Data = read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", sep = "", header = TRUE)

with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x)}))
with(my.Data, tapply(weight, list(age, sex), function(x) {sd(x)  }))

with(my.Data, aggregate(weight ~ age + sex, FUN = mean)
with(my.Data, aggregate(weight ~ age + sex, FUN =   sd)

# this does not work:

with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x) ; sd(x)}))

# I would also prefer that the output be formatted something similar to that 
# show below.  `aggregate` formats the output perfectly.  I just cannot figure 
# out how to implement two functions in one statement.

  age    sex   mean        sd
adult female   97.5  3.535534
adult   male     90        NA
young female   80.0        NA
young   male     75        NA

I can always run two separate statements and merge the output. I was just hoping there might be a slightly more convenient solution.

I found the answer below posted here: Apply multiple functions to column using tapply

f <- function(x) c(mean(x), sd(x))
do.call( rbind, with(my.Data, tapply(weight, list(age, sex), f)) )

However, neither the rows or columns are labeled.

     [,1]     [,2]
[1,] 97.5 3.535534
[2,] 80.0       NA
[3,] 90.0       NA
[4,] 75.0       NA

I would prefer a solution in base R. A solution from the plyr package was posted at the link above. If I can add the correct row and column headings to the above output, it would be perfect.

4

4 Answers

18
votes

But these should have:

with(my.Data, aggregate(weight, list(age, sex), function(x) { c(MEAN=mean(x), SD=sd(x) )}))

with(my.Data, tapply(weight, list(age, sex), function(x) { c(mean(x) , sd(x) )} ))
# Not a nice structure but the results are in there

with(my.Data, aggregate(weight ~ age + sex, FUN =  function(x) c( SD = sd(x), MN= mean(x) ) ) )
    age    sex weight.SD weight.MN
1 adult female  3.535534 97.500000
2 young female        NA 80.000000
3 adult   male        NA 90.000000
4 young   male        NA 75.

The principle to be adhered to is to have your function return "one thing" which could be either a vector or a list but cannot be the successive invocation of two function calls.

10
votes

If you'd like to use data.table, it has with and by built right into it:

library(data.table)
myDT <- data.table(my.Data, key="animal")


myDT[, c("mean", "sd") := list(mean(weight), sd(weight)), by=list(age, sex)]


myDT[, list(mean_Aggr=sum(mean(weight)), sd_Aggr=sum(sd(weight))), by=list(age, sex)]
     age    sex mean_Aggr   sd_Aggr
1: adult female     96.0  3.6055513
2: young   male     76.5  2.1213203
3: adult   male     91.0  1.4142136
4: young female     84.5  0.7071068

I used a slightly different data set so as to not have NA values for sd

7
votes

In the spirit of sharing, if you are familiar with SQL, you might also consider the "sqldf" package. (Emphasis added because you do need to know, for instance, that mean is avg in order to get the results you want.)

sqldf("select age, sex, 
      avg(weight) `Wt.Mean`, 
      stdev(weight) `Wt.SD` 
      from `my.Data` 
      group by age, sex")
    age    sex Wt.Mean    Wt.SD
1 adult female    97.5 3.535534
2 adult   male    90.0 0.000000
3 young female    80.0 0.000000
4 young   male    75.0 0.000000
5
votes

Reshape lets you pass 2 functions; reshape2 does not.

library(reshape)
my.Data = read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", sep = "", header = TRUE)
my.Data[,1]<- NULL
(a1<-  melt(my.Data, id=c("age", "sex"), measured=c("weight")))
(cast(a1, age + sex ~ variable, c(mean, sd), fill=NA))

#     age    sex weight_mean weight_sd
# 1 adult female        97.5  3.535534
# 2 adult   male        90.0        NA
# 3 young female        80.0        NA
# 4 young   male        75.0        NA

I owe this to @Ramnath, who noted this just yesterday.