One-way repeated measures ANOVA with unbalanced data

Question

I'm new to R, and I've read these forums (for help with R) for a while now, but this is my first time posting. After googling each of the errors below, I still can't figure out how to fix my mistakes.

I am trying to run a one-way repeated measures ANOVA with unequal sample sizes. Here is a toy version of my data and the code that I'm using. (If it matters, my real data have 12 bins with 14 to 20 values in each bin.)

## the data: average probability for a subject, given reaction time bin
bin1=c(0.37,0.00,0.00,0.16,0.00,0.00,0.08,0.06)
bin2=c(0.33,0.21,0.000,1.00,0.00,0.00,0.00,0.00,0.09,0.10,0.04)
bin3=c(0.07,0.41,0.07,0.00,0.10,0.00,0.30,0.25,0.08,0.15,0.32,0.18)

## creating the data frame

# dependent variable column
probability=c(bin1,bin2,bin3)

# condition column
bin=c(rep("bin1",8),rep("bin2",11),rep("bin3",12))

# subject column (in the order that will match them up with their respective
# values in the dependent variable column)
subject=c("S2","S3","S5","S7","S8","S9","S11","S12","S1","S2","S3","S4","S7",
  "S9","S10","S11","S12","S13","S14","S1","S2","S3","S5","S7","S8","S9","S10",
  "S11","S12","S13","S14")

# putting together the data frame
dataFrame=data.frame(cbind(probability,bin,subject))

## one-way repeated measures anova
test=aov(probability~bin+Error(subject/bin),data=dataFrame)

These are the errors I get:

Error in qr.qty(qr.e, resp) : 
  invalid to change the storage mode of a factor
In addition: Warning messages:
1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : - not meaningful for factors
3: In aov(probability ~ bin + Error(subject/bin), data = dataFrame) :
  Error() model is singular

Sorry for the complexity (assuming it is complex; it is to me). Thank you for your time.

In a repeated measures ANOVA, every subject must appear exactly once in every condition, so you cannot have unequal sample sizes. You need to drop the subjects that don't fall into each bin. Or, given what it looks like your data could be, add in 0 probability bins. You also need to set the types of your variables, as indicated by your errors. ... for starters — John
Thanks for the advice. According to this link, I thought unequal sample sizes were fine for this analysis. Also, for the types, I've tried making sure that the dependent variable column was numeric with as.numeric (and I tried various other things), but none of it seemed to work. What should the types be? — Charlette Lin
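
On the type question, here is a minimal sketch, assuming the "factor response" messages come from the cbind() call inside data.frame(). cbind() on a mix of numeric and character vectors returns a character matrix, so probability stops being numeric before the data frame is even built. In R versions before 4.0 the resulting columns become factors, which matches the errors above; in R 4.0 and later they become character. The three-row data and the names m, bad, good and recovered are made up for illustration.

```r
# Tiny made-up subset, just to show the type problem
probability <- c(0.37, 0.00, 0.16)
bin         <- c("bin1", "bin1", "bin2")
subject     <- c("S2", "S3", "S2")

# cbind() coerces everything to one common type: character
m <- cbind(probability, bin, subject)
stopifnot(is.character(m))

# Wrapping the character matrix in data.frame() cannot restore the numbers
bad <- data.frame(m)
stopifnot(!is.numeric(bad$probability))

# Passing the vectors directly keeps each column's own type
good <- data.frame(probability,
                   bin     = factor(bin),
                   subject = factor(subject))
stopifnot(is.numeric(good$probability))

# If a column has already been turned into a factor, convert via character;
# as.numeric() applied to a factor directly returns the internal level codes
recovered <- as.numeric(as.character(bad$probability))
stopifnot(isTRUE(all.equal(recovered, probability)))
```

So the usual choice would be a numeric response with bin and subject as factors.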

Ben Bolker · Accepted Answer · 2013-07-16T20:34:42

For an unbalanced repeated-measures design, it might be easiest to use lme (from the nlme package):

## this should be the same as the data you constructed above, just 
## a slightly more compact way to do it.
datList <- list(
   bin1=c(0.37,0.00,0.00,0.16,0.00,0.00,0.08,0.06),
   bin2=c(0.33,0.21,0.000,1.00,0.00,0.00,0.00,0.00,0.09,0.10,0.04),
   bin3=c(0.07,0.41,0.07,0.00,0.10,0.00,0.30,0.25,0.08,0.15,0.32,0.18))
subject=c("S2","S3","S5","S7","S8","S9","S11","S12",
          "S1","S2","S3","S4","S7","S9","S10","S11","S12","S13","S14",
          "S1","S2","S3","S5","S7","S8","S9","S10","S11","S12","S13","S14")
d <- data.frame(probability=do.call(c,datList),
                bin=paste0("bin",rep(1:3,sapply(datList,length))),
                subject)

library(nlme)
m1 <- lme(probability~bin,random=~1|subject/bin,data=d)
summary(m1)

The only real problem is that some aspects of the interpretation etc. are pretty far from the classical sum-of-squares-decomposition approach (e.g. it's fairly tricky to do significance tests of variance components). Pinheiro and Bates (Springer, 2000) is highly recommended reading if you're going to head in this direction.
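
As a rough illustration of what lme reports in place of a sums-of-squares table, the sketch below refits the same model and pulls out the F test for the fixed effect and the estimated variance components. It rebuilds the toy data so that it runs on its own; unlist() and lengths() are just another way to build the same columns and are not part of the code above.

```r
library(nlme)

datList <- list(
  bin1 = c(0.37, 0.00, 0.00, 0.16, 0.00, 0.00, 0.08, 0.06),
  bin2 = c(0.33, 0.21, 0.000, 1.00, 0.00, 0.00, 0.00, 0.00, 0.09, 0.10, 0.04),
  bin3 = c(0.07, 0.41, 0.07, 0.00, 0.10, 0.00, 0.30, 0.25, 0.08, 0.15, 0.32, 0.18))
subject <- c("S2", "S3", "S5", "S7", "S8", "S9", "S11", "S12",
             "S1", "S2", "S3", "S4", "S7", "S9", "S10", "S11", "S12", "S13", "S14",
             "S1", "S2", "S3", "S5", "S7", "S8", "S9", "S10", "S11", "S12", "S13", "S14")

# One row per observation: numeric response, factor predictors
d <- data.frame(probability = unlist(datList, use.names = FALSE),
                bin         = factor(rep(names(datList), lengths(datList))),
                subject     = factor(subject))

# Same model as above
m1 <- lme(probability ~ bin, random = ~ 1 | subject/bin, data = d)

# F test for the fixed effect of bin (closest analogue of the ANOVA table row)
anova(m1)

# Estimated variances/standard deviations of the random effects and residual
VarCorr(m1)
```

Because each subject contributes at most one value per bin, the bin-within-subject and residual components are probably not separately identifiable in this toy data set, so the individual VarCorr entries should be read with caution.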

It might be a good idea to simulate/make up some balanced data and do the analysis with both aov() and lme(), look at the output, and make sure you can see where the correspondences are/know what's going on.
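
One way that comparison could look, as a sketch only: the data are simulated, and the seed, effect sizes and the names sim, subj_eff, bin_eff, fit_aov and fit_lme are all made up. To keep the correspondence easy to see, it uses a subject-level random intercept (Error(subject) and random = ~1|subject), which is a simplification of the subject/bin specification used above.

```r
library(nlme)

set.seed(1)   # arbitrary seed, only for reproducibility
n_subj <- 12  # every subject appears once in every bin (balanced)

sim <- expand.grid(subject = factor(paste0("S", seq_len(n_subj))),
                   bin     = factor(paste0("bin", 1:3)))

# Made-up effects: a random shift per subject and a fixed shift per bin.
# The simulated response is not constrained to [0, 1]; it only mimics the layout.
subj_eff <- rnorm(n_subj, sd = 0.10)
bin_eff  <- c(bin1 = 0.00, bin2 = 0.05, bin3 = 0.10)
sim$probability <- 0.2 +
  subj_eff[as.integer(sim$subject)] +
  unname(bin_eff[as.character(sim$bin)]) +
  rnorm(nrow(sim), sd = 0.05)

# Classical repeated measures ANOVA
fit_aov <- aov(probability ~ bin + Error(subject), data = sim)
summary(fit_aov)

# Mixed-model version with a random intercept per subject
fit_lme <- lme(probability ~ bin, random = ~ 1 | subject, data = sim)
anova(fit_lme)

# Pull out the two F statistics for bin to compare them side by side
f_aov <- summary(fit_aov)[["Error: Within"]][[1]][["F value"]][1]
f_lme <- anova(fit_lme)[["F-value"]][2]
c(aov = f_aov, lme = f_lme)
```

With balanced data and a positive estimated subject variance, the two F statistics for bin should agree closely. Removing a few rows from sim is a quick way to see aov() start to complain while lme() still fits.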
