Single factor repeated Anova Unbalanced design

Question

I'm trying to realize an Anova 1 factor on a dataset with several measurement for one subject.

> str(LMDAv) #To check class
'data.frame':   1075 obs. of  11 variables:
 $ CowID       : Factor w/ 71 levels "1921","1923",..: 1 1 1 2 2 2 3 3 3 4 ...
 $ Date        : Date, format: "2014-01-27" "2014-01-28" "2014-01-29" ...
 $ Feeding     : Factor w/ 2 levels "hoko","strap": 1 1 1 1 1 1 1 1 1 1 ...
 $ Mean        : num  246 232 159 115 154 ...
 $ SD          : num  291.7 178.4 161.2 141.1 73.3 ...
 $ Min         : num  27 20 14 16 35 13 15 25 37 9 ...
 $ Max         : num  1634 821 547 838 440 ...
 $ MeasTime    : Factor w/ 38 levels "60","120","180",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ NumObs      : num  121 120 121 121 121 122 121 121 121 121 ...
 $ MeasTimeLow : Factor w/ 4 levels "60","120","180",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ MeasTimeHigh: Factor w/ 5 levels "60","120","180",..: 1 1 1 1 1 1 1 1 1 1 ...

The columns that interest me first are : - Mean, which the variable I want to explain ; - CowID, which is the subject ; - MeasTimeLow, which is the measurement time = factor I want to test. The purpose of this analysis will be to see if it changes something to measure for 60, 120, 180 or 240s. So for each CowID, I have several measurement time but not the same number. I have at least 3 Mean for each couple CowID:MeasurementTime. So here my subject will be CowID and I want to compare each Measurement Time for each CowID.

> table(LMDAv$CowID, LMDAv$MeasTimeLow)

       60 120 180 240
  1921  3   3   3   6
  1923  3   3   3   6
  1924  3   3   3   6
  1953  3   3   3   6
  1962  3   3   3   6
  1967  3   3   3   6
  1982  3   3   5   4
  1989  3   3   3   6
  1990  2   2   4   2
  1993  3   3   3   6
  1995  3   3   3   6
  2003  3   3   3   6
  2005  3   3   3   6
  2007  3   3   3   6
  2019  3   3   3   6
  2023  3   3   3   6
  2028  3   3   3   6
  2038  3   3   3   6
  2040  3   3   3   6
  2045  3   3   3   6
  2046  3   3   3   6
  2047  3   3   3   6
  2049  3   3   3   6
  2053  3   3   3   6
  2062  3   3   3   6
  2067  3   3   3   6
  2069  3   3   5   4
  2070  3   3   3   6
  2094  3   3   3   6
  2103  3   3   5   4
  2108  3   3   3   6
  2111  3   3   3   6
  2112  3   3   3   6
  2118  3   3   3   6
  2124  3   3   3   6
  2132  3   3   5   4
  2133  3   3   3   6
  2134  3   3   3   6
  2136  3   3   3   6
  2138  3   3   3   6
  2140  6   6   6  12
  2143  3   3   5   4
  2155  3   3   3   6
  2161  3   3   3   6
  2163  3   3   3   6
  2165  3   3   3   6
  2171  3   3   3   6
  2183  3   3   3   6
  2187  3   3   3   6
  2200  3   3   3   6
  2209  3   3   3   6
  2211  3   3   3   6
  2213  3   3   3   6
  2222  3   3   3   6
  2223  3   3   3   6
  2227  3   3   3   6
  2228  3   3   3   6
  2234  3   3   3   6
  2235  3   3   3   6
  2239  3   3   3   6
  2242  3   3   3   6
  2245  3   3   3   6
  2246  3   3   3   6
  2248  3   3   3   6
  2252  3   3   3   6
  2254  3   3   3   6
  2257  3   3   3   6
  2259  3   3   3   6
  2261  3   3   3   6
  2265  3   3   3   6
  2275  3   3   3   6

I can't do an aov because of the difference in sample sizes. So I've tried to run a xyplot :

library(Matrix)
library(nlme)
library(lme4)
xyplot(Mean~MeasTimeLow|CowID, type=c("p","r"), data=LMDAv)

But I have an error message telling me that it can't find the command "xyplot" so I ran a lme :

> Time.Cow<-lme(Mean~MeasTimeLow,random=~1|CowID/MeasTimeLow, data=LMDAv)
> summary(Time.Cow)
Linear mixed-effects model fit by REML
 Data: LMDAv 
       AIC      BIC   logLik
  11014.08 11048.91 -5500.04

Random effects:
 Formula: ~1 | CowID
        (Intercept)
StdDev:    23.50398

 Formula: ~1 | MeasTimeLow %in% CowID
        (Intercept) Residual
StdDev: 0.002203522 38.24478

Fixed effects: Mean ~ MeasTimeLow 
                   Value Std.Error  DF  t-value p-value
(Intercept)    135.80891  3.820368 791 35.54865  0.0000
MeasTimeLow120   7.64225  3.688654 210  2.07183  0.0395
MeasTimeLow180   8.59529  3.644843 210  2.35821  0.0193
MeasTimeLow240  12.81500  3.211478 210  3.99037  0.0001
 Correlation: 
               (Intr) MTL120 MTL180
MeasTimeLow120 -0.483              
MeasTimeLow180 -0.489  0.506       
MeasTimeLow240 -0.554  0.574  0.579

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-2.41920844 -0.64816910 -0.05149839  0.59765336  4.81399462 

Number of Observations: 1075
Number of Groups: 
                 CowID MeasTimeLow %in% CowID 
                    71                    284

And it gives me what I want : the comparison between the different measurement time but I'm not sure about the interpretation and about the fact that it really gives me what I'm looking for. If I take the first table in "Fixed effects: Mean ~ MeasTimeLow", I can conclude that measuring for 60s is significantly different from 120, 180 or 240s right ? The second one is comparing the other one right ?

Then, I was just wondering if this test could work with missing values for a couple CowID:MeasurementTime. (It's not the case here but I may need that for another test). Then, will the lme function work with two factors and unbalanced design ? Is the order between the variable important ? Finally, this Mean value comes from another data frame and it's the average for each CowID on 60, 120,180 and 240s. That's why there's a column SD. Is there a test on the Standard Deviation I could make to be sure that the analysis is okay ? Thanks

Jens Jens · Accepted Answer · 2016-12-07T14:19:57

With your LME model you have tested all values against a baseline value. This is the standard form of contrasts used in R. So your output tells you that the average value for 60 was 135 and this value is significantly different from zero. Your 120 is now 7 values higher (no negative sign in front of the 7) so the mean value for 120 was 135+7= 142 +-3.6. This value was significantly different from 135 (intercept) with p=0.035. The next value, 180 is 8 values higher than 135 = 143, but I think it would not be significantly different from 120. The p-value relates to the comparison of the intercept (135) to the value of 180 (143). Finally, 240 is 12 values higher than 135 and is highly significant.

Your random effect for cowID (are these really cows?) tells you that the standard deviation of all cowID values is 23.5 which is to be expected, because you have the increase in your data.

There are two things that could improve your model. First, take out the nested random effect. cowID should be enough? As MeasTimeLow is your parameter of interest (fixed), so your actual treatment, why should you also have it as a random term?

Second, if you are interested whether any other Time class differs between each other, you could add a multiple comparison i.e. a planned contrast or a simple post-hoc test. Check out the multcomp package in R.

HTH

Single factor repeated Anova Unbalanced design

1 Answers