I'm trying to realize an Anova 1 factor on a dataset with several measurement for one subject.
> str(LMDAv) #To check class
'data.frame': 1075 obs. of 11 variables:
$ CowID : Factor w/ 71 levels "1921","1923",..: 1 1 1 2 2 2 3 3 3 4 ...
$ Date : Date, format: "2014-01-27" "2014-01-28" "2014-01-29" ...
$ Feeding : Factor w/ 2 levels "hoko","strap": 1 1 1 1 1 1 1 1 1 1 ...
$ Mean : num 246 232 159 115 154 ...
$ SD : num 291.7 178.4 161.2 141.1 73.3 ...
$ Min : num 27 20 14 16 35 13 15 25 37 9 ...
$ Max : num 1634 821 547 838 440 ...
$ MeasTime : Factor w/ 38 levels "60","120","180",..: 1 1 1 1 1 1 1 1 1 1 ...
$ NumObs : num 121 120 121 121 121 122 121 121 121 121 ...
$ MeasTimeLow : Factor w/ 4 levels "60","120","180",..: 1 1 1 1 1 1 1 1 1 1 ...
$ MeasTimeHigh: Factor w/ 5 levels "60","120","180",..: 1 1 1 1 1 1 1 1 1 1 ...
The columns that interest me first are : - Mean, which the variable I want to explain ; - CowID, which is the subject ; - MeasTimeLow, which is the measurement time = factor I want to test. The purpose of this analysis will be to see if it changes something to measure for 60, 120, 180 or 240s. So for each CowID, I have several measurement time but not the same number. I have at least 3 Mean for each couple CowID:MeasurementTime. So here my subject will be CowID and I want to compare each Measurement Time for each CowID.
> table(LMDAv$CowID, LMDAv$MeasTimeLow)
60 120 180 240
1921 3 3 3 6
1923 3 3 3 6
1924 3 3 3 6
1953 3 3 3 6
1962 3 3 3 6
1967 3 3 3 6
1982 3 3 5 4
1989 3 3 3 6
1990 2 2 4 2
1993 3 3 3 6
1995 3 3 3 6
2003 3 3 3 6
2005 3 3 3 6
2007 3 3 3 6
2019 3 3 3 6
2023 3 3 3 6
2028 3 3 3 6
2038 3 3 3 6
2040 3 3 3 6
2045 3 3 3 6
2046 3 3 3 6
2047 3 3 3 6
2049 3 3 3 6
2053 3 3 3 6
2062 3 3 3 6
2067 3 3 3 6
2069 3 3 5 4
2070 3 3 3 6
2094 3 3 3 6
2103 3 3 5 4
2108 3 3 3 6
2111 3 3 3 6
2112 3 3 3 6
2118 3 3 3 6
2124 3 3 3 6
2132 3 3 5 4
2133 3 3 3 6
2134 3 3 3 6
2136 3 3 3 6
2138 3 3 3 6
2140 6 6 6 12
2143 3 3 5 4
2155 3 3 3 6
2161 3 3 3 6
2163 3 3 3 6
2165 3 3 3 6
2171 3 3 3 6
2183 3 3 3 6
2187 3 3 3 6
2200 3 3 3 6
2209 3 3 3 6
2211 3 3 3 6
2213 3 3 3 6
2222 3 3 3 6
2223 3 3 3 6
2227 3 3 3 6
2228 3 3 3 6
2234 3 3 3 6
2235 3 3 3 6
2239 3 3 3 6
2242 3 3 3 6
2245 3 3 3 6
2246 3 3 3 6
2248 3 3 3 6
2252 3 3 3 6
2254 3 3 3 6
2257 3 3 3 6
2259 3 3 3 6
2261 3 3 3 6
2265 3 3 3 6
2275 3 3 3 6
I can't do an aov because of the difference in sample sizes. So I've tried to run a xyplot :
library(Matrix)
library(nlme)
library(lme4)
xyplot(Mean~MeasTimeLow|CowID, type=c("p","r"), data=LMDAv)
But I have an error message telling me that it can't find the command "xyplot" so I ran a lme :
> Time.Cow<-lme(Mean~MeasTimeLow,random=~1|CowID/MeasTimeLow, data=LMDAv)
> summary(Time.Cow)
Linear mixed-effects model fit by REML
Data: LMDAv
AIC BIC logLik
11014.08 11048.91 -5500.04
Random effects:
Formula: ~1 | CowID
(Intercept)
StdDev: 23.50398
Formula: ~1 | MeasTimeLow %in% CowID
(Intercept) Residual
StdDev: 0.002203522 38.24478
Fixed effects: Mean ~ MeasTimeLow
Value Std.Error DF t-value p-value
(Intercept) 135.80891 3.820368 791 35.54865 0.0000
MeasTimeLow120 7.64225 3.688654 210 2.07183 0.0395
MeasTimeLow180 8.59529 3.644843 210 2.35821 0.0193
MeasTimeLow240 12.81500 3.211478 210 3.99037 0.0001
Correlation:
(Intr) MTL120 MTL180
MeasTimeLow120 -0.483
MeasTimeLow180 -0.489 0.506
MeasTimeLow240 -0.554 0.574 0.579
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.41920844 -0.64816910 -0.05149839 0.59765336 4.81399462
Number of Observations: 1075
Number of Groups:
CowID MeasTimeLow %in% CowID
71 284
And it gives me what I want : the comparison between the different measurement time but I'm not sure about the interpretation and about the fact that it really gives me what I'm looking for. If I take the first table in "Fixed effects: Mean ~ MeasTimeLow", I can conclude that measuring for 60s is significantly different from 120, 180 or 240s right ? The second one is comparing the other one right ?
Then, I was just wondering if this test could work with missing values for a couple CowID:MeasurementTime. (It's not the case here but I may need that for another test). Then, will the lme function work with two factors and unbalanced design ? Is the order between the variable important ? Finally, this Mean value comes from another data frame and it's the average for each CowID on 60, 120,180 and 240s. That's why there's a column SD. Is there a test on the Standard Deviation I could make to be sure that the analysis is okay ? Thanks