Error when running mixed anova in R with two within factor and one between factor

Question

I would like some help to use R to analyze an experiment in which three groups of participants were shown two types of stimulus three times each. The dependent variable is a continuous measure. Here is an example of how the dataset looks like.

 SubjectID  Group        Trial StimType    Measure
1          1 group1 trial3_stimA        A 0.55908866
2          2 group1 trial3_stimA        A 0.98884446
3          3 group2 trial3_stimA        A 0.00000000
4          4 group2 trial3_stimA        A 0.27067991
5          5 group3 trial3_stimA        A 0.37169285
6          6 group3 trial3_stimA        A 0.42113984
7          1 group1 trial3_stimB        B 0.00000000
8          2 group1 trial3_stimB        B 0.49892807
9          3 group2 trial3_stimB        B 0.14602589
10         4 group2 trial3_stimB        B 0.50946555
11         5 group3 trial3_stimB        B 0.25572820
12         6 group3 trial3_stimB        B 0.22932966
13         1 group1 trial1_stimA        A 0.42207604
14         2 group1 trial1_stimA        A 0.85599588
15         3 group2 trial1_stimA        A 0.36428381
16         4 group2 trial1_stimA        A 0.46679336
17         5 group3 trial1_stimA        A 0.69379734
18         6 group3 trial1_stimA        A 0.55607716
19         1 group1 trial1_stimB        B 0.24261465
20         2 group1 trial1_stimB        B 0.35176384
21         3 group2 trial1_stimB        B 0.21116215
22         4 group2 trial1_stimB        B 0.33112544
23         5 group3 trial1_stimB        B 0.00000000
24         6 group3 trial1_stimB        B 0.00000000
25         1 group1 trial2_stimA        A 0.05506943
26         2 group1 trial2_stimA        A 0.22537470
27         3 group2 trial2_stimA        A 0.00000000
28         4 group2 trial2_stimA        A 0.18511144
29         5 group3 trial2_stimA        A 0.15586156
30         6 group3 trial2_stimA        A 0.04467100
31         1 group1 trial2_stimB        B 0.03890585
32         2 group1 trial2_stimB        B 0.29787709
33         3 group2 trial2_stimB        B 0.00000000
34         4 group2 trial2_stimB        B 0.28971992
35         5 group3 trial2_stimB        B 0.12993238
36         6 group3 trial2_stimB        B 0.05066011

Here is the structure of my data

'data.frame':   36 obs. of  5 variables:
 $ SubjectID: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 1 2 3 4 ...
 $ Group    : Factor w/ 3 levels "group1","group2",..: 1 1 2 2 3 3 1 1 2 2 ...
 $ Trial    : Factor w/ 6 levels "trial1_stimA",..: 5 5 5 5 5 5 6 6 6 6 ...
 $ StimType : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2 2 ...
 $ Measure  : num  0.559 0.989 0 0.271 0.372 ...

I need to run a mixed ANOVA with groups as between subjects factor and trials and stimulus type as within subjects factor. I have tried using three different R packages and different syntaxes but R either returns an error message or the output is missing the interactions group x trial x stim type.

For example, when I use anova_test() from the rstatix package

#Trying mixed ANOVA with rstatix
> 
> mixed.anova <- anova_test(
+   data = prepared_data, dv = Measure, wid = SubjectID,
+   between = Group, within = c(Trial,StimType)
+ )
Error in check.imatrix(X.design) : 
  Terms in the intra-subject model matrix are not orthogonal.
> get_anova_table(all_subjects)
Error in is.data.frame(x) : object 'all_subjects' not found

When I use the the aov from afex package

> #using afex
> 
> 
> mixed.anova2 <- aov_car(Measure ~ Group*Trial*StimType + Error(1|SubjectID/(Trial*StimType)), prepared_data)
Error: Empty cells in within-subjects design  (i.e., bad data structure).
table(data[c("Trial", "StimType")])
#               StimType
# Trial          A B
#   trial1_stimA 6 0
#   trial1_stimB 0 6
#   trial2_stimA 6 0
#   trial2_stimB 0 6
#   trial3_stimA 6 0
#   trial3_stimB 0 6
> 
> aov.bww

Call:
aov(formula = SCR ~ Group * Trial * CSType + Error(SubjectID) + 
    Group, data = sixPhasesAbs2)

Grand Mean: 397.1325

Stratum 1: SubjectID

Terms:
                     Group  Residuals
Sum of Squares   187283464 6399838881
Deg. of Freedom          2         81

Residual standard error: 8888.777
18 out of 20 effects not estimable
Estimated effects may be unbalanced

Stratum 2: Within

Terms:
                      Trial Group:Trial   Residuals
Sum of Squares    158766098   374583941 12799639740
Deg. of Freedom           5          10         405

Residual standard error: 5621.748
12 out of 27 effects not estimable
Estimated effects may be unbalanced

Finally, I tried using lmer

> #using lmer
> model = lmer(Measure ~Group*Trial*StimType +(1|SubjectID), data = prepared_data)
fixed-effect model matrix is rank deficient so dropping 18 columns / coefficients
> 
> summary(model)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: Measure ~ Group * Trial * StimType + (1 | SubjectID)
   Data: prepared_data

REML criterion at convergence: -16.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.1854 -0.5424  0.0000  0.5424  1.1854 

Random effects:
 Groups    Name        Variance Std.Dev.
 SubjectID (Intercept) 0.024226 0.15565 
 Residual              0.006861 0.08283 
Number of obs: 36, groups:  SubjectID, 6

Fixed effects:
                               Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)                    0.639036   0.124672  4.459241   5.126 0.005095 ** 
Groupgroup2                   -0.223497   0.176313  4.459241  -1.268 0.267088    
Groupgroup3                   -0.014099   0.176313  4.459241  -0.080 0.939728    
Trialtrial1_stimB             -0.341847   0.082829 15.000000  -4.127 0.000896 ***
Trialtrial2_stimA             -0.498814   0.082829 15.000000  -6.022 2.34e-05 ***
Trialtrial2_stimB             -0.470644   0.082829 15.000000  -5.682 4.35e-05 ***
Trialtrial3_stimA              0.134931   0.082829 15.000000   1.629 0.124124    
Trialtrial3_stimB             -0.389572   0.082829 15.000000  -4.703 0.000283 ***
Groupgroup2:Trialtrial1_stimB  0.197452   0.117138 15.000000   1.686 0.112550    
Groupgroup3:Trialtrial1_stimB -0.283091   0.117138 15.000000  -2.417 0.028865 *  
Groupgroup2:Trialtrial2_stimA  0.175831   0.117138 15.000000   1.501 0.154095    
Groupgroup3:Trialtrial2_stimA -0.025857   0.117138 15.000000  -0.221 0.828271    
Groupgroup2:Trialtrial2_stimB  0.199966   0.117138 15.000000   1.707 0.108413    
Groupgroup3:Trialtrial2_stimB -0.063997   0.117138 15.000000  -0.546 0.592870    
Groupgroup2:Trialtrial3_stimA -0.415129   0.117138 15.000000  -3.544 0.002946 ** 
Groupgroup3:Trialtrial3_stimA -0.363452   0.117138 15.000000  -3.103 0.007276 ** 
Groupgroup2:Trialtrial3_stimB  0.301779   0.117138 15.000000   2.576 0.021070 *  
Groupgroup3:Trialtrial3_stimB  0.007164   0.117138 15.000000   0.061 0.952043    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation matrix not shown by default, as p = 18 > 12.
Use print(x, correlation=TRUE)  or
    vcov(x)        if you need it

fit warnings:
fixed-effect model matrix is rank deficient so dropping 18 columns / coefficients
> 
> anova(model)
Missing cells for: Trialtrial1_stimB:StimTypeA, Trialtrial2_stimB:StimTypeA, Trialtrial3_stimB:StimTypeA, Trialtrial1_stimA:StimTypeB, Trialtrial2_stimA:StimTypeB, Trialtrial3_stimA:StimTypeB, Groupgroup1:Trialtrial1_stimB:StimTypeA, Groupgroup2:Trialtrial1_stimB:StimTypeA, Groupgroup3:Trialtrial1_stimB:StimTypeA, Groupgroup1:Trialtrial2_stimB:StimTypeA, Groupgroup2:Trialtrial2_stimB:StimTypeA, Groupgroup3:Trialtrial2_stimB:StimTypeA, Groupgroup1:Trialtrial3_stimB:StimTypeA, Groupgroup2:Trialtrial3_stimB:StimTypeA, Groupgroup3:Trialtrial3_stimB:StimTypeA, Groupgroup1:Trialtrial1_stimA:StimTypeB, Groupgroup2:Trialtrial1_stimA:StimTypeB, Groupgroup3:Trialtrial1_stimA:StimTypeB, Groupgroup1:Trialtrial2_stimA:StimTypeB, Groupgroup2:Trialtrial2_stimA:StimTypeB, Groupgroup3:Trialtrial2_stimA:StimTypeB, Groupgroup1:Trialtrial3_stimA:StimTypeB, Groupgroup2:Trialtrial3_stimA:StimTypeB, Groupgroup3:Trialtrial3_stimA:StimTypeB.  
Interpret type III hypotheses with care.
Type III Analysis of Variance Table with Satterthwaite's method
                      Sum Sq  Mean Sq NumDF DenDF F value    Pr(>F)    
Group                0.00723 0.003614     2     3  0.5267 0.6367151    
Trial                0.96171 0.192343     5    15 28.0356 4.172e-07 ***
Group:Trial          0.44102 0.044102    10    15  6.4283 0.0007411 ***
StimType                                                               
Group:StimType                                                         
Trial:StimType                                                         
Group:Trial:StimType                                                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I have searched for similar questions but I have not found any answers that would help me with this problem. I suspect (given the error messages) that I might have to change the structure of the dataset, but I do not know how to do this as I am a beginner in R. How can I run a mixed ANOVA in this dataset?

Len Greski Len Greski · Accepted Answer · 2020-05-16T14:25:02

Since Trial contains information about both Trial and StimType, the fixed effect model matrix is rank deficient. One can see this from the afex::aov_car() output in the original post that illustrates empty cells that are the effect of Trial containing information from two distinct variables.

> mixed.anova2 <- aov_car(Measure ~ Group*Trial*StimType + Error(1|SubjectID/(Trial*StimType)), prepared_data)
Error: Empty cells in within-subjects design  (i.e., bad data structure).
table(data[c("Trial", "StimType")])
#               StimType
# Trial          A B
#   trial1_stimA 6 0
#   trial1_stimB 0 6
#   trial2_stimA 6 0
#   trial2_stimB 0 6
#   trial3_stimA 6 0
#   trial3_stimB 0 6
> 
> aov.bww

We can correct the rank deficiency by splitting the Trial variable into two variables.

With the lme4 package and tidyr::separate() to separate trial from stim value, the analysis looks like this:

textFile <- "rowId SubjectID  Group        Trial StimType    Measure
1          1 group1 trial3_stimA        A 0.55908866
2          2 group1 trial3_stimA        A 0.98884446
3          3 group2 trial3_stimA        A 0.00000000
4          4 group2 trial3_stimA        A 0.27067991
5          5 group3 trial3_stimA        A 0.37169285
6          6 group3 trial3_stimA        A 0.42113984
7          1 group1 trial3_stimB        B 0.00000000
8          2 group1 trial3_stimB        B 0.49892807
9          3 group2 trial3_stimB        B 0.14602589
10         4 group2 trial3_stimB        B 0.50946555
11         5 group3 trial3_stimB        B 0.25572820
12         6 group3 trial3_stimB        B 0.22932966
13         1 group1 trial1_stimA        A 0.42207604
14         2 group1 trial1_stimA        A 0.85599588
15         3 group2 trial1_stimA        A 0.36428381
16         4 group2 trial1_stimA        A 0.46679336
17         5 group3 trial1_stimA        A 0.69379734
18         6 group3 trial1_stimA        A 0.55607716
19         1 group1 trial1_stimB        B 0.24261465
20         2 group1 trial1_stimB        B 0.35176384
21         3 group2 trial1_stimB        B 0.21116215
22         4 group2 trial1_stimB        B 0.33112544
23         5 group3 trial1_stimB        B 0.00000000
24         6 group3 trial1_stimB        B 0.00000000
25         1 group1 trial2_stimA        A 0.05506943
26         2 group1 trial2_stimA        A 0.22537470
27         3 group2 trial2_stimA        A 0.00000000
28         4 group2 trial2_stimA        A 0.18511144
29         5 group3 trial2_stimA        A 0.15586156
30         6 group3 trial2_stimA        A 0.04467100
31         1 group1 trial2_stimB        B 0.03890585
32         2 group1 trial2_stimB        B 0.29787709
33         3 group2 trial2_stimB        B 0.00000000
34         4 group2 trial2_stimB        B 0.28971992
35         5 group3 trial2_stimB        B 0.12993238
36         6 group3 trial2_stimB        B 0.05066011
"
data <- read.table(text = textFile,header = TRUE)

library(dplyr)
library(tidyr)
# split trial from Stim
data %>% separate(Trial,into = c("trialName","stimName")) -> data
library(lme4)
model = lmer(Measure ~Group*trialName*StimType +(1|SubjectID), data = data)

summary(model)

...and the output:

> summary(model)
Linear mixed model fit by REML ['lmerMod']
Formula: Measure ~ Group * trialName * StimType + (1 | SubjectID)
   Data: data

REML criterion at convergence: -16.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.1854 -0.5424  0.0000  0.5424  1.1854 

Random effects:
 Groups    Name        Variance Std.Dev.
 SubjectID (Intercept) 0.024226 0.15565 
 Residual              0.006861 0.08283 
Number of obs: 36, groups:  SubjectID, 6

Fixed effects:
                                      Estimate Std. Error t value
(Intercept)                            0.63904    0.12467   5.126
Groupgroup2                           -0.22350    0.17631  -1.268
Groupgroup3                           -0.01410    0.17631  -0.080
trialNametrial2                       -0.49881    0.08283  -6.022
trialNametrial3                        0.13493    0.08283   1.629
StimTypeB                             -0.34185    0.08283  -4.127
Groupgroup2:trialNametrial2            0.17583    0.11714   1.501
Groupgroup3:trialNametrial2           -0.02586    0.11714  -0.221
Groupgroup2:trialNametrial3           -0.41513    0.11714  -3.544
Groupgroup3:trialNametrial3           -0.36345    0.11714  -3.103
Groupgroup2:StimTypeB                  0.19745    0.11714   1.686
Groupgroup3:StimTypeB                 -0.28309    0.11714  -2.417
trialNametrial2:StimTypeB              0.37002    0.11714   3.159
trialNametrial3:StimTypeB             -0.18266    0.11714  -1.559
Groupgroup2:trialNametrial2:StimTypeB -0.17332    0.16566  -1.046
Groupgroup3:trialNametrial2:StimTypeB  0.24495    0.16566   1.479
Groupgroup2:trialNametrial3:StimTypeB  0.51946    0.16566   3.136
Groupgroup3:trialNametrial3:StimTypeB  0.65371    0.16566   3.946

Correlation matrix not shown by default, as p = 18 > 12.
Use print(x, correlation=TRUE)  or
    vcov(x)        if you need it

>

UPDATE: Parsing effect names

In the comments to my answer, the questioner explained that the data used to describe trials and stim types were different than what was represented in the data posted with the question. Given the contents of the comment, here is an approach with dplyr to parse out the trial conditions and stim types.

# solving the data cleaning problem in comments

df <- data.frame(treatmentName = c("MeanCondPlusLate",
                   "MeanCondMinusLate", 
                   "MeanExtPlusLate", 
                   "MeanExtMinusLate", 
                   "TestPlus1",
                   "TestMinus1"))

library(dplyr)
df %>% mutate(trial = case_when(grepl("Cond",treatmentName) == TRUE ~ 1,
                                grepl("Ext",treatmentName) == TRUE ~ 2,
                                grepl("Test",treatmentName) == TRUE ~ 3), 
              stimLevel = case_when(grepl("Plus",treatmentName) == TRUE ~ "A",
                                    grepl("Minus",treatmentName) == TRUE ~ "B"))

...and the output:

      treatmentName trial stimLevel
1  MeanCondPlusLate     1         A
2 MeanCondMinusLate     1         B
3   MeanExtPlusLate     2         A
4  MeanExtMinusLate     2         B
5         TestPlus1     3         A
6        TestMinus1     3         B
>

Error when running mixed anova in R with two within factor and one between factor

1 Answers

UPDATE: Parsing effect names