I am looking for feedback on how to correctly specify random effects to account for correlation in a repeated measures design with multiple levels of correlation (including the data being longitudinal for each combination of predictors). The outcome is binary, so I will be fitting a logistic mixed model. I was planning to use the glmer() function from the lme4 package. If you're wondering how these data arise, one example is from an eye tracker: people's eyes are "tracked" for 30 seconds under different levels of the predictors, determining whether they looked at a certain object on the screen or not (hence the binary outcome).
Study design (which can be seen by processing the code under "Dummy dataset" below in R):
- The outcome (Binary_outcome) is binary.
- There are repeated measures: each subject's binary response is recorded multiple times within each combination of predictors (see "Dummy dataset" below for structure).
- There are two predictors of interest (both binary, categorical):
  - One between-subjects factor, Sex (male/female).
  - One within-subjects factor, Intervention (pre/post).
- Each subject is measured over six trials (under which there are repeated measures), Trial.
  - Note there are 12 possible trials a person could be assigned. Thus, not every subject is in all 12 trials, but rather a random set of 6 trials.
  - Trial is not a variable of interest. It is merely thought that observations within an individual, within a trial could be more alike, and thus Trial should also be accounted for as a form of cluster correlation.
Dummy dataset: Shows the general structure of my data (although this is not the actual dataset):
data01 <- structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Trial = c("A", "A",
"A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "E",
"F", "F", "F", "G", "G", "G", "E", "E", "E", "D", "D", "D", "A",
"A", "A", "J", "J", "J", "L", "L", "L"), Intervention = c("Pre", "Pre", "Pre", "Pre",
"Pre", "Pre", "Pre", "Pre", "Pre", "Post", "Post", "Post", "Post",
"Post", "Post", "Post", "Post", "Post", "Pre", "Pre", "Pre",
"Pre", "Pre", "Pre", "Pre", "Pre", "Pre", "Post", "Post", "Post",
"Post", "Post", "Post", "Post", "Post", "Post"), Sex = c("Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male"), Binary_outcome = c(1L,
1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L,
1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -36L))
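The design described above can be checked directly from the data. The following is a minimal base R sketch. It rebuilds only the design columns of the dummy data in compact form (the object name design and the rep() shorthand are introduced here for illustration), then tabulates which subjects saw which trials and how many levels of each predictor occur within a subject.

```r
# Compact rebuild of the design columns of the dummy data (outcome omitted).
design <- data.frame(
  Subject = rep(1:2, each = 18),
  Trial = rep(c("A", "B", "C", "D", "E", "F", "G", "E", "D", "A", "J", "L"),
              each = 3),
  Intervention = rep(rep(c("Pre", "Post"), each = 9), times = 2),
  Sex = rep(c("Female", "Male"), each = 18)
)

# Rows = subjects, columns = trials, cells = number of repeated measurements.
cell_counts <- xtabs(~ Subject + Trial, data = design)
print(cell_counts)

# Number of distinct trials each subject saw.
trials_per_subject <- rowSums(cell_counts > 0)

# Trials seen by more than one subject (the design is partially crossed).
shared_trials <- colnames(cell_counts)[colSums(cell_counts > 0) > 1]

# Distinct levels of each predictor within a subject:
# 1 level = between-subjects, 2 levels = within-subjects.
sex_levels <- tapply(design$Sex, design$Subject,
                     function(x) length(unique(x)))
int_levels <- tapply(design$Intervention, design$Subject,
                     function(x) length(unique(x)))
```

In the dummy data, each subject has six trials with three measurements each, some trials are shared between subjects while others are not, Sex takes one level per subject, and Intervention takes two.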
Current code being used: This is what I'm using currently, but I do not know if I should be specifying the random effects differently based on the structure of the data (outlined below under "Accounting correctly for correlation").
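# install.packages("lme4")  # run once if lme4 is not yet installed
library(lme4)
# Logistic mixed model with crossed random intercepts for Trial and Subject
logit_model <- glmer(Binary_outcome ~ factor(Sex) * factor(Intervention) +
                       (1 | Trial) +
                       (1 | Subject),
                     data = data01,
                     family = binomial)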
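For reference, the model that this glmer() call specifies can be written out as below. This is a sketch of the standard formulation with two crossed random intercepts. The indices (i for subject, j for trial, k for the repeated measurement within a subject-trial combination) and the symbols u_i and v_j are notation introduced here, not taken from the post, and Intervention_{ij} denotes the intervention level under which subject i completed trial j, as in the dummy data.

```latex
\begin{aligned}
\operatorname{logit} P(Y_{ijk} = 1 \mid u_i, v_j)
  &= \beta_0 + \beta_1\,\mathrm{Sex}_i + \beta_2\,\mathrm{Intervention}_{ij}
     + \beta_3\,\mathrm{Sex}_i \times \mathrm{Intervention}_{ij} + u_i + v_j, \\
u_i &\sim N(0, \sigma_u^2), \qquad v_j \sim N(0, \sigma_v^2),
  \qquad u_i \text{ and } v_j \text{ independent.}
\end{aligned}
```

Under this formulation, two observations from the same subject share u_i, two observations from the same trial share v_j, and observations from the same subject in the same trial share both. There is no term specific to a particular subject-by-trial combination.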
Accounting correctly for correlation: This is where my question lies. Comments/Questions:
- I believe both the Subject and Trial random effects are crossed (not nested), because Subject 1 is always Subject 1, and Trial A is always Trial A. There is no way to re-number/re-letter these as you could if the design were nested (see, e.g.: https://stats.stackexchange.com/questions/228800/crossed-vs-nested-random-effects-how-do-they-differ-and-how-are-they-specified).
- As can be seen above under "Current code being used," I have included the fixed effects of interest (Sex, Intervention, and Sex*Intervention), and random intercepts for Trial and Subject using + (1 | Trial) + (1 | Subject).
  - Does + (1 | Trial) + (1 | Subject) correctly "tell" the model to account for the correlation within a person, within a trial, or does this need to be specified in another way? Even though I don't think the random effects are nested, it still feels like there's a "hierarchy," but maybe this is already accounted for by + (1 | Trial) + (1 | Subject).
  - These data seem unique in that, even within a trial, there are multiple measurements (0s/1s) for each subject. I am unsure of the implications of this with regard to the model fitting.
- Do I need to further tell the model to differentiate the within- and between-subjects fixed effects? Or does the code "pick up" on this "automatically" with + (1 | Trial) + (1 | Subject)? It correctly does this when you simply specify a random intercept for subject in lme() with + (1 | Subject), or aov() with + Error(Subject), for example. This is why I simply used + (1 | Trial) + (1 | Subject) here.
- Lastly, I don't know if it matters that not every subject gets every trial (but rather 6 out of 12 possible trials), and if this affects some aspect of the code.
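One way to probe the first question in the list above is to write the candidate random-effects structures side by side. The sketch below is an illustration under assumptions, not a recommendation. It rebuilds the dummy data compactly (the rep() shorthand and the object name dummy are introduced here), constructs a hypothetical grouping factor named SubjTrial with one level per subject-trial combination, and lists two candidate formulas. The second adds a random intercept for each subject-trial combination, which is the kind of term that would let observations from the same person within the same trial be more alike than the two crossed intercepts alone imply. The fit is guarded because the dummy data contain only two subjects, so any estimates from it are not meaningful.

```r
# Compact rebuild of the dummy data from the post (same values, shorter code).
dummy <- data.frame(
  Subject = factor(rep(1:2, each = 18)),
  Trial = rep(c("A", "B", "C", "D", "E", "F", "G", "E", "D", "A", "J", "L"),
              each = 3),
  Intervention = rep(rep(c("Pre", "Post"), each = 9), times = 2),
  Sex = rep(c("Female", "Male"), each = 18),
  Binary_outcome = c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1,
                     1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1)
)

# Hypothetical grouping factor: one level per subject-trial combination.
dummy$SubjTrial <- interaction(dummy$Subject, dummy$Trial, drop = TRUE)
n_subj_trial <- nlevels(dummy$SubjTrial)    # number of subject-trial clusters
obs_per_cluster <- table(dummy$SubjTrial)   # repeated measures per cluster

# Candidate 1: crossed random intercepts only (the specification in the post).
f_crossed <- Binary_outcome ~ Sex * Intervention + (1 | Trial) + (1 | Subject)

# Candidate 2: additionally a random intercept per subject-trial combination.
f_cluster <- Binary_outcome ~ Sex * Intervention +
  (1 | Trial) + (1 | Subject) + (1 | SubjTrial)

# Fit both only if lme4 is installed; errors are caught so the sketch still
# runs on this tiny dataset, where fits may be singular or fail.
if (requireNamespace("lme4", quietly = TRUE)) {
  fits <- lapply(list(crossed = f_crossed, cluster = f_cluster), function(f) {
    tryCatch(lme4::glmer(f, data = dummy, family = binomial),
             error = function(e) NULL)
  })
}
```

In lme4, (1 | Subject:Trial) expresses the same grouping without creating the extra column. On the real data, if both fits succeed, they could be compared, for example with anova(), to see whether the subject-trial term is supported.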
I am looking for your feedback, and preferably also the reference(s) (texts, peer-reviewed papers) used to determine your feedback. I have multiple texts on logistic regression, broader categorical data analysis, and mixed models, but, as far as I can tell, none of them bring together the ideas I have posed here. Thus, knowing of a resource that is particularly useful for this situation would also be helpful.
+ (1 | Item) + (1 | Subj), which is equivalent to my + (1 | Trial) + (1 | Subject) in my post. – Meg
+ (1 | Word) + (1 | Subject), so this may indicate the answer to my question about whether or not it matters that not all subjects have the same trials (in my case, images) is no. – Meg