I want to impute part of my data set with mice. My data set has a large number of variables, so I don't want to impute all of them, only those I will use in my model. (I know that as much information as possible should be used for the imputation, but I am already using 41 variables, which according to the literature should be more than enough.)
My problem: I don't want every variable to be imputed, because I have several measurement points. My variables at t4 naturally have many missing values, but I don't want to impute them for people who simply did not fill out the questionnaire at that point.
So I specified a predictor matrix in which all of the variables at t0 (e.g. A103.0) are imputed, but those at t4 (e.g. A103.4) are not. However, when I run mice, it uses "pmm" for all of the variables, and every variable is imputed.
Any suggestions on what went wrong are highly appreciated. I have spent quite some time trying to find out what happened.
This is what I've done:
First, I create a character vector with the names of all the columns I want to include in the imputation:
impute <- c("A103", "A104", "A107", #SVF
"A302.0", "A303.0", "A304.0", "A305.0", "A306.0",
"A502_01.0", "A502_02.0", "A502_03.0", "A502_04.0",
"A504.0","A506.0", "A508.0", "W003.0", "W005.0",
"A509_02.0", "A509_03.0", "A509_06.0", "A509_10.0",
"A302.4", "A303.4", "A304.4", "A305.4", "A306.4",
"A502_01.4", "A502_02.4", "A502_03.4", "A502_04.4",
"A504.4", "A506.4", "A508.4","W003.4", "W005.4", "SD02_01",
"SD03",
"A509_02.4", "A509_03.4", "A509_06.4", "A509_10.4")
Then I subset those columns (keeping all rows, of course), do a dry run to get the default predictor matrix, and set the rows of the variables I don't want imputed to 0:
# Dry run (maxit = 0) to obtain the default predictor matrix
imp <- mice(ds_wide[ , impute], maxit = 0)
imp$predictorMatrix
pred <- imp$predictorMatrix
# Set the rows (= target variables) that should not be imputed to 0
pred[c("A302.4", "A303.4", "A304.4", "A305.4", "A306.4", #ABB.4
"A502_01.4", "A502_02.4", "A502_03.4", "A502_04.4", #PSWQ.4
"A504.4", "A506.4", "A508.4","W003.4", "W005.4", "SD02_01",
"SD03",
"A509_02.4", "A509_03.4", "A509_06.4", "A509_10.4"), ] <- 0
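View(pred) # looks exactly the way I want it to
# Run the imputation with the modified predictor matrix
imp <- mice(ds_wide[ , impute], m = 5, predictorMatrix = pred)
miceimp <- complete(imp) # first completed data set
anyNA(miceimp) # FALSE, i.e. no missing values are left
View(miceimp)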
When I check miceimp (my result), there are no missing values whatsoever, so all the variables at t4 are imputed even though I specified otherwise. What did I do wrong?
Ideally, I would like to impute the t4 variables only for people who actually took part at t4. Participants who filled out at least part of the t4 questionnaire should have their missing t4 values imputed, while those who did not take part at that measurement point should keep their missing values. If anyone has an idea how to make that possible, that would be great!
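To make that goal concrete, here is a minimal sketch of the cell-level mask I have in mind, built with base R on a made-up toy data frame. The names `toy`, `t4_cols`, `took_part_t4` and `where` are invented for illustration, and the rule "took part at t4 = answered at least one t4 item" is my own assumption. As far as I can tell, `mice()` accepts a logical matrix of this shape through its `where` argument, but I have not confirmed that this is the right approach, so the call is only shown as a comment.

```r
# Toy wide data set (hypothetical values): two t0 items and two t4 items
toy <- data.frame(
  A302.0 = c(1, NA, 3, 4),
  A303.0 = c(2, 2, NA, 5),
  A302.4 = c(NA, NA, 2, NA),
  A303.4 = c(NA, NA, NA, 4)
)

# Columns measured at t4 (names ending in ".4")
t4_cols <- grep("\\.4$", names(toy), value = TRUE)

# Assumption: a person took part at t4 if at least one t4 item is observed
took_part_t4 <- rowSums(!is.na(toy[, t4_cols, drop = FALSE])) > 0

# Start from "impute every missing cell" ...
where <- is.na(toy)
# ... then switch off the t4 cells of people who skipped t4 entirely
where[!took_part_t4, t4_cols] <- FALSE

print(where)

# Untested idea (assumes the mice package, version 3.0 or later):
# imp <- mice(toy, m = 5, where = where)
```

In this mask, rows 1 and 2 (nobody answered at t4) keep their t4 cells as `FALSE`, so those values would stay missing, while rows 3 and 4 still have their missing t4 items flagged for imputation.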
Many thanks!