1
votes

I want to impute part of my data set with mice. My data set has a very large number of variables, which is why I don't want to impute all of them but only those I will use in my model. (I know that as much information as possible should be used for the imputation, but I am already using 41 variables, which according to the literature should be more than enough.)

My problem: I don't want every variable to be imputed at every time point, because I have several measurement points. My variables at t4 naturally have many missing values, but I don't want to impute them when people simply did not fill out the questionnaire at that point.

So I specified a predictor matrix in which all of the variables at t0 (e.g. A103.0) are imputed, but those at t4 (A103.4) are not. However, when I run mice, it just uses "pmm" for all of the variables, and every variable is imputed.

Any suggestions on what went wrong are highly appreciated. I have spent quite some time trying to find out what happened.

This is what I've done:

I create an object with all the columns I want to impute

impute <- c("A103", "A104", "A107", #SVF
            "A302.0", "A303.0", "A304.0", "A305.0", "A306.0",
            "A502_01.0", "A502_02.0", "A502_03.0", "A502_04.0",
            "A504.0","A506.0", "A508.0", "W003.0", "W005.0", 
            "A509_02.0", "A509_03.0", "A509_06.0", "A509_10.0",
            "A302.4", "A303.4", "A304.4", "A305.4", "A306.4", 
            "A502_01.4", "A502_02.4", "A502_03.4", "A502_04.4", 
            "A504.4", "A506.4", "A508.4","W003.4", "W005.4", "SD02_01",
            "SD03",
            "A509_02.4", "A509_03.4", "A509_06.4", "A509_10.4")

I create a subset of the columns (and all rows of course) which I want to impute

imp <- mice(ds_wide[ ,impute], maxit=0)
imp$predictorMatrix

pred <- imp$predictorMatrix

pred [c("A302.4", "A303.4", "A304.4", "A305.4", "A306.4", #ABB.4
      "A502_01.4", "A502_02.4", "A502_03.4", "A502_04.4", #PSWQ.4
      "A504.4", "A506.4", "A508.4","W003.4", "W005.4", "SD02_01",
      "SD03",
      "A509_02.4", "A509_03.4", "A509_06.4", "A509_10.4"), ] <- 0

View(pred) #looks exactly how I want it to look like

imp <- mice(ds_wide[ ,impute], m=5, predictorMatrix = pred)
miceimp <- complete(imp)
anyNA(miceimp)
View(miceimp)

When I check miceimp (my result), there are no missing values whatsoever, so all the variables at t4 are imputed even though I specified otherwise. What did I do wrong?

Actually, what would suit me best is to impute the t4 variables only for people who took part at t4, i.e. those whose t4 variables are not all missing. People who were not present at that measurement point should not be imputed. If anyone has any ideas on how to make that possible, that would be great!

Many thanks!

2 Answers

0
votes

I am not completely sure I understood what you are trying to achieve.

As I understand it, you do not want to impute all of your variables, but you do want to use all of them as input to the algorithm.

You tried to do this with the predictorMatrix parameter, which the documentation describes as follows:

predictorMatrix
A numeric matrix of length(blocks) rows and ncol(data) columns, containing 0/1 data specifying the set of predictors to be used for each target column. Each row corresponds to a variable block, i.e., a set of variables to be imputed. A value of 1 means that the column variable is used as a predictor for the target block (in the rows). By default, the predictorMatrix is a square matrix of ncol(data) rows and columns with all 1's, except for the diagonal. Note: For two-level imputation models (which have "2l" in their names) other codes (e.g, 2 or -2) are also allowed.

To me it sounds like this parameter defines which variables are used as input (predictors). By comparison, the where parameter sounds like the correct one for specifying which values should be imputed.

where
A data frame or matrix with logicals of the same dimensions as data indicating where in the data the imputations should be created. The default, where = is.na(data), specifies that the missing data should be imputed. The where argument may be used to overimpute observed data, or to skip imputations for selected missing values.

So my conclusion would be to try out the where parameter instead of predictorMatrix.
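
A minimal sketch of that idea, using the nhanes example data that ships with mice rather than the data from the question. The column chl and the vector wave_vars are hypothetical stand-ins for the t4 variables. The sketch also keeps the skipped variables out of the predictor matrix wherever their remaining NAs would end up in a predictor; that step is my assumption about what is needed, not something stated in the documentation quoted above.

```r
library(mice)
data(nhanes)  # small example data set shipped with mice

# Case A: never impute one column (chl stands in for a t4 variable).
wh_col <- is.na(nhanes)            # default: impute every missing cell
wh_col[, "chl"] <- FALSE           # except anything in chl
pred_col <- make.predictorMatrix(nhanes)
pred_col[, "chl"] <- 0             # assumption: keep the still-incomplete chl
                                   # out of the models for the other columns

imp_col <- mice(nhanes, m = 5, where = wh_col, predictorMatrix = pred_col,
                printFlag = FALSE, seed = 1)
done_col <- complete(imp_col)

# Case B: impute the wave variables only for rows in which at least one
# of them was observed (i.e. the person took part in that wave).
wave_vars <- c("bmi", "chl")       # hypothetical stand-ins for the t4 columns
absent <- rowSums(!is.na(nhanes[, wave_vars])) == 0
wh_row <- is.na(nhanes)
wh_row[absent, wave_vars] <- FALSE # leave absent people's wave values missing
pred_row <- make.predictorMatrix(nhanes)
pred_row[c("age", "hyp"), wave_vars] <- 0  # assumption: baseline variables are
                                           # not predicted from wave variables

imp_row <- mice(nhanes, m = 5, where = wh_row, predictorMatrix = pred_row,
                printFlag = FALSE, seed = 1)
done_row <- complete(imp_row)
```

In case A, done_col should still contain the original NAs in chl while the other columns are filled in. In case B, the rows flagged in absent should keep their NAs in wave_vars, and everyone else should be complete on those columns. Case B is probably closest to the last paragraph of the question.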

0
votes

In "mice", in addition to setting the rows of "predictorMatrix" to zero for the variables that should not be imputed, you must set the entries of "method" for those variables to the empty string ("").
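
A minimal sketch of this, again on the nhanes example data from mice and not on the data from the question. The name skip and the choice of chl are hypothetical stand-ins for the t4 variables. Zeroing the columns of the skipped variables as well is my own assumption: since they keep their NAs, using them as predictors could pass missing values on to the other imputations.

```r
library(mice)
data(nhanes)  # small example data set shipped with mice

# Dry run to obtain the default method vector and predictor matrix.
ini  <- mice(nhanes, maxit = 0)
meth <- ini$method
pred <- ini$predictorMatrix

skip <- c("chl")          # hypothetical stand-in for the t4 variables

meth[skip]   <- ""        # empty method: these columns are not imputed
pred[skip, ] <- 0         # their imputation models are unused
pred[, skip] <- 0         # assumption: do not use them as predictors either

imp <- mice(nhanes, m = 5, method = meth, predictorMatrix = pred,
            printFlag = FALSE, seed = 123)
completed <- complete(imp)

anyNA(completed$chl)      # chl keeps its missing values
anyNA(completed$bmi)      # bmi has been imputed
```

With the data from the question, skip would be the vector of t4 column names that is currently used to index pred.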