
Alternate title: Model matrix and set of coefficients show different numbers of variables

I am using the mice package for R to do some analyses. I wanted to compare two models (held in mira objects) using pool.compare(), but I keep getting the following error:

Error in model.matrix(formula, data) %*% coefs : non-conformable arguments

The binary operator %*% indicates matrix multiplication in R.

The expression model.matrix(formula, data) produces "The design matrix for a regression-like model with the specified formula and data" (from the R Documentation for model.matrix {stats}).

In the error message, coefs is drawn from est1$qbar, where est1 is a mipo object, and the qbar element is "The average of complete data estimates. The multiple imputation estimate." (from the documentation for mipo-class {mice}).

In my case

  • est1$qbar is a numeric vector of length 36
  • data is a data.frame with 918 observations of 82 variables
  • formula is class 'formula' containing the formula for my model
  • model.matrix(formula, data) is a matrix with dimension 918 x 48.
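To make the mismatch concrete, here is a minimal sketch that uses placeholder objects with the same shapes as listed above. The zero-filled `X` and `coefs` are stand-ins I made up for illustration, not my real data or estimates:

```r
# Placeholder objects with the shapes reported above (all values are dummies).
X <- matrix(0, nrow = 918, ncol = 48)   # stands in for model.matrix(formula, data)
coefs <- numeric(36)                    # stands in for est1$qbar

# %*% needs ncol(X) to equal the length of the coefficient vector.
conformable <- ncol(X) == length(coefs)

# Capture the error message instead of stopping the script.
result <- tryCatch(X %*% coefs, error = function(e) conditionMessage(e))

print(conformable)
print(result)
```

So the design matrix has 12 more columns than there are coefficients, and the product cannot be formed.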

How can I resolve/prevent this error?

This is more about how R works than about statistics per se. It probably belongs on Stack Overflow rather than here. We can migrate it there. – gung - Reinstate Monica
Do you think I should change the title of the question to be more code oriented? – Paul de Barros
It's hard to say; it's up to you. – gung - Reinstate Monica

1 Answer

As occasionally happens, I found the answer to my own question while writing the question.

The clue was that the estimates for categorical variables in est1$qbar only exist if that level of that variable was present in the data. Some of my variables are factor variables where not every level is represented. This caused the warning "contrasts dropped from factor variable name due to missing levels", which I foolishly ignored.

On the other hand, looking at dimnames(model.matrix.temp)[[2]] (where model.matrix.temp holds the result of model.matrix(formula, data)) shows that the model matrix has a column for every declared level of each factor variable (other than the baseline level), regardless of whether that level was present in the data. So, although the contrasts for missing factor levels are dropped when the coefficients are estimated, those factor levels still appear in the model matrix. This means that the model matrix has more columns than est1$qbar (the vector of estimated coefficients) has elements, so the matrix multiplication fails.
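The same behaviour can be reproduced on a small scale. This is only a sketch: it uses a plain `lm()` fit on a made-up data frame `d` rather than mice and my real data, so the names `d`, `y`, and `g` are assumptions for illustration:

```r
# Toy data: factor g declares level "c", but no row actually uses it.
d <- data.frame(
  y = c(1, 2, 3, 4, 5, 6),
  g = factor(c("a", "b", "a", "b", "a", "b"), levels = c("a", "b", "c"))
)

# lm() drops unused factor levels when it builds its model frame,
# so no coefficient is estimated for level "c".
fit <- lm(y ~ g, data = d)
coefs <- coef(fit)

# Calling model.matrix() directly on the data keeps the unused level.
X <- model.matrix(y ~ g, data = d)

# Columns of the design matrix that have no matching coefficient.
extra_cols <- setdiff(colnames(X), names(coefs))

print(names(coefs))
print(colnames(X))
print(extra_cols)
```

Comparing the column names of the model matrix with the names of the coefficient vector in this way points directly at the factor levels responsible.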

The answer here is to fix the factor variables so that there are no unused levels. This can be done with the factor() function (as explained here). Unfortunately, this needs to be done on the original dataset, prior to imputation.
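A minimal sketch of the fix, again on a made-up data frame `d` (the names are illustrative, not from my dataset). It uses base R's `droplevels()`, which removes unused levels from every factor in a data frame; calling `factor()` on each column individually has the same effect:

```r
# Toy data with an unused factor level "c".
d <- data.frame(
  y = c(1, 2, 3, 4, 5, 6),
  g = factor(c("a", "b", "a", "b", "a", "b"), levels = c("a", "b", "c"))
)

# Remove unused levels from all factor columns.
# Per-column equivalent: d$g <- factor(d$g)
d_clean <- droplevels(d)

# In the real workflow, the imputation step would be run on d_clean at this point.
fit <- lm(y ~ g, data = d_clean)
X <- model.matrix(y ~ g, data = d_clean)

# The dimensions now agree, so the product is defined.
pred <- X %*% coef(fit)

print(levels(d_clean$g))
print(dim(X))
```

With the unused level gone, the design matrix has exactly one column per estimated coefficient.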