3
votes

My code stopped working after updating the mice (Multiple Equations by Chained Equations) package to version >3. I wish to retrieve the estimated variance-covariance matrix from linear regressions on multiply imputed datasets. This quantity (which mice calls t) could be easily accessed in version 2.46.0 using the pool function. In version >3.0 of mice, the pool function does not return the full variance-covariance matrix anymore, it only returns the diagonal elements of the variance-covariance matrix.

Here is a working example:

First create some dataset with missing values:

set.seed(243)
iris$Sepal.Length[sample(length(iris$Sepal.Length), size = 5)] <- NA
iris$Sepal.Width[sample(length(iris$Sepal.Width), size = 5)] <- NA
iris$Petal.Length[sample(length(iris$Petal.Length), size = 5)] <- NA
iris$Species[sample(length(iris$Species), size = 5)] <- NA

Second multiply impute the missing data

iris.mi <- mice(iris, 5)

Third perform linear regression on each of the multiply imputed dataset, storing results in a mira object

mira.out <- with(iris.mi, lm(Sepal.Width ~ Sepal.Length + Petal.Length + Petal.Width + Species))

Fourth, pool results from these analyses using Rubin's rules. This is implemented by the pool function in mice.

pool.out <- pool(analyses)

In version 2.46.0 of the mice package, one could retrieve the full variance covariance matrix t by typing

pool.out$t

In newer versions (>3.0) of the mice package, the pool.out$t object does not exist. All one can do, is retrieve the variances by typing

pool.out$pooled

and selecting the column labeled t. There seems to be no way of accessing the full variance-covariance matrix. All one has access to are the diagonal elements of the matrix, which are stored in the t column of the pool.out$pooled data.frame.

I want to access the full variance covariance matrix because I need to calculate marginal effects and confidence intervals for interacted terms in a linear regression with multiply imputed data. These confidence intervals can be approximated by using only the diagonal elements of the variance-covariance matrix, but it would make much better sense to use the full variance-covariance matrix.

I wonder why this change was implemented in the mice package, and how I might be able to access the variance-covariance matrix in the newer versions.

Thank you for your help.

3
may be worth asking at github.com/stefvanbuuren/mice/issuesuser20650

3 Answers

1
votes

As to why it's no longer available, I assume it has something to do with mice > 3.0 depending on broom functionality to gather model output. Its vignette (https://cran.r-project.org/web/packages/broom/vignettes/broom.html) states "The broom package takes the messy output of built-in functions in R, such as lm, nls, or t.test, and turns them into tidy data frames." While indeed try, this doesn't easily accommodate the vcov matrices of the individual models necessary to derive the full matrix t instead of the currently displayed diagonal of what was previously t.

As a work around, you can derive it yourself (see for instance Dong and Peng 2013 http://springerplus.springeropen.com/articles/10.1186/2193-1801-2-222) with the following code (including your example code with some very minor renaming):

set.seed(243)
iris$Sepal.Length[sample(length(iris$Sepal.Length), size = 5)] <- NA
iris$Sepal.Width[sample(length(iris$Sepal.Width), size = 5)] <- NA
iris$Petal.Length[sample(length(iris$Petal.Length), size = 5)] <- NA
iris$Species[sample(length(iris$Species), size = 5)] <- NA

iris.mi <- mice(iris, 5)

fit.mi <- with(iris.mi, lm(Sepal.Width ~ Sepal.Length + Petal.Length + Petal.Width + Species))

fil.pooled <- pool(fit.mi)

# get the full matrix ubar (instead of only the diagonal)
m <- fil.pooled$m
ubar <- Reduce("+", lapply(fit.mi$analyses, vcov)) / (m)
b <- fil.pooled$pooled$b # this one is still provided by mice

# # or by hand as well
# qbar <- getqbar(fil.pooled)  # pooled estimates  
# b <- 1 / (m-1) * rowSums((sapply(fit.mi$analyses, coef) - qbar)^2)

t <- ubar + (1 + 1 / (m)) * b  # this is t as it used to be

# check versus the diagonal of t that is still provided
all.equal(as.numeric(diag(t)), fil.pooled$pooled$t) # check
0
votes

Faced the same problem. For now, I used Using devtools to install older package version of mice to be able to proceed with my analysis.

0
votes

Here's a minimal working example of how to derive the full ('t') covariance matrix. This uses the equations from Enders (2010) book Applied Missing Data Analysis (page 234), which gives a great introduction to the multivariate versions of the equations to pool variances. For those without access to this text, the equations are also given on page 42 of this article by Krause, Huisman and Snijders (2018).

http://sa-ijas.stat.unipd.it/sites/sa-ijas.stat.unipd.it/files/10.26398-IJAS.0030-002.pdf

The pooled covariance matrix Vt is a combination of the within imputation variance Vw and the between imputation variance Vb.

Vw is the average covariance matrix across m imputed data sets.

Vw = 1/m * the sum of the m covariance matrices.

Vb is the extent to which parameter estimates differ across imputations, and the extent to which this covaries for the different parameters. It's roughly the average of m matrices created from the vectors of the difference between each m imputations' parameter estimates, Qhat-m, and the pooled parameter estimates, Qbar.

Vb = 1/(m - 1)) sum of m [(Qhat-m - Qbar) * transpose(Qhat-m - Qbar)].

Here's a working example based on Jeroen Hoogland's answer to this question.

# data preparation ---------------------

# generating missingness
set.seed(243)
iris$Sepal.Length[sample(length(iris$Sepal.Length), size = 5)] <- NA
iris$Sepal.Width[sample(length(iris$Sepal.Width), size = 5)] <- NA
iris$Petal.Length[sample(length(iris$Petal.Length), size = 5)] <- NA
iris$Species[sample(length(iris$Species), size = 5)] <- NA

# imputing data
iris.mi <- mice(iris, 5)

# fiting models
fit.mi <- with(iris.mi,
               lm(Sepal.Width ~ Sepal.Length + Petal.Length +
                    Petal.Width + Species))

# pooling
fit.pooled <- pool(fit.mi)

# deriving the full variance covariance matrix ----------------------
# m different imputations
m <- fit.pooled$m

# Vw, the within imputation covariance matrix
vw <- Reduce("+", lapply(fit.mi$analyses, vcov)) / (m)

# Vb, the between imputation covariance matrix
# mice now only provides the diagonal elements
bdiag <- fit.pooled$pooled$b

# getting the full Vb matrix by hand
# Qbar, pooled parameter estimates
qbar <- getqbar(fit.pooled)
# Qhats, each imputations parameter estimates
qhats <- sapply(fit.mi$analyses, coef)


vb <- (1 / (m-1)) * (qhats - qbar) %*% t(qhats - qbar)

vt <- vw + (1 + 1 / (m)) * vb  # this is t as it used to be

# checking against the diagonal of t that is still provided
all.equal(as.numeric(diag(vt)), fit.pooled$pooled$t)