Summary statistics for imputed data from Zelig & Amelia

Question

I'm using Amelia to impute the missing values.

I'm able to use Zelig and Amelia to do some calculations, but how do I use these packages to find the pooled means and standard deviations of the newly imputed data?

library(Amelia)
library(Zelig)

set.seed(1)  # for a reproducible example
n = 100
x1 = rnorm(n, 0, 1)  # standard normal
x2 = .4*x1 + rnorm(n, 0, sqrt(1 - .4^2))  # x2 is correlated with x1, r = .4 (Var(x2) = .16 + .84 = 1)
x1 = ifelse(rbinom(n, 1, .2) == 1, NA, x1)  # set roughly 20% of x1 to missing at random
d = data.frame(x1, x2)

m = 5  # number of imputed data sets
d.imp = amelia(d, m = m)  # imputed data

summary(d.imp) #provides summary of imputation process

Are you looking for something more than foo <- function(x, fcn) apply(x, 2, fcn); lapply(d.imp$imputations, foo, fcn = mean); lapply(d.imp$imputations, foo, fcn = sd)? - bsbk
Yes, I'd like to be able to calculate the pooled means and SDs. - User7598

bsbk · Accepted Answer · 2015-03-28T02:22:19

I couldn't figure out how to format the code in a comment, so here it is.

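foo <- function(x, fcn) apply(x, 2, fcn)    # apply fcn to each column of x
lapply(d.imp$imputations, foo, fcn = mean)  # column means, one vector per imputed data set
lapply(d.imp$imputations, foo, fcn = sd)    # column SDs, one vector per imputed data set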

d.imp$imputations is a list containing all m imputed data sets. You can process that list however you like to extract the column means and SDs for each imputation, and then pool them as you see fit. The same works for correlations:

lapply(d.imp$imputations, cor)
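If "pooling" only needs to produce a single point estimate, one simple option is to average the per-imputation statistics across imputations (this average is Rubin's point estimate). A minimal sketch in base R, assuming imps stands in for d.imp$imputations; here it is filled with made-up simulated data so the block runs without Amelia. Averaging the SDs is one simple choice, not the only one.

```r
# Hypothetical stand-in for d.imp$imputations: a list of m complete data frames
set.seed(1)
imps <- lapply(1:5, function(i) data.frame(x1 = rnorm(100), x2 = rnorm(100)))

# Column means and SDs within each imputation (rows = imputations, columns = variables)
means_by_imp <- t(sapply(imps, function(d) apply(d, 2, mean)))
sds_by_imp   <- t(sapply(imps, function(d) apply(d, 2, sd)))

# Point estimates pooled by averaging over imputations
pooled_means <- colMeans(means_by_imp)
pooled_sds   <- colMeans(sds_by_imp)
pooled_means
pooled_sds
```

This gives point estimates only; it says nothing about their uncertainty, which is what the combining rules in the edit below address.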

Edit: After some discussion in the comments, I see that what you are looking for is how to combine results, such as the mean, across the imputed data sets generated by Amelia using Rubin's rules. I think you should clarify in the title and body of your post that you want to combine results over imputations to get appropriate standard errors with Rubin's rules after imputing with Amelia. This was not clear from the title or the original description, and "pooling" can mean different things, particularly with respect to variances.
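For reference, Rubin's rules for m imputations are usually written as follows, where Q_j is the estimate (for example a sample mean) from imputation j and SE_j its standard error. As I understand it, mi.meld's q.mi corresponds to the pooled estimate and se.mi to the square root of the total variance T.

```latex
\bar{Q} = \frac{1}{m}\sum_{j=1}^{m}\hat{Q}_j, \qquad
\bar{W} = \frac{1}{m}\sum_{j=1}^{m}\widehat{\mathrm{SE}}_j^{\,2}, \qquad
B = \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat{Q}_j - \bar{Q}\bigr)^2,
\qquad
T = \bar{W} + \Bigl(1 + \frac{1}{m}\Bigr)B, \qquad
\widehat{\mathrm{SE}}_{\text{pooled}} = \sqrt{T}
```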

The mi.meld function in Amelia expects a q matrix of estimates from each imputation, an se matrix of the corresponding standard errors, and a logical byrow argument indicating whether each row holds the results of one imputation. See ?mi.meld for an example. In your case, q should hold the sample means from each imputed data set and se their estimated standard errors (the sample SD divided by sqrt(n)); pass both to mi.meld.
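To see what this combination step does, here is a minimal base-R sketch of Rubin's rules for the same layout (one row per imputation, one column per quantity). rubin_pool is a hypothetical helper, not an Amelia function; its output names mirror mi.meld's q.mi and se.mi, and it is worth checking it against mi.meld on your own data.

```r
# Hypothetical helper (not part of Amelia): combine per-imputation estimates
# with Rubin's rules. Rows of q and se are imputations, columns are quantities
# (the same layout as mi.meld with byrow = TRUE).
rubin_pool <- function(q, se) {
  q  <- as.matrix(q)
  se <- as.matrix(se)
  m  <- nrow(q)
  q_bar <- colMeans(q)              # pooled point estimate
  w_bar <- colMeans(se^2)           # average within-imputation variance
  b     <- apply(q, 2, var)         # between-imputation variance (divisor m - 1)
  t_var <- w_bar + (1 + 1 / m) * b  # total variance
  list(q.mi = q_bar, se.mi = sqrt(t_var))
}

# Toy example: 3 imputations, 2 quantities (made-up numbers)
q_toy  <- rbind(c(0.10, 0.50), c(0.12, 0.48), c(0.08, 0.52))
se_toy <- matrix(0.1, nrow = 3, ncol = 2)
res <- rubin_pool(q_toy, se_toy)
res$q.mi   # pooled estimates
res$se.mi  # pooled standard errors
```

The between-imputation term is what makes the pooled SE larger than the average per-imputation SE whenever the imputations disagree.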

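q <- t(sapply(d.imp$imputations, foo, fcn = mean))                # rows = imputations, columns = variables
se <- t(sapply(d.imp$imputations, foo, fcn = sd)) / sqrt(nrow(d))  # SE of each mean: SD / sqrt(n)
output <- mi.meld(q = q, se = se, byrow = TRUE)
output$q.mi   # pooled means
output$se.mi  # pooled standard errors of the means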

should get you what you're looking for. For statistics other than the mean, you will need an SE for each imputation, obtained either analytically, if a formula is available, or by, say, bootstrapping.
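As an example of the bootstrap route, here is a minimal sketch for the SD of each variable: q holds each imputation's column SDs and se a bootstrap estimate of their standard errors. imps again stands in for d.imp$imputations with made-up data, boot_se is a hypothetical helper, and B = 200 resamples is an arbitrary choice.

```r
# Hypothetical stand-in for d.imp$imputations
set.seed(2)
imps <- lapply(1:5, function(i) data.frame(x1 = rnorm(100), x2 = rnorm(100)))

# Hypothetical helper: bootstrap SE of a statistic computed on one column
boot_se <- function(x, stat, B = 200) {
  reps <- replicate(B, stat(sample(x, replace = TRUE)))  # statistic on each resample
  sd(reps)                                               # spread of the resampled statistic
}

# Rows = imputations, columns = variables
q  <- t(sapply(imps, function(d) sapply(d, sd)))
se <- t(sapply(imps, function(d) sapply(d, boot_se, stat = sd)))
q
se
```

The resulting q and se could then be passed to mi.meld(q = q, se = se, byrow = TRUE) in the same way as for the means.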
