4
votes

I have a question regarding the aggregation of imputed data as created by the R-package 'mice'.

As far as I understand it, the 'complete'-command of 'mice' is applied to extract the imputed values of, e.g., the first imputation. However, when running a total of ten imputations, I am not sure which imputed values to extract. Does anyone know how to extract the (aggregate) imputed data across all imputations?

Since I would like to enter the data into MS Excel and perform further calculations in another software tool, such a command would be very helpful.

Thank you for your comments. A simple example (from 'mice' itself) can be found below:

R> library("mice")
R> nhanes
R> imp <- mice(nhanes, seed = 23109) #create imputation
R> complete(imp) #extracts the first of the five completed datasets (the default, action = 1)

How can I aggregate the five imputed data sets and extract the imputed values to Excel?

3
I think it stands for National Health and Nutrition Examination Survey.

3 Answers

3
votes

I had a similar issue. I used the code below, which is good enough for numeric variables. For other variable types, I thought about randomly choosing one of the imputed results, because averaging can distort them (for example, the mean of a categorical code is not a valid category).

My suggested code (for numeric variables):

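library(mice)
tempData <- mice(data, m = 5, maxit = 50, method = "pmm", seed = 500)
# long format: columns 1-2 are .imp and .id, columns 3:6 here are the variables
completedData <- complete(tempData, "long")
# average each variable over the imputations, one row per original record (.id)
a <- aggregate(completedData[, 3:6], by = list(completedData$.id), FUN = mean)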
  1. You should join the results back to the original data.
  2. I think 'Hmisc' is a better package.
  3. If you have already found a nicer, more elegant, or built-in solution, please share it with us.
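
The averaging approach above can be sketched end to end on the 'nhanes' example from the question. This is only an illustration under some assumptions: the file name is a placeholder, all four 'nhanes' variables are treated as numeric, and the row-matching step assumes that '.id' holds row positions or row names (the two coincide for 'nhanes').

```r
library(mice)

# nhanes ships with mice; the seed is the one used in the question
imp <- mice(nhanes, m = 5, seed = 23109, printFlag = FALSE)

# Stack the five completed datasets: columns .imp, .id, then the variables
long <- complete(imp, action = "long")

# Average every variable over the imputations, one row per original record.
# Note: for a coded variable such as hyp the mean may not be a valid code.
vars <- names(nhanes)
avg <- aggregate(long[vars], by = list(.id = long$.id), FUN = mean)

# Put the rows back in the original order (assumption: .id matches row names)
avg <- avg[match(rownames(nhanes), as.character(avg$.id)), ]

# A CSV file opens directly in Excel; the file name is a placeholder
write.csv(avg, "nhanes_imputed_mean.csv", row.names = FALSE)
```

Observed values are unchanged by this, since they are identical in all five completed datasets; only the cells that were missing hold an average.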
1
votes

You should use complete(imp, action = "long") to get the values for each imputation. If you look at ?complete, you will find:

complete(x, action = 1, include = FALSE)

Arguments

x   
An object of class mids as created by the function mice().

action  
If action is a scalar between 1 and x$m, the function returns the data with imputation number action filled in. Thus, action=1 returns the first completed data set, action=2 returns the second completed data set, and so on. The value of action can also be one of the following strings: 'long', 'broad', 'repeated'. See 'Details' for the interpretation.

include 
Flag to indicate whether the original data with the missing values should be included. This requires that action is specified as 'long', 'broad' or 'repeated'.

So, the default (action = 1) is to return the first completed data set. In addition, the argument action can also be a string: 'long', 'broad', or 'repeated'. If you enter 'long', it will give you all imputations stacked in long format. You can also set include = TRUE if you want the original data, with its missing values, included as well.
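
As a rough sketch of how this could be used for the Excel export in the question (the file name is a placeholder, and 'bmi' is just one of the incomplete 'nhanes' variables picked for illustration):

```r
library(mice)

# Same call as in the question; the default is m = 5 imputations
imp <- mice(nhanes, seed = 23109, printFlag = FALSE)

# Original data (.imp == 0) plus all five completed datasets, row-stacked
stacked <- complete(imp, action = "long", include = TRUE)

# Only the imputed values for one variable:
# rows = records where bmi was missing, columns = imputations 1 to 5
imputed_bmi <- imp$imp$bmi

# A CSV file opens directly in Excel; the .imp column tells the imputations apart
write.csv(stacked, "nhanes_all_imputations.csv", row.names = FALSE)
```

Keeping the '.imp' column in the exported file makes it possible to filter on a single imputation in the other tool.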

0
votes

OK, but you still have to choose one imputed dataset for further analyses... I think the best option is to analyze each of the imputed datasets (the ones you get from complete(imp, action = "long")) and pool the results afterwards:

fit <- with(data = imp, expr = lm(bmi ~ hyp + chl))
pool(fit)

But I also assume it's not forbidden to use just one of the imputed datasets ;)
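
For completeness, a minimal runnable version of the pooling idea on the 'nhanes' example might look like this. The model bmi ~ hyp + chl is only an example, not a recommendation for this data.

```r
library(mice)

imp <- mice(nhanes, seed = 23109, printFlag = FALSE)

# Fit the same linear model on each of the five completed datasets
fit <- with(imp, lm(bmi ~ hyp + chl))

# Combine the five sets of estimates into one (Rubin's rules)
pooled <- pool(fit)
summary(pooled)
```

The pooled summary has one row per model term, so it is the analysis results, rather than the imputed data themselves, that get aggregated across imputations.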