How to choose the best imputed data using mice

Question

Using the mice package, I imputed a dataset like this:

library(mice)
imp <- mice(nhanes)

It generates 5 imputed datasets, so each missing value of a variable gets 5 imputed values (one per column):

imp$imp$bmi
#      1    2    3    4    5
#1  35.3 30.1 26.3 28.7 27.2
#3  30.1 22.0 30.1 28.7 22.0
#4  21.7 27.2 25.5 24.9 21.7
#6  24.9 25.5 24.9 27.5 22.5
#10 20.4 33.2 26.3 27.2 27.4
#11 22.0 27.2 27.2 30.1 22.0
#12 27.4 20.4 27.2 27.2 20.4
#16 30.1 30.1 27.2 22.5 29.6
#21 27.4 27.2 26.3 22.0 30.1

I do not understand how to choose the best imputed data.

For example, for bmi (above), which of the 5 columns would be the best choice?

Thank you

mmarks · Accepted Answer · 2017-11-16T12:31:15

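There isn't a best dataset. Analysing a single completed dataset accounts only for the within-dataset variation (ordinary sampling error) and ignores the between-imputation variation, which reflects the uncertainty about the missing values. Standard errors from one dataset are therefore too small.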
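For reference, the two sources of variation are usually combined with Rubin's rules. The notation below is my own, not from the question: $\hat{Q}_i$ is the estimate from imputed dataset $i$, $U_i$ its estimated variance, and $m$ the number of imputations (5 here).

```latex
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m} \hat{Q}_i
  && \text{(pooled estimate)} \\
\bar{U} &= \frac{1}{m}\sum_{i=1}^{m} U_i
  && \text{(within-imputation variance)} \\
B &= \frac{1}{m-1}\sum_{i=1}^{m} \bigl(\hat{Q}_i - \bar{Q}\bigr)^2
  && \text{(between-imputation variance)} \\
T &= \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B
  && \text{(total variance)}
\end{align*}
```

Picking one column of imp$imp$bmi amounts to using a single $U_i$ and dropping the $B$ term entirely.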

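This is why analyses such as regression should use the with() and pool() functions when working with imputed data: with() fits the model on each imputed dataset, and pool() combines the results into a single set of estimates.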
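A minimal sketch of that workflow on the nhanes data from the question. The model formula (bmi ~ age + chl) and the seed are illustrative assumptions, not something the question specified; substitute your own analysis model.

```r
library(mice)

# Impute the example data; m = 5 is the default, the seed is an
# arbitrary choice so the run is reproducible
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Fit the same (hypothetical) regression on each of the 5 completed datasets
fit <- with(imp, lm(bmi ~ age + chl))

# Combine the 5 sets of estimates and standard errors using Rubin's rules
pooled <- pool(fit)
summary(pooled)
```

The pooled summary reports one estimate and one standard error per coefficient, with the between-imputation variation already folded into the standard errors, so there is no need to pick a column.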
