I have been performing multiple imputation using the mice-package (van Buuren) in R, with m = 50 (50 imputation datasets) and 20 iterations for roughly 9 variables with missing data (MAR = missing at random) ranging from 5-13 %. After this, I want to proceed with estimating descriptive statistics for my dataset (i.e. not use complete case analysis only for descriptive statistics, but also compare the results with the descriptive statistics from my imputation). So my question is now, how to proceed.
I know that the correct procedure for dealing with MICE-data is:
- Impute the missing data by the mice function, resulting in a multiple imputed data set (class mids);
- Fit the model of interest (scientific model) on each imputed data set by the with() function, resulting an object of class mira;
- Pool the estimates from each model into a single set of estimates and standard errors, resulting is an object of class mipo; Optionally, compare pooled estimates from different scientific models by the D1() or D3() functions.
My problem is that I do not understand how to apply this theory on my data. I have done:
#Load package:
library(mice)
library(dplyr)
#Perform imputation:
Imp_Data <- mice(MY_DATA, m=50, method = "pmm", maxit = 20, seed = 123)
#Make the imputed data in long format:
Imp_Data_Long <- complete(Imp_Data, action = "long", include = FALSE)
I then assumed that this procedure is correct for getting the median of the BMI variable, where the .imp variable is the number of the imputation dataset (i.e. from 1-50):
BMI_Medians_50 <- Imp_Data_Long %>% group_by(.imp, Smoker) %>% summarise(Med_BMI = median(BMI))
BMI_Median_Pooled <- mean(BMI_Medians_50$Med_BMI)
I might have understood things completely wrong, but I have very much tried to understand the right procedure and found very different procedures here on StackOverflow and StatQuest.