I noticed that predict() only returns predictions for complete cases, even though I had included medianImpute
in the preProcess options, as in the following call:
train(outcome ~ .,
      data = df,
      method = "rf",
      tuneLength = 5,
      preProcess = c("YeoJohnson", "center", "scale", "medianImpute"),
      metric = "ROC",
      trControl = train_ctrl)
Does this mean that I should impute the missing values before training the model? Otherwise, I am unable to get predictions for all cases in the test set. I had read in Dr. Kuhn's book that pre-processing should occur during cross-validation, which is why I am hesitant to impute beforehand. Thanks!