1
votes

I'm fitting a model using caret and I have some missing data. I recall once before passing the argument preProcess = "medianImpute" to train, but now I am receiving an unexpected error:

library(caret)
x <- mtcars
x[1:5, "cyl"] <- c(NA, NA, NA, NA, NA)

mod.mt <- train(
  mpg ~.,
  method = "rpart", # decision tree
  tuneLength = 3,
  preProcess = "medianImpute",
  data = x)

Gives:

Error in na.fail.default(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1,  : 
  missing values in object

Because I was using preProcess, I thought I was telling caret to use median imputation for any missing values, so I did not expect this error.

1 Answer

6
votes

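The pre-processing code is only designed to work when x is a simple matrix or data frame. Basically, it doesn't work when using train with the formula interface: as the error message shows, na.fail rejects the data before any pre-processing is applied.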

The code below works. Alternatively, first call preProcess, then predict, and then train (second part of the code).

mod.mt <- train(
  x = x[, -1],  # all predictors, i.e. every column except mpg
  y = x$mpg,
  method = "rpart", # decision tree
  tuneLength = 3,
  preProcess = "medianImpute"
)

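# first impute: estimate the column medians, then fill in the NAs
d <- preProcess(x, method = "medianImpute")
x1 <- predict(d, x)

# then train on the imputed data; the formula interface is fine now
mod.mt <- train(
  mpg ~ .,
  data = x1,
  method = "rpart", # decision tree
  tuneLength = 3
)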
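
To see where the error in the question comes from, here is a minimal base-R sketch (no caret needed). It assumes that the formula interface builds a model frame with na.action = na.fail, which is what the na.fail.default line in the error message suggests. The names res and mf are only illustrative.

```r
x <- mtcars
x[1:5, "cyl"] <- NA

# With na.fail, building the model frame stops at the first missing value.
res <- tryCatch(
  model.frame(mpg ~ ., data = x, na.action = na.fail),
  error = function(e) conditionMessage(e)
)
print(res)

# With na.pass, the rows containing NA are kept, so a later
# imputation step would still have something to work on.
mf <- model.frame(mpg ~ ., data = x, na.action = na.pass)
print(nrow(mf))
print(sum(is.na(mf$cyl)))
```

If that assumption holds for your caret version, adding na.action = na.pass to the original formula call (alongside preProcess = "medianImpute") may also work, since train's formula method accepts an na.action argument. I have not verified this against every caret release, so treat it as something to try rather than a guaranteed fix.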