Caret - Some PreProcessing Options Not Available in Train

Question

In caret::train, many pre-processing options can be passed via the 'preProcess' argument. This makes life super simple, because the test data is then automatically pre-processed in the same manner as the training data when calling 'predict.train'. Is it possible to do the same with 'findCorrelation' and 'nearZeroVar' in some manner?

I understand from the documentation why the following code does not work, but I hope it clarifies my question. Ideally, I could do the following:

library("caret")
set.seed(1234)
data(iris)

# split test vs training
train.index <- createDataPartition(y = iris[, 5], p = 0.80, list = FALSE)
train <- iris[ train.index, ]
test  <- iris[-train.index, ]

# train the model after filtering correlated and near-zero-variance predictors
# (wished-for usage: these are not valid preProcess methods, so this call fails)
fit <- train(Species ~ .,
             train,
             preProcess = c("findCorrelation", "nearZeroVar"),
             method     = "rpart")
predict(fit, test)

topepo · Accepted Answer · 2013-11-20T18:10:33

Right now, you are tied to whatever preProcess will do.
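Until then, one manual workaround is to run the two filters on the training predictors yourself and then apply the same column selection to the test set. The following is only a sketch under assumptions: the names train.set, test.set, predictors and keep are made up for illustration, the 0.90 cutoff is an arbitrary choice, and the names = TRUE argument requires a caret version in which nearZeroVar and findCorrelation support it.

```r
library(caret)
set.seed(1234)
data(iris)

# Split into training and test sets, as in the question.
train.index <- createDataPartition(y = iris$Species, p = 0.80, list = FALSE)
train.set <- iris[train.index, ]
test.set  <- iris[-train.index, ]

# Predictor columns only (everything except the outcome).
predictors <- setdiff(names(train.set), "Species")

# 1. Flag near-zero-variance predictors using the training data only.
nzv.cols <- nearZeroVar(train.set[, predictors, drop = FALSE], names = TRUE)
keep <- setdiff(predictors, nzv.cols)

# 2. Flag highly correlated predictors among those that remain.
#    The 0.90 cutoff is an illustrative assumption, not a recommendation.
cor.mat  <- cor(train.set[, keep, drop = FALSE])
high.cor <- findCorrelation(cor.mat, cutoff = 0.90, names = TRUE)
keep <- setdiff(keep, high.cor)

# 3. Fit on the reduced training set, then select the same columns
#    from the test set before predicting.
fit  <- train(x = train.set[, keep, drop = FALSE],
              y = train.set$Species,
              method = "rpart")
pred <- predict(fit, test.set[, keep, drop = FALSE])
```

Because the filters are computed once, outside of train, they are not re-estimated inside each resample; whether that matters depends on how much the selected columns vary between resamples.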

However, the next version (around the start of the year, I hope) will allow you to more easily write custom models and pre-processing. For example, you might want to down-sample the data etc.
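For readers on a newer caret release than the one discussed here: to the best of my knowledge, later versions of preProcess accept the method names "nzv" (near-zero-variance filter) and "corr" (high-correlation filter), with thresholds passed through preProcOptions in trainControl. Treat the sketch below as an assumption to check against ?preProcess and ?trainControl for your installed version; train.set and test.set are illustrative names, and the cutoff values are simply the documented defaults as I recall them.

```r
library(caret)
set.seed(1234)
data(iris)

train.index <- createDataPartition(y = iris$Species, p = 0.80, list = FALSE)
train.set <- iris[train.index, ]
test.set  <- iris[-train.index, ]

# Thresholds for the filters (assumed option names: cutoff for "corr",
# freqCut and uniqueCut for "nzv").
ctrl <- trainControl(preProcOptions = list(cutoff = 0.90,
                                           freqCut = 95 / 5,
                                           uniqueCut = 10))

fit <- train(Species ~ .,
             data       = train.set,
             preProcess = c("nzv", "corr"),
             trControl  = ctrl,
             method     = "rpart")

# predict.train applies the stored pre-processing to the new data.
pred <- predict(fit, test.set)
```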

Let me know if you would like to test that version when we have a beta available.

Max
