In the train function of the caret package it is possible to perform centering and scaling of predictors as in the following example:

knnFit <- train(Direction ~ ., data = training, method = "knn",
                preProcess = c("center","scale"))

Setting this transformation in train should give a better evaluation of the performance of the algorithm during resampling.

In this case, when I use the model to predict the response for new data, do I need to center and scale the new data myself, or is this operation included in the final model?

Is the following operation sufficient?

pred <- predict(knnFit, newdata = test)

Thanks!

No, you should center and scale the new data beforehand. See stackoverflow.com/questions/15468866/… and stackoverflow.com/questions/15215457/… - PereG

1 Answer

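The preProcess steps specified in the call to train are stored in the resulting train object and are applied automatically to new data when you call predict on that object, so you do not need to preprocess the new data first. Your operation is sufficient.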

Also have a look at the extract from the caret website below. There is also a whole section purely about preprocessing. Definitely worth your time reading through it.

You can find the caret website here.

These processing steps would be applied during any predictions generated using predict.train, extractPrediction or extractProbs (see details later in this document). The pre-processing would not be applied to predictions that directly use the object$finalModel object.
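
To make the difference concrete, here is a minimal sketch. It assumes the built-in iris data and a 70/30 split in place of the data from the question (both are my assumptions, chosen only so the example is self-contained); the names training, test and knnFit mirror the question. It shows the one-line prediction, and then roughly what you would have to do by hand if you went through knnFit$finalModel instead.

```r
library(caret)

set.seed(1)
# Assumption: iris stands in for the asker's data; the split is arbitrary.
idx      <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[idx, ]
test     <- iris[-idx, ]

knnFit <- train(Species ~ ., data = training, method = "knn",
                preProcess = c("center", "scale"),
                trControl = trainControl(method = "cv", number = 5))

# Sufficient on its own: predict() on the train object centers and scales
# `test` using the means and standard deviations estimated from `training`.
pred_auto <- predict(knnFit, newdata = test)

# Going through finalModel skips that step, so the same transformation
# has to be applied by hand first, using the stored preProcess object.
test_scaled <- predict(knnFit$preProcess, test[, 1:4])
pred_manual <- predict(knnFit$finalModel, newdata = test_scaled, type = "class")
```

Note that the centering and scaling always use the training-set statistics, never statistics recomputed from the test set.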