trainControl in caret package

Question

The caret package provides trainControl, which lets us perform various kinds of cross-validation. To perform 10-fold cross-validation (here repeated 10 times), one would use

fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
fitJ48_10_fold <- train(x = x, y = y, method = "J48", trControl = fitControl)

while to fit on the training set without any resampling, it is

fitControl <- trainControl(method = "none")
fitJ48train <- train(x = x, y = y, method = "J48", trControl = fitControl)

However, the confusion matrices of these two models are identical for the 10-fold and training-set fits.

Activity <- predict(fitJ48_10_fold, newdata = Train)
confusionMatrix(Activity, Train$Activity)

Activity <- predict(fitJ48train, newdata = Train)
confusionMatrix(Activity, Train$Activity)

I used the Weka classifier GUI, and there the performance of J48 under 10-fold cross-validation is indeed lower than on the training set. Am I wrong to suspect that trainControl from caret isn't working, or am I passing it in the wrong way?

Yes, thanks for the prompt reply, and thanks to the R community. The data can be accessed at the following link: github.com/Rnewbie/LikitMorganFP/blob/master/… - doe

topepo · Accepted Answer · 2015-02-15T22:27:14

Am I wrong to suspect that trainControl from caret isn't working, or am I passing it in the wrong way?

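A little. J48 has a tuning parameter, but the default grid only fits a single value, C = 0.25. trainControl only controls how performance is estimated by resampling; once the tuning value is chosen, the final model is refit on the entire training set. With a single candidate value, the final model is therefore the same no matter what value of method you use in trainControl, so confusion matrices computed by predicting on the training data will always be the same.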

Max
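
The point above can be checked with a small sketch. It is not the original poster's setup: it assumes the built-in iris data, and it uses rpart as a stand-in for J48 (to avoid the RWeka/Java dependency) with one fixed, arbitrarily chosen tuning value (cp = 0.01), so that both fits have exactly one candidate model. The object names are hypothetical. The cross-validated estimate is read from the train object itself, not from predict() on the training data.

```r
library(caret)  # the rpart package must also be installed

set.seed(42)
x <- iris[, 1:4]
y <- iris$Species

# One fixed tuning value (an assumption for this sketch), so that both
# fits have a single candidate model, as in the J48 case described above.
grid <- data.frame(cp = 0.01)

cv_ctrl   <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
none_ctrl <- trainControl(method = "none")

fit_cv   <- train(x = x, y = y, method = "rpart", tuneGrid = grid, trControl = cv_ctrl)
fit_none <- train(x = x, y = y, method = "rpart", tuneGrid = grid, trControl = none_ctrl)

# Both final models are refit on all of x, so training-set predictions match.
pred_cv   <- predict(fit_cv, newdata = x)
pred_none <- predict(fit_none, newdata = x)
same_predictions <- identical(pred_cv, pred_none)

# Resubstitution (training-set) accuracy: an optimistic estimate.
train_acc <- mean(pred_cv == y)

# Cross-validated accuracy lives in the train object, not in predict().
cv_acc   <- fit_cv$results$Accuracy
per_fold <- fit_cv$resample  # one row per held-out fold (10 folds x 10 repeats)

print(same_predictions)
print(c(training = train_acc, cross_validated = cv_acc))
```

To compare against Weka's 10-fold figure, the number to look at is the resampled accuracy in fit_cv$results (or the per-fold values in fit_cv$resample), not a confusion matrix built from predictions on the training data.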
