Obtaining training error using the caret package in R

Question

I am using the caret package to train a k-nearest neighbors (KNN) model. For this, I am running this code:

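library(caret)

# classProbs = TRUE is required so that twoClassSummary can compute the ROC AUC
Control <- trainControl(method = "cv", summaryFunction = twoClassSummary, classProbs = TRUE)

tGrid <- data.frame(k = 1:100)

trainingInfo <- train(Formula, data = trainData, method = "knn", tuneGrid = tGrid,
                      trControl = Control, metric = "ROC")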

As you can see, I am interested in obtaining the area under the ROC curve (AUC). This code works well, but it only returns the testing error (which the algorithm uses to tune the k parameter of the model), computed as the mean of the errors over the cross-validation folds. In addition to the testing error, I would like to obtain the training error (the mean, across folds, of the error obtained on each fold's training data). How can I do it?
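caret does keep the per-fold values behind that mean. The sketch below is a minimal illustration under stated assumptions: simulated data from caret's twoClassSim stands in for the asker's Formula and trainData, and a two-value grid keeps it fast. With returnResamp = "all" in trainControl, the resample element of the fitted object holds one held-out ROC per fold and per k, and averaging those per k should reproduce the ROC column of the results table.

```r
library(caret)

set.seed(1)
dat <- twoClassSim(200)  # simulated stand-in for trainData (assumption)

# returnResamp = "all" keeps the held-out metrics of every fold for every k
ctrl <- trainControl(method = "cv", summaryFunction = twoClassSummary,
                     classProbs = TRUE, returnResamp = "all")

set.seed(3)
fit <- train(Class ~ ., data = dat, method = "knn",
             tuneGrid = data.frame(k = c(5, 15)),
             trControl = ctrl, metric = "ROC")

# One row per (fold, k): held-out ROC, sensitivity and specificity
head(fit$resample)

# Averaging the held-out ROC per k reproduces the ROC column of fit$results
per_k <- aggregate(ROC ~ k, data = fit$resample, FUN = mean)
per_k
```

Every number here is a held-out (testing) estimate; none of them is computed on the rows used to fit the model.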

Thank you

topepo · Accepted Answer · 2014-10-12T01:23:13

What you are asking for is a bad idea on multiple levels. You will grossly overestimate the area under the ROC curve. Consider the 1-NN model: each training point is its own nearest neighbor, so you will have perfect predictions every time.
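The 1-NN argument can be checked without caret. This is a minimal base-R sketch under simplifying assumptions: the helpers auc, knn_prob and cv_train_vs_test_auc are hypothetical and written only for illustration, the data are random Gaussian points with noisy labels, the kNN uses plain Euclidean distance, and the AUC is computed with the Mann-Whitney rank formula. For each fold it scores both the held-out rows and the rows the model was fit on.

```r
# Base-R sketch (no caret): for one value of k, compare the cross-validated
# AUC on each held-out fold with the AUC on that fold's own training rows.

# AUC via the Mann-Whitney rank formula; `score` is the predicted P(positive)
auc <- function(score, is_pos) {
  r <- rank(score)  # average ranks, so tied scores are handled
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: share of the k nearest training rows (Euclidean
# distance) that belong to the positive class, for each row of new_x
knn_prob <- function(train_x, train_pos, new_x, k) {
  apply(new_x, 1, function(row) {
    d <- sqrt(colSums((t(train_x) - row)^2))
    mean(train_pos[order(d)[seq_len(k)]])
  })
}

# Hypothetical helper: mean held-out ("test") and resubstitution ("train")
# AUC over n_folds random folds
cv_train_vs_test_auc <- function(x, is_pos, k, n_folds = 10) {
  fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(x)))
  per_fold <- t(sapply(seq_len(n_folds), function(f) {
    tr <- fold_id != f
    tr_x <- x[tr, , drop = FALSE]
    te_x <- x[!tr, , drop = FALSE]
    c(test  = auc(knn_prob(tr_x, is_pos[tr], te_x, k), is_pos[!tr]),
      train = auc(knn_prob(tr_x, is_pos[tr], tr_x, k), is_pos[tr]))
  }))
  colMeans(per_fold)
}

set.seed(1)
x <- matrix(rnorm(400), ncol = 2)
is_pos <- x[, 1] + rnorm(200) > 0  # noisy two-class labels
cv_train_vs_test_auc(x, is_pos, k = 1)
```

With k = 1 the "train" value is exactly 1, because each training point sits at distance 0 from itself and its own label is the only vote, while the "test" value reflects how noisy the labels really are.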

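If you still want to do this, you will need to run train again and modify the index and indexOut arguments of trainControl, so that each resample is evaluated on the same rows it was fit with: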

library(caret)

set.seed(1)
dat <- twoClassSim(200)

# returnTrain = TRUE returns, for each fold, the row indices used to fit the model
set.seed(2)
folds <- createFolds(dat$Class, returnTrain = TRUE)

# indexOut = folds makes caret evaluate each model on the same rows it was fit on
Control <- trainControl(method = "cv",
                        summaryFunction = twoClassSummary,
                        classProbs = TRUE,
                        index = folds,
                        indexOut = folds)

tGrid <- data.frame(k = 1:100)

set.seed(3)
a_bad_idea <- train(Class ~ ., data = dat,
                    method = "knn",
                    tuneGrid = tGrid,
                    trControl = Control, metric = "ROC")
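To put the two estimates side by side, one option is to fit the same folds twice: once with the default indexOut (which, per the trainControl documentation, is the set of rows not contained in index, i.e. ordinary cross-validation) and once with indexOut = folds as above. This sketch reuses the answer's seeds and folds but assumes a much smaller grid (k = 1, 5, 15) to keep it fast; the object names cv_fit, resub_fit and comparison are illustrative only.

```r
library(caret)

set.seed(1)
dat <- twoClassSim(200)

set.seed(2)
folds <- createFolds(dat$Class, returnTrain = TRUE)

# Ordinary CV: indexOut left NULL, so the rows outside each fold are held out
cv_ctrl <- trainControl(method = "cv", index = folds,
                        summaryFunction = twoClassSummary, classProbs = TRUE)
# Resubstitution: evaluate on the same rows used to fit
resub_ctrl <- trainControl(method = "cv", index = folds, indexOut = folds,
                           summaryFunction = twoClassSummary, classProbs = TRUE)

small_grid <- data.frame(k = c(1, 5, 15))  # assumed small grid for speed

set.seed(3)
cv_fit <- train(Class ~ ., data = dat, method = "knn", tuneGrid = small_grid,
                trControl = cv_ctrl, metric = "ROC")
set.seed(3)
resub_fit <- train(Class ~ ., data = dat, method = "knn", tuneGrid = small_grid,
                   trControl = resub_ctrl, metric = "ROC")

# One row per k: held-out ROC next to the (optimistic) training ROC
comparison <- merge(cv_fit$results[, c("k", "ROC")],
                    resub_fit$results[, c("k", "ROC")],
                    by = "k", suffixes = c(".cv", ".train"))
print(comparison)
```

The k = 1 row should show a training ROC of 1, which is the over-estimate described above; a gap that shrinks as k grows is expected, since larger neighborhoods dilute each point's vote for itself.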

Max
