I am new to the caret library. I would like to use the train function to run cross-validation on my dataset (using the rpart method for classification). My goal is to produce learning curves from the data returned by my call to train. The learning curve would plot dataset size on the x-axis, with the prediction error on the training and cross-validation sets plotted as a function of that size.
My question is, does caret make predictions on both the training and cv folds? If the answer is yes, how would I go about extracting that data?
Assuming the answer is yes, here is a simple code sample that you could append to in order to illustrate:
library(MASS)
data(biopsy)
biopsy <- biopsy[, -1]
names(biopsy) <- c("thick", "u.size", "u.shape", "adhsn", "s.size", "nucl", "chrom", "n.nuc", "mit", "class")
biopsy.v2 <- na.omit(biopsy)
set.seed(1)
ind <- sample(2, nrow(biopsy.v2), replace = TRUE, prob = c(0.7, 0.3))
biop.train <- biopsy.v2[ind == 1, ]
# caret is not attached above, so trainControl is namespaced as well
tr.model <- caret::train(class ~ ., data = biop.train, method = "rpart",
                         trControl = caret::trainControl(method = "cv", number = 4,
                                                         verboseIter = FALSE,
                                                         savePredictions = "final"))
#Can I extract train and cv accuracies from tr.model?
Thanks.
Note: I realize that I may need to call train repeatedly with different-sized samples of my dataset (assuming caret doesn't support this directly), and that is not reflected in the code sample here.
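To make that note concrete, here is a rough sketch of the repeated-call loop I have in mind. The `sizes` vector and the `curve`, `train_acc` and `cv_acc` names are placeholders of mine. I am also assuming that the per-fold accuracies in the `resample` element of the fitted object are the right numbers to use for the cross-validation side, and that predicting back on the subset the model was fit to is an acceptable stand-in for the training side.

```r
library(caret)
library(MASS)   # provides the biopsy data

data(biopsy)
biopsy.v2 <- na.omit(biopsy[, -1])
names(biopsy.v2) <- c("thick", "u.size", "u.shape", "adhsn", "s.size",
                      "nucl", "chrom", "n.nuc", "mit", "class")

set.seed(1)
ctrl <- trainControl(method = "cv", number = 4, savePredictions = "final")

# Placeholder grid of dataset sizes for the x-axis of the learning curve
sizes <- c(100, 200, 400, nrow(biopsy.v2))

curve <- do.call(rbind, lapply(sizes, function(n) {
  # Draw a random subset of n rows and run 4-fold CV on it
  sub <- biopsy.v2[sample(nrow(biopsy.v2), n), ]
  fit <- train(class ~ ., data = sub, method = "rpart", trControl = ctrl)
  data.frame(
    size      = n,
    # Accuracy of the final model on the same rows it was fit to
    train_acc = mean(predict(fit, sub) == sub$class),
    # Mean held-out accuracy across the CV folds for the selected cp
    cv_acc    = mean(fit$resample$Accuracy)
  )
}))

print(curve)
```

Plotting `1 - train_acc` and `1 - cv_acc` against `size` would give the two error curves. With `savePredictions = "final"`, the held-out predictions themselves should also be available in `fit$pred` if per-row results are needed.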