3
votes

I tried to understand the 5-fold cross-validation algorithm in the caret package, but I could not find out how to get the training set and test set for each fold, and I could not find this in the similar suggested questions either. Suppose I want to do cross-validation with the random forest method. I do the following:

library(caret)
set.seed(12)
train_control <- trainControl(method = "cv", number = 5, savePredictions = TRUE)
rfmodel <- train(Species ~ ., data = iris, trControl = train_control, method = "rf")
first_holdout <- subset(rfmodel$pred, Resample == "Fold1")
str(first_holdout)
'data.frame':   90 obs. of  5 variables:
$ pred    : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1     
$ obs     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 
$ rowIndex: int  2 3 9 11 25 29 35 36 41 50 ...
$ mtry    : num  2 2 2 2 2 2 2 2 2 2 ...
$ Resample: chr  "Fold1" "Fold1" "Fold1" "Fold1" ...

Are these 90 observations in Fold1 used as the training set? If so, where is the test set for this fold?

1
No need to do it manually. Check str(rfmodel). You will find it there in index and indexOut, which hold the row indices of the samples that went to training and hold-out. – Sowmya S. Manian

1 Answer

0
votes
 str(rfmodel)

The fitted model object stores everything in the form below. Its control element stores the row indices of the samples that went to training and to the respective hold-outs, in index and indexOut.

 names(rfmodel)
 #  [1] "method"       "modelInfo"    "modelType"    "results"      "pred"        
 #  [6] "bestTune"     "call"         "dots"         "metric"       "control"     
 # [11] "finalModel"   "preProcess"   "trainingData" "resample"     "resampledCM" 
 # [16] "perfNames"    "maximize"     "yLimits"      "times"        "levels"      
 # [21] "terms"        "coefnames"    "xlevels" 

Paths to the row indices of the training and hold-out samples:

 # Indexes of Hold Out Sets
 rfmodel$control$indexOut

 # Indexes of Train Sets for above hold outs
 rfmodel$control$index
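
To turn those indices into actual data frames, something like the following sketch might work. It assumes the same iris setup as in the question; train_rows, test_rows, train_set, test_set and folds are just illustrative names. As far as I can tell, rfmodel$pred holds only hold-out predictions, saved once per candidate mtry value, so the 90 rows for Fold1 would be the 30 held-out rows repeated for each of the mtry values tried. That is my reading, and the two checks at the end of the block are there so you can verify it on your own run.

```r
library(caret)  # method = "rf" also needs the randomForest package installed

set.seed(12)
train_control <- trainControl(method = "cv", number = 5, savePredictions = TRUE)
rfmodel <- train(Species ~ ., data = iris, trControl = train_control, method = "rf")

# Row indices for the first fold. Positional indexing is used because the
# element names of index and indexOut are not guaranteed to match.
train_rows <- rfmodel$control$index[[1]]     # rows used to fit the model
test_rows  <- rfmodel$control$indexOut[[1]]  # rows held out for prediction

train_set <- iris[train_rows, ]
test_set  <- iris[test_rows, ]

# The same split for every fold, as a list of train/test data frames
folds <- lapply(seq_along(rfmodel$control$index), function(i) {
  list(train = iris[rfmodel$control$index[[i]], ],
       test  = iris[rfmodel$control$indexOut[[i]], ])
})

# Checks: how many saved predictions exist per mtry value in Fold1, and
# whether their row indices match the hold-out rows of the first fold
first_holdout <- subset(rfmodel$pred, Resample == "Fold1")
table(first_holdout$mtry)
setequal(first_holdout$rowIndex, test_rows)
```

If you only want the predictions for the selected model, filtering rfmodel$pred on mtry == rfmodel$bestTune$mtry should leave one row per held-out observation.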