I am trying to understand how caret's control settings work. I am running some experiments using cross-validation via caret's control functions, e.g.

fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           ## repeated ten times
                           repeats = 10)

or

control <- rfeControl(functions=rfFuncs, method="repeatedcv", number=5, repeats = 5)

My question is: if I set a seed before I run the experiments, i.e.

set.seed(5432)
control <- trainControl(...)
results <- train(..., trControl = control)
...

does that guarantee that each fold contains exactly the same samples every time I run an experiment? For example, say I have samples with id = {1:100} and, with caret's 10-fold cross-validation, my folds are: fold1 = {1:10}, fold2 = {11:20}, ..., fold10 = {91:100}. If I rerun the experiment using the same seed, will my folds be exactly the same as in the previous run?

I know that setting a seed helps with reproducibility, but I just need confirmation that this is exactly what happens.

Many thanks,

1 Answer

There are 2 ways of setting the seed for reproducibility.

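  1. calling set.seed just before the train function. The resampling indices are the first thing train draws random numbers for, so the same seed gives the same folds on every run.
  2. setting the seeds argument inside trainControl (or rfeControl). This fixes the seeds used for model fitting within each resample, which matters most when running in parallel.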
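
Option 1 can be checked directly. The snippet below is only a minimal sketch: the iris data and the "lm" method are stand-ins chosen because they run quickly, not anything from the question. It fits the same model twice with the same seed and compares the resampling indices that caret stores in the fitted object.

```r
library(caret)

# Same resampling scheme as in the question
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)

# Assumption: iris and method = "lm" are placeholders for your data and model
set.seed(5432)
fit_a <- train(Sepal.Length ~ ., data = iris, method = "lm", trControl = ctrl)

set.seed(5432)
fit_b <- train(Sepal.Length ~ ., data = iris, method = "lm", trControl = ctrl)

# control$index holds the training-row indices of every fold and repeat
same_folds <- identical(fit_a$control$index, fit_b$control$index)
print(same_folds)
```

If the seed is set immediately before each call, the two index lists should be identical. Note that the folds are random partitions of the rows, not contiguous blocks such as 1:10, 11:20; the seed makes that random partition repeat from run to run.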

For more info on option 2, check the help (?trainControl), and also this SO question.

More detailed information is available on the model training page of the caret website, in the section "Notes on Reproducibility".
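
For option 2, the seeds argument expects a list with one element per resample plus one final element for the last model fit. The following is a rough sketch under stated assumptions: n_candidates is a hypothetical number of tuning-parameter combinations, and the upper bound of 1000 for the seed values is arbitrary.

```r
library(caret)

n_resamples  <- 5 * 5   # number = 5 folds x repeats = 5
n_candidates <- 3       # hypothetical: tuning-parameter combinations evaluated

set.seed(5432)
seeds <- vector(mode = "list", length = n_resamples + 1)
for (i in seq_len(n_resamples)) {
  # one integer seed per candidate model within resample i
  seeds[[i]] <- sample.int(1000, n_candidates)
}
# single seed for the final model fit on the full training set
seeds[[n_resamples + 1]] <- sample.int(1000, 1)

control <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                        seeds = seeds)
```

These seeds govern the random numbers used while fitting models inside each resample. The fold assignment itself is still drawn when train is called, so to pin the folds as well, either call set.seed just before train or pass explicit splits through the index argument of trainControl.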