First, if you are using an older version of R and caret, make sure the list you pass to the index parameter is named; an unnamed list can cause errors that are relatively hard to track down (e.g. this one).
Max, the maintainer of caret, stated in this answer that the method parameter of trainControl no longer matters once you set the index parameter. That makes sense: index defines which samples each partition contains and how many partitions there are, which pretty much specifies the resampling process.
Here's a minimal working example as a reference (note the naming of index):
library(caret)
library(plyr)
# 5-fold CV with 3 repeats = 15 resampling partitions
m1 <- train(x = iris[, 1:2], y = iris[, 5], method = 'lda',
            trControl = trainControl(method = 'repeatedcv', number = 5, repeats = 3))
# similar behaviour using index: 15 random training sets, each 4/5 of the rows
index <- llply(1:15, function(x) sample(nrow(iris), round(nrow(iris) * 4/5)))
names(index) <- 1:15  # name the list elements (see the note at the top)
m2 <- train(x = iris[, 1:2], y = iris[, 5], method = 'lda',
            trControl = trainControl(index = index))
This is what your index could look like:
> str(index)
List of 15
$ 1 : int [1:120] 47 28 91 54 130 53 37 19 5 85 ...
$ 2 : int [1:120] 65 58 39 120 127 80 102 145 97 132 ...
$ 3 : int [1:120] 113 14 7 62 65 99 108 105 76 123 ...
$ 4 : int [1:120] 124 92 46 1 27 140 33 147 57 6 ...
[...]
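Note that the index above consists of random 4/5 subsets, which only approximates repeated CV. If you want partitions that are actual repeated CV folds, one option is caret's createMultiFolds, which returns a named list of training row indices. The following is just a sketch under that assumption; the names cv_index and m_cv are made up for illustration:

```r
library(caret)

set.seed(1)
# 5 folds x 3 repeats; each list element holds the training rows of one resample
cv_index <- createMultiFolds(iris[, 5], k = 5, times = 3)

# The list is already named, so it can be passed to index directly
m_cv <- train(x = iris[, 1:2], y = iris[, 5], method = "lda",
              trControl = trainControl(index = cv_index))
```

Because the folds are drawn from the outcome vector, each repeat should cover every row exactly once in its hold-out sets, which is not guaranteed for the randomly sampled index.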
PS: if you don't set indexOut in trainControl, all samples that are not in a particular partition of index will be used to evaluate the model trained on that partition. This might be undesirable if you want to subset the evaluation samples as well. See this answer for more details.
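As a rough sketch of how indexOut could be used: the snippet below first builds the hold-out sets you would get without indexOut, then restricts each one to a subset. The choice of 10 evaluation rows per partition is arbitrary, and the names default_out and m3 are hypothetical.

```r
library(caret)

set.seed(1)
n <- nrow(iris)

# 15 training partitions, each holding 4/5 of the rows
index <- lapply(1:15, function(i) sample(n, round(n * 4 / 5)))
names(index) <- paste0("Resample", 1:15)

# What is held out when indexOut is not set: every row not in the partition
default_out <- lapply(index, function(train_rows) setdiff(seq_len(n), train_rows))

# Illustrative restriction: evaluate on only 10 of the held-out rows per partition
indexOut <- lapply(default_out, function(out_rows) sample(out_rows, 10))

m3 <- train(x = iris[, 1:2], y = iris[, 5], method = "lda",
            trControl = trainControl(index = index, indexOut = indexOut))
```

indexOut has to be a list of the same length as index, with one hold-out set per training partition.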