Custom training rows when using caret package

Question

I am using the caret package in R to fit several models. When I call trainControl without setting the index argument, everything works fine for various methods. However, when I manually set the training rows for the different folds through the index argument, I get the following error when fitting:

Something is wrong; all the RMSE metric values are missing

I tried different method arguments, as the documentation doesn't say which one to use when the index argument is set.

I will provide an example if the answer is not trivial.

Thanks!

geekoverdose · Accepted Answer · 2016-06-10T19:29:29

First, if you happen to use an older version of R and caret, be sure to pass a named list as the index parameter (an unnamed list can cause errors that are relatively hard to track down, e.g. this one).

Max, the maintainer of caret, stated in this answer that in trainControl the method parameter no longer matters once you set the index parameter. That makes sense: index defines which samples each partition contains and how many partitions there are, which pretty much specifies the resampling process.

Here's a minimal working example as a reference (note the naming of index):

library(caret)
library(plyr)
# 5CV with 3 repeats = 15 partitions
m1 <- train(x = iris[,1:2], y = iris[,5], method='lda', 
            trControl = trainControl(method = 'repeatedcv', number = 5, repeats = 3))

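# similar behaviour using index: 15 random training sets, each 4/5 of the rows
index <- llply(1:15, function(x) sample(nrow(iris), round(nrow(iris)*4/5)))
names(index) <- 1:15  # name the list elements
# method is left at its default, since index already defines the resampling
m2 <- train(x = iris[,1:2], y = iris[,5], method='lda', trControl = trainControl(index = index))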

This is what your index could look like:

> str(index)
List of 15
 $ 1 : int [1:120] 47 28 91 54 130 53 37 19 5 85 ...
 $ 2 : int [1:120] 65 58 39 120 127 80 102 145 97 132 ...
 $ 3 : int [1:120] 113 14 7 62 65 99 108 105 76 123 ...
 $ 4 : int [1:120] 124 92 46 1 27 140 33 147 57 6 ...
 [...]
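
The index above is drawn by random subsampling, so its 15 partitions are not true cross-validation folds. If you want index to reproduce 5-fold CV with 3 repeats more closely, one option is caret's createMultiFolds, which returns a named list of training rows. This is only a sketch; the seed and the name m3 are arbitrary choices for illustration:

```r
library(caret)

set.seed(1)  # arbitrary seed, only to make the partitions reproducible

# 5 folds x 3 repeats = 15 named sets of training-row indices,
# stratified by the outcome
index <- createMultiFolds(iris$Species, k = 5, times = 3)

# pass the partitions to train() exactly as with the hand-made index
m3 <- train(x = iris[, 1:2], y = iris[, 5], method = "lda",
            trControl = trainControl(index = index))
```

Since the list already carries names, no separate names(index) <- ... step is needed here.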

PS: if you don't set indexOut in trainControl, all samples that are not in a particular partition of index will be used to evaluate the model trained with this partition. This might be undesired in case you want to subset the evaluation samples as well. See this answer for more details.
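
As a rough sketch of what setting indexOut could look like: the code below first spells out the default hold-out sets (all rows not used for training) and then, purely as a hypothetical restriction, evaluates on only 10 of those rows per partition. The names holdout, index_out, ctrl and m4, the seed, and the size 10 are assumptions made for illustration:

```r
library(caret)

set.seed(1)  # arbitrary seed for reproducibility
n <- nrow(iris)

# 15 random training partitions of 120 rows each, stored as a named list
index <- lapply(1:15, function(i) sample(n, round(n * 4 / 5)))
names(index) <- paste0("Resample", 1:15)

# what caret uses by default: every row that is not in the training partition
holdout <- lapply(index, function(train_rows) setdiff(seq_len(n), train_rows))

# hypothetical restriction: evaluate on only 10 of the held-out rows
index_out <- lapply(holdout, function(rows) sample(rows, 10))

ctrl <- trainControl(index = index, indexOut = index_out)
m4 <- train(x = iris[, 1:2], y = iris[, 5], method = "lda", trControl = ctrl)
```

index and indexOut are matched by position, so both lists need the same length and order.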
