Stratified folds for cross-validation (cv) in caret can be created like this:

library(caret)
library(data.table)
# 30 rows: every column has the same length, so no recycling is needed
train_dat <- data.table(group = rep(c(rep("group1",10), rep("group2",5)), 2), x1 = rnorm(30), x2 = rnorm(30), label = factor(c(rep("treatment",15), rep("control",15))))

# returnTrain = TRUE: the index argument of trainControl expects training-row indices
folds <- createFolds(train_dat[, group], k = 5, returnTrain = TRUE)

fitCtrl <- trainControl(method = "cv", index = folds, classProbs = TRUE, summaryFunction = twoClassSummary)
train(label~., data = train_dat[, !c("group"), with = FALSE], trControl = fitCtrl, method = "xgbTree", metric = "ROC")

To keep group1 and group2 balanced across folds, the fold indexes are created from the "group" variable.
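
One way to check that balance claim is to tabulate the group membership of each held-out part. The sketch below is only illustrative: the group vector is a hypothetical stand-in for the "group" column (20 rows of group1, 10 of group2), and returnTrain = TRUE is used on the assumption that the folds will be passed to the index argument of trainControl, which expects training rows.

```r
library(caret)

set.seed(1)
# Hypothetical stand-in for the "group" column: 20 x group1, 10 x group2
group <- rep(c(rep("group1", 10), rep("group2", 5)), 2)

# returnTrain = TRUE returns the training-row indices of each fold
folds <- createFolds(group, k = 5, returnTrain = TRUE)

# Count group membership in the held-out rows of each fold
held_out_counts <- sapply(folds, function(idx) {
  table(factor(group[-idx], levels = c("group1", "group2")))
})
print(held_out_counts)
```

If the stratification works as intended, every column of held_out_counts shows the same 2:1 ratio of group1 to group2 as the full data.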

However, is there a way to create folds for repeatedcv in caret, so that I get the same balanced split in every repeat? Should I combine several createFolds calls and pass the result to trainControl?

trControl = trainControl(method = "cv", index = many_repeated_folds) 

Thanks!

I'm voting to close this question as off-topic because it is about how to use R without a reproducible example. - gung - Reinstate Monica
Thanks for the comment! @gung - whatsnext
I have added a reproducible example. What do you think? @gung - whatsnext

1 Answer

createMultiFolds is probably what you are interested in.
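
A minimal sketch of how that could look, assuming 5 folds repeated 3 times (both numbers are arbitrary choices for illustration) and a hypothetical group vector standing in for the "group" column from the question. createMultiFolds returns training-row indices, so the list can go straight into the index argument of trainControl.

```r
library(caret)

set.seed(42)
# Hypothetical stand-in for the "group" column: 20 x group1, 10 x group2
group <- factor(rep(c(rep("group1", 10), rep("group2", 5)), 2))

# 5-fold CV repeated 3 times, stratified on group;
# each list element holds the training-row indices of one resample
many_repeated_folds <- createMultiFolds(group, k = 5, times = 3)

# Share of group1 in each training set, to inspect the balance
group1_share <- sapply(many_repeated_folds, function(idx) mean(group[idx] == "group1"))
print(group1_share)

# Pass the precomputed resamples to trainControl
fitCtrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3,
                        index = many_repeated_folds,
                        classProbs = TRUE, summaryFunction = twoClassSummary)
```

fitCtrl can then be given to train() through trControl exactly as in the question. Setting the seed before createMultiFolds keeps the resamples reproducible.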