Suppose I'm doing several runs of the same model, differing only in the complexity parameter, on the same (seed-fixed) cross-validation with the caret package, for example:

library(caret)
data(iris)

# the controls are the same for every model
c = trainControl(method = "cv", number = 10, verboseIter = TRUE)
d = iris        # the data is also the same
f = Species ~ . # the formula is also the same
m = "rpart"     # the method is also the same


set.seed(1234)
model1 <- train(form = f, data = d, trControl = c, method = m,
                tuneGrid = expand.grid(cp = c(0,0.5)))
set.seed(1234)
model2 <- train(form = f, data = d, trControl = c, method = m,
                tuneGrid = expand.grid(cp = c(0.1,0.2)))

set.seed(1234)
model3 <- train(form = f, data = d, trControl = c, method = m,
                tuneGrid = expand.grid(cp = c(0,0.5,0.1,0.2)))

Is there a way I could "build up" the model3 train object from only model1 and model2? The calculations are long, and I didn't run all my different tuning grids in the same caret call. But having every run in the same train object would make it much easier to compare them (via the plot function, the update function, the resamples function, etc.).

I'm particularly looking for a way to do the same thing plot.train does, but for all of them together.
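
For what it's worth, I know resamples() can already line up separate train objects (a minimal sketch below, assuming the folds match because of the fixed seed, and with grid1/grid2 being labels I made up), but it only compares the best tune of each object rather than every cp value, which is why I would like everything in one train object:

# a minimal sketch of what I can already do with the separate objects;
# resamples() pools the fold-level results of each train object (the
# resampling indices match here thanks to the fixed seed), but it only
# uses the best tune of each run, not the full cp profile
rs <- resamples(list(grid1 = model1, grid2 = model2))
summary(rs)  # fold-level Accuracy/Kappa for the best cp of each run
bwplot(rs)   # side-by-side boxplots, not an accuracy-vs-cp curve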

1 Answer

I completely understand your concern, because my computational resources are also very limited. However, instead of "building up" the model3 object, I would approach it as follows.

Suppose what you wish to achieve is the highest accuracy. Then you simply need to ask: between model1 and model2, which one reaches the higher accuracy? In other words, we are only interested in picking the tuning parameter that gave the best result. For example, we see the following:

> model1$bestTune$cp
[1] 0
> model2$bestTune$cp
[1] 0.2
> model1$results$Accuracy ## Respectively for cp = 0.0 and cp = 0.5
[1] 0.9333 0.3333
> model2$results$Accuracy ## Respectively for cp = 0.1 and cp = 0.2
[1] 0.9267 0.9267    

We would choose cp = 0.

Suppose you have broken things down into model1, model2, model3, ... and wish to explore all of the manually supplied parameter values across them. The loop below collects, for each model, its best cp and the corresponding cross-validated accuracy:

k <- 2  ## here we only have model1 and model2 to compare
evaluate <- list(cp = numeric(), accuracy = numeric())
for (i in 1:k) {
  model <- get(paste0("model", i))  ## fetch modeli from the workspace
  best  <- which(model$results$cp == model$bestTune$cp)
  evaluate$cp[paste0("model", i)]       <- model$bestTune$cp
  evaluate$accuracy[paste0("model", i)] <- model$results$Accuracy[best]
}

The evaluate list then contains the following:

> evaluate
$cp
model1 model2 
   0.0    0.2 

$accuracy
model1 model2 
0.9333 0.9267 

Based on this, we can do

> which(evaluate$accuracy == max(evaluate$accuracy))
model1 
     1 
> evaluate$cp[[which(evaluate$accuracy == max(evaluate$accuracy))]]
[1] 0

Now we can happily choose cp = 0 and we also know that the result from the optimal cp is stored in model1.
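
If you prefer to pull the winning object out of the workspace programmatically rather than by eye, here is a small sketch (assuming, as above, that the fitted objects are named model1, model2, ...):

## name of the best run according to the evaluate list built above
best_name  <- names(which.max(evaluate$accuracy))
best_model <- get(best_name)  ## fetch that train object from the workspace
best_model$bestTune           ## cp = 0, coming from model1 in this example
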

If you still wish to "build up" model3, you can simply substitute some of the components (e.g. results, where AccuracySD, KappaSD and similar metrics are stored) after having chosen what we evaluated as the best model, model1 in this case. A sketch of that substitution follows.
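
This is a minimal sketch of that idea, not an official caret feature. It assumes the two runs used identical data, folds and method (which they did here, thanks to the fixed seed), and it recomputes nothing: components such as finalModel and the resampling indices still come from model1, so the merged copy is only meant for inspection and plotting.

## copy one train object and merge the per-cp results of both runs into it
combined <- model1
combined$results <- rbind(model1$results, model2$results)
combined$results <- combined$results[order(combined$results$cp), ]
## keep bestTune consistent with the merged results table
combined$bestTune <- combined$results[which.max(combined$results$Accuracy),
                                      "cp", drop = FALSE]
plot(combined)  ## accuracy across all four cp values

As far as I can tell, plot.train only reads the results component, which is why merging that single data frame is enough to get the combined tuning plot; anything that relies on the fold-level data would need the same kind of treatment.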