19
votes

I am trying to build a model using the train function from the caret package:

 model <- train(training$class ~ .,data=training, method = "nb")

The training set contains about 20K observations, each with more than 100 variables. I would like to know whether building a model from this dataset will take hours or days.

How can I estimate the time needed to train a model on this data? How can I track the progress of training when using functions from the caret package?


1 Answer

38
votes

Assuming that you are training the model with

  • an expanded grid of tuning parameters (all combinations of the tuning parameters)
  • and a resampling technique of your choice (cross-validation, bootstrap, etc.)

You could set

trainctrl <- trainControl(verboseIter = TRUE)

and pass it as the trControl argument of the train function to track the training progress:

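# Use the bare column name: with training$class on the left-hand side, `.` would still include class as a predictor
model <- train(class ~ ., data = training, method = 'nb', trControl = trainctrl)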

This prints progress to the console at each resampling stage, so you can gauge how far the training and parameter tuning have advanced.
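As a minimal, self-contained sketch of the above, the snippet below uses the built-in iris data as a stand-in for your training set (an assumption for illustration only) and 5-fold cross-validation as an example resampling scheme. It assumes caret and klaR (which provides the 'nb' method) are installed.

```r
library(caret)  # method = "nb" also needs the klaR package installed

# 5-fold cross-validation; verboseIter = TRUE prints a progress line for
# each fold and parameter combination as it is fitted.
# iris is only a stand-in dataset for illustration.
ctrl <- trainControl(method = "cv", number = 5, verboseIter = TRUE)

set.seed(1)
model <- train(Species ~ ., data = iris, method = "nb", trControl = ctrl)
```

Counting the progress lines against the known number of folds and parameter combinations gives a rough sense of how much work remains.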

To estimate the total running time, you could fit the model once, see how long that takes, and multiply by the number of resamples and the number of parameter combinations in your setup. To fit a single model without resampling, set the method in trainControl to 'none' and tuneLength to 1:

trainctrl <- trainControl(method = 'none')
# Use the bare column name so that `.` does not also include class as a predictor
model <- train(class ~ ., data = training, method = 'nb', trControl = trainctrl, tuneLength = 1)
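The multiplication can be sketched in base R as follows. The helper name estimate_total_secs and the numbers are hypothetical; in practice the single-fit time would come from wrapping the call above in system.time(...) and reading the "elapsed" entry. The estimate is rough: each resample fits on a subset of the data, and the time spent predicting on held-out samples is not counted.

```r
# Rough total-time estimate: one fit per (resample x parameter combination),
# plus one final fit on the full training set.
estimate_total_secs <- function(single_fit_secs, n_resamples, n_param_combos) {
  single_fit_secs * (n_resamples * n_param_combos + 1)
}

# Hypothetical numbers: a single fit measured at 30 s,
# 10-fold cross-validation, 2 parameter combinations.
est <- estimate_total_secs(single_fit_secs = 30,
                           n_resamples = 10,
                           n_param_combos = 2)
est / 60  # estimate in minutes
```

If train stops with an error saying only one model should be specified when method = 'none', the default grid for that method probably has more than one row; passing a one-row data frame as tuneGrid instead of tuneLength should work in that case.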

Hope this helps! :)