I'm trying to better understand how train(tuneLength = ) works in {caret}. My confusion arose while trying to understand some of the differences between the SVM methods from {kernlab}. I've reviewed the documentation (here) and the caret training page (here).

My toy example was creating five models using the iris dataset. Results are here, and reproducible code is here (they're rather long so I didn't copy and paste them into the post).

From the {caret} documentation:

tuneLength
an integer denoting the amount of granularity in the tuning parameter grid. By default, this argument is the number of levels for each tuning parameters that should be generated by train. If trainControl has the option search = "random", this is the maximum number of tuning parameter combinations that will be generated by the random search. (NOTE: If given, this argument must be named.)

In this example, trainControl(search = "random") and train(tuneLength = 30) are used, but there appear to be 67 results, not 30 (the maximum number of tuning parameter combinations). I tried playing around to see if maybe there were 30 unique ROC values, or even ydim values, but by my count there aren't.

For the toy example, I created the following table:

[Image: caret_SVM table]

Is there a way to see what's going on "under the hood"? For instance, M1 (svmRadial) and M3 (svmRadialSigma) both take, and are given, the same tune parameters, but based on calling $results appear to use them differently?

My understanding of train(tuneLength = 9) was that both models would produce results of sigma and C each with 9 values, 9 times since 9 is the number of levels for each tuning parameter (the exception being random search)? Similarly, M4 would be 9^3 since train(tuneLength = 9) and there are 3 tuning parameters?

Michael

1 Answer

I need to update the package documentation more, but the details are spelled out on the package web page for random search:

"The total number of unique combinations is specified by the tuneLength option to train."

However, this is particularly muddy for SVMs using the RBF kernel. Here is a rundown:

  • svmRadial tunes over cost and uses a single value of sigma based on kernlab's sigest function. For grid search, tuneLength is the number of cost values to test, and for random search it is the total number of (cost, sigma) pairs to evaluate.
  • svmRadialCost is the same as svmRadial but sigest is run inside of each resampling loop. For random search, it does not tune over sigma.
  • svmRadialSigma with grid search tunes over both cost and sigma. In a moment of sub-optimal cognitive performance, I set this up to try at most 6 values of sigma during grid search since I felt that cost space needed a wider range. For random search it does the same as svmRadial.
  • svmRadialWeight is the same as svmRadial but also considers class weights and is for 2-class problems only.
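
One way to see the differences described in the list is to fit the same data with each method and count the rows and distinct sigma values in $results. The following is only a minimal sketch, assuming {caret} and {kernlab} are installed; the 5-fold CV setup, the seed, and the helper name fit_one are illustrative choices, not taken from the question's code.

```r
library(caret)  # the svmRadial* methods also need kernlab installed

# Assumption: plain 5-fold CV on iris, chosen only to keep the run short.
ctrl_grid <- trainControl(method = "cv", number = 5)
ctrl_rand <- trainControl(method = "cv", number = 5, search = "random")

# Hypothetical helper: fit one model and return the train object.
fit_one <- function(method, ctrl, len) {
  set.seed(1)
  train(Species ~ ., data = iris, method = method,
        trControl = ctrl, tuneLength = len)
}

fits <- list(
  radial_grid   = fit_one("svmRadial",      ctrl_grid, 9),
  sigma_grid    = fit_one("svmRadialSigma", ctrl_grid, 9),
  radial_random = fit_one("svmRadial",      ctrl_rand, 9),
  sigma_random  = fit_one("svmRadialSigma", ctrl_rand, 9)
)

# Number of parameter combinations that were actually evaluated
n_combos <- sapply(fits, function(f) nrow(f$results))

# Number of distinct sigma values that were tried
n_sigma <- sapply(fits, function(f) length(unique(f$results$sigma)))

print(rbind(n_combos, n_sigma))
```

Comparing the two rows of the printed matrix across the four fits shows which methods vary sigma and how tuneLength is interpreted under grid versus random search.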

As for the SOM example on the webpage, well, that's a bug. I over-sample the SOM parameter space since there needs to be a filter for xdim <= ydim & xdim*ydim < nrow(x). The bug is from me not keeping the right number of parameter combinations.
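
The over-sample-then-filter idea can be illustrated in base R. This is a hypothetical sketch, not caret's actual code: the candidate ranges (1 to 20), the over-sampling factor of 10, and the final trimming step are all assumptions made for illustration.

```r
set.seed(42)

len    <- 30          # requested tuneLength
n_rows <- nrow(iris)  # stand-in for nrow(x)

# Draw more candidates than requested, because the filter rejects some.
# Assumption: dimensions sampled from 1..20, 10x over-sampling.
cand <- data.frame(
  xdim = sample(1:20, size = len * 10, replace = TRUE),
  ydim = sample(1:20, size = len * 10, replace = TRUE)
)

# Keep only valid, distinct grids (the filter quoted in the answer).
valid <- unique(subset(cand, xdim <= ydim & xdim * ydim < n_rows))

# Without a trimming step, the number of survivors is unrelated to len.
# Trimming to at most len rows honours the documented maximum.
trimmed <- head(valid, len)

cat("survivors:", nrow(valid), " kept:", nrow(trimmed), "\n")
```

If the trimming step is skipped, the number of rows that reach $results depends on how many candidates pass the filter, which is one way a count other than tuneLength could appear.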