
This is a question of understanding. Suppose I want to do nested cross-validation (e.g. outer: 5 folds, inner: 4 folds) and use sequential optimization to find the best set of hyperparameters. Tuning happens in the inner loop. With a normal grid search, for each combination of hyperparameters I train on three inner folds and test on the remaining fold, then choose the best combination. The best hyperparameter combination from the inner loop is then trained and evaluated on the test folds of the outer loop, in a similar way as in the inner loop.
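As a concrete sketch of this setup, here is a minimal nested cross-validation with a grid search in the inner loop. The use of scikit-learn, the toy data, the estimator, and the grid values are my assumptions for illustration, not part of the question:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Toy data standing in for the real problem.
X, y = make_classification(n_samples=200, random_state=0)

# Inner loop: 4-fold grid search over a predefined hyperparameter grid.
inner = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1]},
    cv=4,
)

# Outer loop: 5 folds; each outer training set runs its own inner search,
# and the refitted best model is scored on the held-out outer fold.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```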

But since it is a grid search, all parameter settings are known a priori. How is the new set of parameters determined when using sequential optimization? Do the newly suggested points depend on the previously evaluated points, averaged over all inner folds? That seems intuitively wrong to me, since it feels like comparing apples and oranges. I hope my question is not too confusing.

If you're doing grid search, you don't need sequential optimization. The two are different ways of doing the same thing. – Lars Kotthoff
@LarsKotthoff Thank you for your reply. That's not what I am trying to do. I understand how grid search works in the setting of a nested CV. However, it's not clear to me how the optimization path is chosen in a nested CV when doing sequential optimization. – Patrick Balada
At each iteration of the optimization, the parameter setting with the largest expected improvement (by default) is chosen based on the predictions of the surrogate model. All previous parameter settings and their evaluations are taken into account to do this. In a cross-validation, the performance attached to a parameter setting is the mean across the folds. – Lars Kotthoff
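To make this comment concrete, here is a minimal sketch of one step of sequential model-based optimization with expected improvement. The Gaussian-process surrogate, the single parameter C, and the evaluation numbers are illustrative assumptions; each y-value stands for the mean CV performance of one setting across the inner folds, as the comment describes:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(candidates, gp, best_so_far):
    # Surrogate model predictions (mean and uncertainty) at candidate settings.
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    z = (mu - best_so_far) / sigma
    # Closed-form expected improvement for a maximization problem.
    return (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)

# Settings evaluated so far; each y is the MEAN performance across the
# inner CV folds for that setting (illustrative numbers).
X_seen = np.log10([[0.1], [1.0], [10.0]])   # e.g. log-scaled values of C
y_seen = np.array([0.71, 0.80, 0.76])

gp = GaussianProcessRegressor(normalize_y=True).fit(X_seen, y_seen)

# Propose the candidate setting with the largest expected improvement.
grid = np.linspace(-2, 2, 200).reshape(-1, 1)   # candidate log10(C) values
ei = expected_improvement(grid, gp, y_seen.max())
print("next C to evaluate:", 10 ** grid[np.argmax(ei), 0])
```

The key point for the question: the proposed point depends on all previously evaluated settings, and in a CV each of those settings carries its mean score over the folds.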

1 Answer


I think you might have a misunderstanding of the term "sequential optimization" here.

It can mean two things, depending on the context:

  • In a tuning context, the term is sometimes used as a synonym for "forward feature selection" (FFS). In this case, no grid search is done. Features of the dataset are added to the model one at a time to see whether performance improves (see the first sketch after this list).

  • When you use the term while doing a "grid search", you most likely just mean that the process runs sequentially (i.e. on one core, one setting at a time). The counterpart would be a "parallel grid search", where you evaluate the predefined grid choices at the same time using multiple cores (see the second sketch below).
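For the first meaning, here is a minimal sketch of forward feature selection, assuming scikit-learn and cross-validated accuracy as the performance measure; the data and estimator are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

selected, best_score = [], -np.inf
remaining = list(range(X.shape[1]))

while remaining:
    # Score every candidate feature when added to the current selection.
    trial = {
        f: cross_val_score(LogisticRegression(max_iter=1000),
                           X[:, selected + [f]], y, cv=4).mean()
        for f in remaining
    }
    f_best = max(trial, key=trial.get)
    if trial[f_best] <= best_score:
        break  # no remaining feature improves performance; stop
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = trial[f_best]

print("selected features:", selected, "CV accuracy:", best_score)
```

scikit-learn also ships a ready-made SequentialFeatureSelector that implements this kind of loop.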
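For the second meaning, nothing about the search itself changes; the same predefined grid is simply evaluated on multiple cores. In scikit-learn, for example, this is a single argument:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# The same predefined grid as in a sequential run; n_jobs=-1 evaluates the
# settings in parallel on all available cores instead of one at a time.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=4, n_jobs=-1)
```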