This is a question of understanding. Suppose I want to do nested cross-validation (e.g. outer: 5 folds, inner: 4 folds) and use sequential optimization to find the best set of parameters. Tuning the parameters happens in the inner loop. With a normal grid search, for each combination of hyperparameters I train on three inner folds and test on the remaining inner fold, then choose the combination with the best average score. The winning combination from the inner loop is then trained on the outer training folds and evaluated on the held-out outer test fold, and this is repeated for each outer fold.
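To make sure we are talking about the same setup, here is a minimal sketch of the grid-search version using scikit-learn (the dataset, model, and parameter grid are just placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}  # known a priori

inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop: for each outer training set, the grid search picks the
# combination with the best mean score over the 4 inner folds.
search = GridSearchCV(SVC(), param_grid, cv=inner_cv)

# Outer loop: the winning combination is refit on the outer training
# folds and evaluated on the held-out outer test fold.
outer_scores = cross_val_score(search, X, y, cv=outer_cv)
print(outer_scores.mean())
```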
With a grid search, though, all candidate parameter combinations are known a priori. How is the next set of parameters determined when using sequential optimization (e.g. Bayesian optimization)? Does each newly suggested point depend on the previously evaluated points, with their scores averaged over all inner folds (see the sketch below)? That seems intuitively wrong to me, since it feels like comparing apples and oranges. I hope my question is not too confusing.
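To make my hypothesis concrete, here is a sketch of what I *imagine* happening inside one inner loop, using scikit-optimize's `gp_minimize` as one example of a sequential optimizer. `X_train`/`y_train` stand for the outer training set of a single outer fold, and the objective handed to the optimizer is the score averaged over all inner folds:

```python
from skopt import gp_minimize
from skopt.space import Real
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# Placeholder for the outer training set of one outer fold.
X_train, y_train = make_classification(n_samples=200, random_state=0)
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)

def objective(params):
    C, gamma = params
    model = SVC(C=C, gamma=gamma)
    # One number per candidate: the score averaged over the 4 inner folds
    # (negated because gp_minimize minimizes).
    return -cross_val_score(model, X_train, y_train, cv=inner_cv).mean()

space = [Real(1e-2, 1e2, prior="log-uniform", name="C"),
         Real(1e-3, 1e0, prior="log-uniform", name="gamma")]

# Each new suggestion is chosen based on all previously evaluated
# (parameters, fold-averaged score) pairs.
result = gp_minimize(objective, space, n_calls=20, random_state=0)
print(result.x, -result.fun)
```

Is this fold-averaged objective really what the suggested points are based on, or does the optimizer see the individual fold scores in some other way?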