
I'm reading this tutorial that combines PCA and logistic regression in a pipeline and then applies cross-validation over a defined set of parameters for PCA and logistic regression. Here is what I understood from the example, followed by my question.

I understood:

When GridSearchCV is executed, it uses 3 folds by default. It starts by fitting PCA with 20 components, transforms the data, and passes the result to logistic regression for training. For each value of the logistic regression C parameter it then applies 3-fold cross-validation, which ends up as 3 * 3 = 9 trainings of logistic regression: 3 values of C times 3 folds per value.

After that it does the same with the second PCA value, 40 components, giving another 9 trainings, and then 9 more for the last PCA value, 64 components. So in total there are 9 * 3 = 27 trainings of logistic regression.

My question: is my understanding of the procedure correct?


1 Answer


Yes, this is entirely correct. You can easily check it by running the grid search in verbose mode:

>>> estimator = GridSearchCV(pipe, dict(pca__n_components=n_components,
...                                     logistic__C=Cs),
...                          verbose=1)
>>> estimator.fit(X_digits, y_digits)
Fitting 3 folds for each of 9 candidates, totalling 27 fits
[...snip...]
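For reference, here is a minimal self-contained sketch of the setup assumed above. The names pipe, n_components, Cs, X_digits and y_digits come from the scikit-learn PCA-plus-logistic-regression tutorial; the exact grid values (20/40/64 components and three values of C) are assumptions chosen to match the numbers in the question:

import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Load the digits data set (64 features, so 64 is the largest valid n_components).
X_digits, y_digits = datasets.load_digits(return_X_y=True)

# Pipeline: PCA for dimensionality reduction, then logistic regression.
pipe = Pipeline([("pca", PCA()), ("logistic", LogisticRegression(max_iter=1000))])

# Parameter grid: 3 values per step, matching the question.
n_components = [20, 40, 64]
Cs = np.logspace(-4, 4, 3)

estimator = GridSearchCV(
    pipe,
    dict(pca__n_components=n_components, logistic__C=Cs),
    cv=3,        # 3 folds, as assumed in the question
    verbose=1,   # prints the "totalling 27 fits" message shown above
)
estimator.fit(X_digits, y_digits)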

More generally, the number of fit calls is the product of the number of values per parameter, times the number of folds k, plus 1 if the best parameters are refit on the full training set (which happens by default).
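For the grid above, this works out as follows (a quick sanity check, assuming the estimator from the sketch earlier):

n_candidates = len(n_components) * len(Cs)   # 3 * 3 = 9 parameter combinations
n_cv_fits = n_candidates * 3                 # times 3 folds = 27 cross-validation fits
total_fit_calls = n_cv_fits + 1              # + 1 refit of the best candidate (refit=True by default) = 28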