
I have read the general steps for K-fold cross-validation at https://machinelearningmastery.com/k-fold-cross-validation/

It describes the general procedure as follows:

  1. Shuffle the dataset randomly.
  2. Split the dataset into k groups (folds).
  3. For each unique group:
     • Take the group as a hold-out or test data set.
     • Take the remaining groups as a training data set.
     • Fit a model on the training set and evaluate it on the test set.
     • Retain the evaluation score and discard the model.
  4. Summarize the skill of the model using the sample of model evaluation scores.
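
For reference, here is a rough sketch of how I understand that procedure, using scikit-learn's KFold (the dataset and model choice are just placeholders of mine, not from the linked article):

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import KFold

    X, y = load_breast_cancer(return_X_y=True)

    kf = KFold(n_splits=5, shuffle=True, random_state=42)  # steps 1-2: shuffle and split into K=5 folds
    scores = []

    for train_idx, test_idx in kf.split(X):                 # step 3: one iteration per fold
        model = GradientBoostingClassifier()                # fresh model, fit on the K-1 training folds
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])                   # evaluate on the held-out fold
        scores.append(accuracy_score(y[test_idx], pred))    # retain the score, discard the model

    print(np.mean(scores), np.std(scores))                  # step 4: summarize - exactly K (= 5) models were fit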

So if it is K-fold, then K models will be built, right? But then why does the following link from H2O say that it builds K+1 models?

https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/tutorials/gbm/gbmTuning.ipynb

Nitpick: it is K-fold, not K folder. Like "twofold" or "tenfold", but with K instead of two/ten. Nothing to do with folders. – Amadan
@Amadan nice catch, I am editing accordingly... – desertnaut
When editing your question after an answer has been added, it's good practice to indicate so, as well as to leave a comment to the respondent; it would also be nice to indicate where exactly such a reference is made (the linked document is rather long), or, even better, to quote it here... In any case, the answer should already have resolved your question, so kindly accept it (or leave specific feedback if it has not) - thanks... – desertnaut
Thanks for your suggestion. I have added exactly where I have read the statement for K + 1. – Gavin
Pls see comment above; "exactly" is an exaggeration - you have linked to a rather huge document, and I just couldn't find the exact reference... – desertnaut

1 Answer


Arguably, "I read somewhere else" is too vague a statement (where?), because context does matter.

Most probably, such statements refer to some libraries which, by default, after finishing the CV procedure proper, go on to build a model on the whole training data using the hyperparameters found by CV to give the best performance; see for example the relevant train function of the caret R package, which, apart from performing CV (if requested), also returns the finalModel:

finalModel

A fit object using the best parameters

Similarly, scikit-learn's GridSearchCV also has a relevant parameter, refit:

refit : boolean, or string, default=True

Refit an estimator using the best found parameters on the whole dataset.

[...]

The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly on this GridSearchCV instance.
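
So, in scikit-learn terms, a minimal sketch like the following (the dataset and parameter values are arbitrary placeholders) silently fits one extra model at the end because of the default refit=True:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = load_breast_cancer(return_X_y=True)

    search = GridSearchCV(
        GradientBoostingClassifier(),
        param_grid={"n_estimators": [100]},  # a single hyperparameter combination, i.e. m = 1
        cv=5,                                # K = 5
        refit=True,                          # default: refit on the whole data with the best parameters found
    )
    search.fit(X, y)                         # 1*5 = 5 CV fits, plus 1 final refit = 6 fits in total

    final_model = search.best_estimator_     # the extra, refitted model
    print(final_model.predict(X[:3]))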

But even then, the models fitted are almost never just K+1: when you use CV in practice for hyperparameter tuning (and keep in mind that CV has other uses, too), you will end up fitting m*K models, where m is the size of your hyperparameter combination set (all K folds in a single round are run with one single set of hyperparameters).

In other words, if your hyperparameter search grid consists of, say, 3 values for the number of trees and 2 values for the tree depth, you will fit 2*3*K = 6*K models during the CV procedure, and possibly +1 for fitting your model at the end to the whole data with the best hyperparameters found.
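
To make the arithmetic concrete, here is a rough sketch of such a grid with GridSearchCV (the specific values are arbitrary); with K=5, this means 6*5 = 30 fits during the CV procedure, plus the final refit:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = load_breast_cancer(return_X_y=True)

    param_grid = {
        "n_estimators": [50, 100, 200],  # 3 values for the number of trees
        "max_depth": [2, 3],             # 2 values for the tree depth
    }                                    # m = 3 * 2 = 6 combinations

    search = GridSearchCV(GradientBoostingClassifier(), param_grid, cv=5, verbose=1)
    search.fit(X, y)  # with verbose=1, reports 5 folds for each of 6 candidates (30 fits), then refits once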

So, to summarize:

  • By definition, each K-fold CV procedure consists of fitting just K models, one for each fold, with fixed hyperparameters across all folds

  • In case of CV for hyperparameter search, this procedure will be repeated for each hyperparameter combination of the search grid, leading to m*K fits

  • Having found the best hyperparameters, you may want to use them for fitting the final model, i.e. 1 more fit

leading to a total of m*K + 1 model fits.

Hope this helps...