Arguably, "I read somewhere else" is too vague a statement (where?), because context does matter.
Most probably, such statements refer to libraries which, by default, after finishing the CV procedure proper, go on to build a model on the whole training data using the hyperparameters found by CV to give the best performance; see for example the relevant `train` function of the caret R package, which, apart from performing CV (if requested), also returns the `finalModel`:

> **finalModel**
>
> A fit object using the best parameters
Similarly, scikit-learn's `GridSearchCV` also has a relevant parameter, `refit`:

> **refit** : boolean, or string, default=True
>
> Refit an estimator using the best found parameters on the whole dataset.
>
> [...]
>
> The refitted estimator is made available at the `best_estimator_` attribute and permits using `predict` directly on this `GridSearchCV` instance.
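
As a concrete illustration of that behaviour, here is a minimal sketch; the estimator, the parameter grid values, and the toy data are just placeholder assumptions for demonstration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# toy data, purely for illustration
X, y = make_classification(n_samples=200, random_state=0)

# m = 3 * 2 = 6 hyperparameter combinations (placeholder values)
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,        # K = 5 folds
    refit=True,  # after the CV search, refit once on the whole data (the "+1" model)
)
search.fit(X, y)

print(search.best_params_)     # hyperparameters selected by CV
print(search.best_estimator_)  # the single model refitted on all of X, y
preds = search.predict(X)      # predict() delegates to best_estimator_
```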
But even then, the models fitted are almost never just K+1: when you use CV in practice for hyperparameter tuning (and keep in mind that there are other uses for CV, too), you will end up fitting m\*K models, where m is the size of your hyperparameter combination set (all K folds in a single round are run with one single set of hyperparameters).

In other words, if your hyperparameter search grid consists of, say, 3 values for the number of trees and 2 values for the tree depth, you will fit 2\*3\*K = 6\*K
models during the CV procedure, and possibly +1 for fitting your model at the end to the whole data with the best hyperparameters found.
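
A quick way to sanity-check this arithmetic, a sketch assuming the same hypothetical 3×2 grid as above and K = 5:

```python
from sklearn.model_selection import ParameterGrid

# placeholder grid: 3 values for the number of trees, 2 for the tree depth
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5]}
K = 5

m = len(ParameterGrid(param_grid))  # 6 hyperparameter combinations
cv_fits = m * K                     # 30 models fitted during the CV search
total_fits = cv_fits + 1            # +1 for the final refit on the whole data
print(m, cv_fits, total_fits)       # 6 30 31
```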
So, to summarize:

- By definition, each K-fold CV procedure consists of fitting just K models, one for each fold, with fixed hyperparameters across all folds.
- In case of CV for hyperparameter search, this procedure will be repeated for each hyperparameter combination of the search grid, leading to m\*K fits.
- Having found the best hyperparameters, you may want to use them for fitting the final model, i.e. 1 more fit, leading to a total of m\*K + 1 model fits.
Hope this helps...