First, I split the dataset into train and test, for example:
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.4, random_state=999)
I then use GridSearchCV with cross-validation to find the best performing model:
validator = GridSearchCV(estimator=clf, param_grid=param_grid, scoring="accuracy", cv=cv)
By doing this, I get the following behavior:
A model is trained using k-1 of the folds as training data; the resulting model is validated on the remaining part of the data (scikit-learn.org)
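To make the quoted behavior concrete, here is a minimal pure-Python sketch of how the hold-out fold rotates across the k steps. The helper name k_fold_indices is hypothetical and this is not scikit-learn's implementation (which also supports shuffling and stratification); it only illustrates the idea under the assumption of contiguous, unshuffled folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each fold is held out exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]                   # the hold-out fold
        train_idx = indices[:start] + indices[start + size:]    # the other k-1 folds
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=6, k=3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

Each sample ends up in the validation part exactly once, which is why the hold-out set differs at every step.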
But then, when reading about the Keras fit function, the documentation introduces two more terms:
validation_split: Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling.
validation_data: tuple (x_val, y_val) or tuple (x_val, y_val, val_sample_weights) on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. validation_data will override validation_split.
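As a rough illustration of the validation_split description quoted above, the following sketch carves off the last fraction of the samples as a fixed validation slice. The function split_last_fraction is a hypothetical stand-in written for this post, not Keras code, and the exact rounding Keras applies may differ.

```python
def split_last_fraction(x, y, validation_split):
    """Hold out the LAST fraction of the samples, with no shuffling beforehand."""
    split_at = int(len(x) * (1.0 - validation_split))
    train = (x[:split_at], y[:split_at])
    val = (x[split_at:], y[split_at:])
    return train, val


# Toy data: 8 samples with alternating labels.
x = list(range(8))
y = [v % 2 for v in x]

(train_x, train_y), (val_x, val_y) = split_last_fraction(x, y, validation_split=0.25)
print("train samples:", train_x)
print("validation samples:", val_x)

# In this sketch, a split of 0.0 leaves nothing held out.
(_, _), (empty_val_x, _) = split_last_fraction(x, y, validation_split=0.0)
print("validation samples with split 0.0:", empty_val_x)
```

Unlike the rotating hold-out fold in cross-validation, this slice is the same on every epoch, because it is chosen once from the end of the data that was passed in.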
From what I understand, validation_split (to be overridden by validation_data) will be used as an unchanged validation dataset, whereas the hold-out set in cross-validation changes at each cross-validation step.
- First question: is it necessary to use validation_split or validation_data, since I already do cross-validation?
- Second question: if it is not necessary, should I set validation_split and validation_data to 0 and None, respectively?
grid_result = validator.fit(train_images, train_labels, validation_data=None, validation_split=0)
- Third question: if I do so, what will happen during training? Will Keras simply skip the validation step?
- Fourth question: does the validation_split portion belong to the k-1 folds or to the hold-out fold, or will it be treated as a "test set" (as in cross-validation) that is never used to train the model?