I'm using the training set (i.e., X_train, y_train) to tune the hyperparameters of my model, and I want to use the test set (i.e., X_test, y_test) only as a final check, to make sure the reported performance isn't biased. I wrote:
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, f1_score

folds = 4
# hold out 1/4 of the data as the test set; X, y and `preprocessing` are defined earlier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=(1 / folds), random_state=38, stratify=y)
clf_logreg = Pipeline(steps=[('preprocessor', preprocessing),
                             ('model', LogisticRegression(solver='lbfgs', max_iter=100))])
cv = KFold(n_splits=(folds - 1))
scores_logreg = cross_val_score(clf_logreg, X_train, y_train, cv=cv)
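(For context, with folds = 4 this runs 3-fold cross-validation on the 75% of the data kept for training; a quick sketch to inspect the fold sizes, assuming the split above:)

# print how many rows end up in each CV fold of the training portion
for i, (fit_idx, val_idx) in enumerate(cv.split(X_train)):
    print(f"fold {i}: {len(fit_idx)} rows to fit on, {len(val_idx)} validation rows")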
and, to get the F1 score on the same folds,
cross_val_score(clf_logreg, X_train, y_train,
                scoring=make_scorer(f1_score, average='weighted'), cv=cv)
These return
scores_logreg: [0.94422311, 0.99335548, 0.97209302]
and for f1: [0.97201365, 0.9926906 , 0.98925453]
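For reference, here is how I summarize the per-fold scores before touching the test set; a minimal sketch, assuming the objects defined above (f1_logreg is a name I'm introducing for the result of the second cross_val_score call):

import numpy as np

# scores_logreg holds per-fold accuracies (the default scorer for a classifier)
print(f"accuracy:    {np.mean(scores_logreg):.3f} +/- {np.std(scores_logreg):.3f}")

# keep the weighted-F1 cross-validation results under a name so they can be summarized too
f1_logreg = cross_val_score(clf_logreg, X_train, y_train,
                            scoring=make_scorer(f1_score, average='weighted'), cv=cv)
print(f"weighted F1: {np.mean(f1_logreg):.3f} +/- {np.std(f1_logreg):.3f}")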
To check on the test set, is it right to write
cross_val_score(clf_logreg, X_test, y_test,
                scoring=make_scorer(f1_score, average='weighted'), cv=cv)  # not sure whether keeping cv here is appropriate
or maybe
predicted_logreg = clf_logreg.predict(X_test)  # assumes clf_logreg has already been fitted on the training data
f1 = f1_score(y_test, predicted_logreg)
The values returned by the two approaches are different.
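To make the comparison concrete, here is a minimal runnable sketch of the two alternatives side by side; the explicit fit call, the weighted average in the second option, and the names f1_cv_test / f1_holdout are my additions:

# Option 1: cross-validation on the held-out test set (re-splits X_test into folds)
scorer = make_scorer(f1_score, average='weighted')
f1_cv_test = cross_val_score(clf_logreg, X_test, y_test, scoring=scorer, cv=cv)

# Option 2: a single evaluation on the held-out test set
clf_logreg.fit(X_train, y_train)               # fit once on the full training set
predicted_logreg = clf_logreg.predict(X_test)  # predict on data the model has never seen
f1_holdout = f1_score(y_test, predicted_logreg, average='weighted')

print(f1_cv_test.mean(), f1_holdout)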