I am building a binary classifier with imbalanced classes (ratio 1:10). I tried KNN, random forests, and an XGBoost classifier. Among them, XGBoost gives the best precision-recall tradeoff and F1 score (perhaps because the dataset is quite small: shape (1900, 19)).
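(For context, the comparison was along these lines; this is a sketch, not my exact code, and the model settings shown are illustrative.)

# Sketch of the model comparison (illustrative, not my exact code):
# stratified cross-validated F1 on the imbalanced data.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold

models = {'KNN': KNeighborsClassifier(),
          'RF': RandomForestClassifier(),
          'XGB': XGBClassifier(objective='binary:logistic', scale_pos_weight=9)}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    f1 = cross_val_score(model, X, y, cv=cv, scoring='f1')
    print(name, round(f1.mean(), 3))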
After checking the error plots for XGBoost, I decided to use RandomizedSearchCV() from sklearn to tune the hyperparameters of my XGBoost classifier. Based on another Stack Exchange answer, this is my code:
from xgboost import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.metrics import f1_score
import numpy as np

score_arr = []
clf_xgb = XGBClassifier(objective='binary:logistic')
param_dist = {'n_estimators': [50, 120, 180, 240, 400],
              'learning_rate': [0.01, 0.03, 0.05],
              'subsample': [0.5, 0.7],
              'max_depth': [3, 4, 5],
              'min_child_weight': [1, 2, 3],
              'scale_pos_weight': [9]}  # ~9 negatives per positive (1:10 ratio)

clf = RandomizedSearchCV(clf_xgb, param_distributions=param_dist, n_iter=25,
                         scoring='precision', error_score=0, verbose=3, n_jobs=-1)
print(clf)

numFolds = 6
folds = StratifiedKFold(n_splits=numFolds, shuffle=True)

estimators = []
results = np.zeros(len(X_train))
score = 0.0
for train_index, test_index in folds.split(X_train, y_train):
    # index into X_train/y_train, the same data the fold indices refer to
    _X_train, _X_test = X_train.iloc[train_index, :], X_train.iloc[test_index, :]
    _y_train, _y_test = y_train.iloc[train_index].values.ravel(), y_train.iloc[test_index].values.ravel()
    clf.fit(_X_train, _y_train, eval_metric="error", verbose=True)
    estimators.append(clf.best_estimator_)      # best model from this fold's search
    results[test_index] = clf.predict(_X_test)  # out-of-fold predictions
    fold_f1 = f1_score(_y_test, results[test_index])
    score_arr.append(fold_f1)
    score += fold_f1
score /= numFolds
So RandomizedSearchCV selects the classifier, and then within each k-fold iteration it is fit and used to predict on the validation fold. Note that I pass X_train and y_train to the k-fold split, so that I keep a separate test dataset for evaluating the final model.
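The hold-out split itself was along these lines (a sketch; the 0.2 test fraction and random_state are illustrative):

from sklearn.model_selection import train_test_split

# Stratified split so the held-out test set keeps the 1:10 class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)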
Now, the problem: if you look at the F1 score in each k-fold iteration, it is score_arr = [0.5416666666666667, 0.4, 0.41379310344827586, 0.5, 0.44, 0.43478260869565216].
But when I evaluate clf.best_estimator_ on my test dataset, it gives an F1 score of 0.80, with precision and recall of {'precision': 0.8688524590163934, 'recall': 0.7571428571428571}.
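Those test-set numbers come from an evaluation along these lines (a sketch of what I ran):

from sklearn.metrics import f1_score, precision_score, recall_score

best_model = clf.best_estimator_     # best estimator from the final fold's search
y_pred = best_model.predict(X_test)  # X_test was never touched during CV
print('f1       :', f1_score(y_test, y_pred))
print('precision:', precision_score(y_test, y_pred))
print('recall   :', recall_score(y_test, y_pred))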
How come my validation scores are low while the test-set score is high? Is my model correct, or did I miss something?
P.S. Taking the parameters of clf.best_estimator_, I fitted them separately on my training data using xgb.cv; there too the F1 score was near 0.55. I think this might be due to differences between the training approaches of RandomizedSearchCV and xgb.cv. Please tell me if plots or more information are needed.
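The xgb.cv check looked roughly like this (a sketch; the custom f1_eval metric and the parameter filtering are my reconstruction, and feval is the older xgboost API name, renamed custom_metric in recent versions):

import xgboost as xgb
from sklearn.metrics import f1_score

def f1_eval(preds, dmat):
    # xgb.cv yields probabilities for binary:logistic; threshold at 0.5 for F1
    labels = dmat.get_label()
    return 'f1', f1_score(labels, (preds > 0.5).astype(int))

best = clf.best_estimator_.get_params()
params = {'objective': 'binary:logistic',
          'learning_rate': best['learning_rate'],
          'max_depth': best['max_depth'],
          'min_child_weight': best['min_child_weight'],
          'subsample': best['subsample'],
          'scale_pos_weight': best['scale_pos_weight']}

dtrain = xgb.DMatrix(X_train, label=y_train)
cv_res = xgb.cv(params, dtrain, num_boost_round=best['n_estimators'],
                nfold=6, stratified=True, feval=f1_eval, maximize=True, seed=0)
print(cv_res.tail(1))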
Update: I am attaching error plots of train and test aucpr and classification accuracy for the generated model. The plots were generated by running model.fit() only once (consistent with the values in score_arr).
