
The lines below are sample code where I compute accuracy, precision, recall, and F1 score. How can I also compute the false positive rate (FPR) with Stratified K-fold cross-validation?

from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

scoring = {'accuracy': make_scorer(accuracy_score),
           'precision': make_scorer(precision_score),
           'recall': make_scorer(recall_score),
           'f1_score': make_scorer(f1_score)}

skfold = StratifiedKFold(n_splits=10)
dt_clf = DecisionTreeClassifier()

results = cross_validate(estimator=dt_clf,
                         X=data_train_X,
                         y=target_train_Y,
                         cv=skfold,
                         scoring=scoring)
print("Results", results)

1 Answer


You could define a custom scorer as follows:

from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

def false_positive_rate(y_true, y_pred):
    # false positives: predicted positive, actually negative
    fp = ((y_pred == 1) & (y_true == 0)).sum()

    # true negatives: predicted negative, actually negative
    tn = ((y_pred == 0) & (y_true == 0)).sum()

    # FPR = FP / (FP + TN)
    return fp / (fp + tn)

scoring = {
    'accuracy': make_scorer(accuracy_score),
    'precision': make_scorer(precision_score),
    'recall': make_scorer(recall_score),
    'f1_score': make_scorer(f1_score),
    'false_positive_rate': make_scorer(false_positive_rate),
}

skf = StratifiedKFold(n_splits=3)
clf = DecisionTreeClassifier(random_state=42)
X, y = make_classification(random_state=42)

results = cross_validate(estimator=clf, X=X, y=y, cv=skf, scoring=scoring)

print(results['test_false_positive_rate'])
# [0.11764706 0.11764706 0.0625]
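
If you prefer not to hand-roll the counting, a minimal equivalent sketch (assuming binary labels 0/1, as in the example above) can pull the counts out of scikit-learn's confusion_matrix instead:

from sklearn.metrics import confusion_matrix

def false_positive_rate(y_true, y_pred):
    # for labels=[0, 1], confusion_matrix returns [[tn, fp], [fn, tp]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fp / (fp + tn)

One caveat: make_scorer assumes by default that higher scores are better. That is fine for just reporting FPR through cross_validate, but if you use this scorer for model selection (e.g. in GridSearchCV), pass make_scorer(false_positive_rate, greater_is_better=False) so scikit-learn knows to minimise it (the reported score will then be negated).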