
Using scikit-learn version 0.22.1 in JupyterLab. I can't provide a minimal reproducible example, but I hope that's alright since this is more of a conceptual question.

I'm building a classification model. I have my features in X and my target variable in y. I fit a logistic regression model and calculate predictions:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression(solver='liblinear')
logmodel.fit(X_train, y_train)

predictions = logmodel.predict(X_test)

Now I want to view the confusion matrix, accuracy score, precision score, and recall score. So I run the following:

from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
print(f"Confusion matrix: \n {confusion_matrix(y_test, predictions)}")
print(f"Accuracy: \t {accuracy_score(y_test, predictions):.2%}")
print(f"Precision: \t {precision_score(y_test, predictions):.3f}")
print(f"Recall: \t {recall_score(y_test, predictions):.3f}")

>> Confusion matrix:
>> [[128838     54]
>>  [  8968    279]]
>> Accuracy:    93.47%
>> Precision:   0.838
>> Recall:      0.030

The recall score should be TP / (TP + FN) = 128838 / (128838 + 8968) = 0.934923008. Why is sklearn giving me 0.03 for the recall? Am I miscalculating, or does recall_score work differently than I'm expecting?

Edit: accidentally typed TP / (TP+FP) instead of above. Corrected.

First off, TP / (TP + FP) is precision, not recall. Recall is TP / (TP + FN). Second, you may have the wrong order on your confusion matrix labels. From the sklearn docs, check your labels with: list(zip(['tn', 'fp', 'fn', 'tp'], np.array([[128838, 54], [8968, 279]]).ravel())) - G. Anderson

1 Answer


You are actually computing the precision for class 0: 128838 / (128838 + 8968) is TN / (TN + FN), i.e. of everything predicted as 0, the fraction that really is 0.

By default, recall_score and precision_score report scores for the positive class (class 1). The recall here is R = 279 / (279 + 8968) ≈ 0.030

and the precision is P = 279 / (279 + 54) ≈ 0.838

The matrix sklearn printed follows its convention of rows = true labels, columns = predicted labels:

-----------------------------------
|     x     | pred 0  | pred 1 |
-----------------------------------
|  true 0   | 128838  |   54   |
|  true 1   |   8968  |  279   |

meaning that:

  • TP = 279

  • FP = 54

  • FN = 8968

  • TN = 128838

and not the other way around.
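A quick way to convince yourself of this (along the lines of the comment above) is to unravel the matrix and recompute the scores by hand. A minimal sketch with made-up toy labels (not your data):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy labels, purely for illustration
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
# sklearn convention: rows = true labels, columns = predicted labels,
# so ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = cm.ravel()

print(cm)              # [[3 1]
                       #  [2 2]]
print(tn, fp, fn, tp)  # 3 1 2 2

# recall_score defaults to the positive class (pos_label=1)
print(tp / (tp + fn))                   # 0.5
print(recall_score(y_true, y_pred))     # 0.5
print(tp / (tp + fp))                   # 0.666...
print(precision_score(y_true, y_pred))  # 0.666...
```

The manual TP / (TP + FN) matches recall_score exactly once you pull TP and FN from the correct cells.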