
I am working on a multiclass classification project and I noticed that, no matter which classifier I run, precision and recall are identical within a model.

The classification problem has three distinct classes. The dataset is rather small, with about 13k instances split into train (0.8) and test (0.2).

Training data has the shape (10608, 28) and the labels have the shape (10608, 3) (binarized labels).
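
For reference, a minimal sketch of how such a split and label binarization could be produced (the names data and y below are illustrative stand-ins for the full ~13k-row feature matrix and the integer class labels):

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Illustrative only: data is the full ~13k x 28 feature matrix, y the integer labels (0, 1, 2)
labels = label_binarize(y, classes=[0, 1, 2])        # shape (n_samples, 3)
data_tr, data_te, labels_tr, labels_te = train_test_split(
    data, labels, test_size=0.2, random_state=42)    # 80% train / 20% test
print(data_tr.shape, labels_tr.shape)                # (10608, 28) (10608, 3)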

The class distribution is imbalanced:

  • label 0 represents 30% of all labels
  • label 1 represents 4% of all labels
  • label 2 represents 66% of all labels.

I am comparing different classifiers in order to later focus on the most promising ones. While calculating precision and recall for each model, I noticed that they are always identical within a model.

Given how precision and recall are calculated, they are equal exactly when the number of false-positive predictions equals the number of false-negative predictions, FP = FN:

precision = TP / (TP + FP)
recall = TP / (TP + FN)

Examples:

SGD classifier

from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, precision_score, recall_score
sgd_clf = OneVsRestClassifier(SGDClassifier(random_state=42))
sgd_clf.fit(data_tr, labels_tr)
y_pred_sgd = cross_val_predict(sgd_clf, data_tr, labels_tr, cv=5)
cm_sgd = confusion_matrix(labels_tr.argmax(axis=1), y_pred_sgd.argmax(axis=1))

cm_sgd:
array([[1038,   19, 2084],
       [ 204,   22,  249],
       [ 931,   48, 6013]], dtype=int64)
precision_score(labels_tr.argmax(axis=1), y_pred_sgd.argmax(axis=1), average="micro")  
0.666760935143288
recall_score(labels_tr.argmax(axis=1), y_pred_sgd.argmax(axis=1), average="micro") 
0.666760935143288

FP=FN=3535
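
The equality can be checked directly from the confusion matrix above: summing the per-class false positives and the per-class false negatives both give the same off-diagonal total (a quick numpy sketch):

import numpy as np

cm = np.array([[1038,   19, 2084],
               [ 204,   22,  249],
               [ 931,   48, 6013]])
fp = cm.sum(axis=0) - np.diag(cm)   # false positives per class (column total minus diagonal)
fn = cm.sum(axis=1) - np.diag(cm)   # false negatives per class (row total minus diagonal)
print(fp.sum(), fn.sum())           # 3535 3535 -- every misclassified sample is one FP and one FN
print(np.trace(cm) / cm.sum())      # ~0.66676, the micro-averaged precision/recall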

Logistic regression

from sklearn.linear_model import LogisticRegression
lr_clf = OneVsRestClassifier(LogisticRegression(random_state=42, max_iter=4000))
lr_clf.fit(data_tr, labels_tr)
y_pred_lr = cross_val_predict(lr_clf, data_tr, labels_tr, cv=5)
cm_lr = confusion_matrix(labels_tr.argmax(axis=1), y_pred_lr.argmax(axis=1))

cm_lr: 
array([[ 982,    1, 2158],
       [ 194,    7,  274],
       [ 774,    9, 6209]], dtype=int64)

precision_score(labels_tr.argmax(axis=1), y_pred_lr.argmax(axis=1), average="micro") 
0.6785444947209653
recall_score(labels_tr.argmax(axis=1), y_pred_lr.argmax(axis=1), average="micro") 
0.6785444947209653


FP=FN=3410

Random forest

from sklearn.ensemble import RandomForestClassifier
rf_clf = OneVsRestClassifier(RandomForestClassifier(random_state=42))
rf_clf.fit(data_tr, labels_tr)
y_pred_forest = cross_val_predict(rf_clf, data_tr, labels_tr, cv=5)
cm_forest = confusion_matrix(labels_tr.argmax(axis=1), y_pred_forest.argmax(axis=1))

cm_forest: 
array([[1576,   56, 1509],
       [ 237,   45,  193],
       [1282,   61, 5649]], dtype=int64)
precision_score(labels_tr.argmax(axis=1), y_pred_forest.argmax(axis=1), average="micro")
0.6853318250377074
recall_score(labels_tr.argmax(axis=1), y_pred_forest.argmax(axis=1), average="micro")  
0.6853318250377074

FP=FN=3338

How likely is it that every model ends up with identical precision and recall? Am I missing something?


1 Answer


This is happening because you are calculating the micro average of your scores. In the docs, it is described as:

Calculate metrics globally by counting the total true positives, false negatives and false positives.

Now here is the catch: in classification tasks where every test case is assigned to exactly one class, each misclassified sample counts as one false positive (for the predicted class) and one false negative (for the true class), so globally FP = FN and computing a micro average is equivalent to computing the accuracy score. This is why you get the same result for precision and recall in each model: you are basically computing the accuracy in all cases.

You can verify this by using accuracy_score and comparing the results.
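
For instance, reusing the SGD predictions from the question (a sketch assuming labels_tr and y_pred_sgd are in scope), all three calls return the same number:

from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = labels_tr.argmax(axis=1)
y_pred = y_pred_sgd.argmax(axis=1)
print(precision_score(y_true, y_pred, average="micro"))  # 0.666760...
print(recall_score(y_true, y_pred, average="micro"))     # same value
print(accuracy_score(y_true, y_pred))                    # same value again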

As a consequence, it would be better to evaluate the precision and recall of your models with macro or weighted averaging instead.
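
For example (again assuming the question's variables), both averaging schemes as well as the per-class breakdown are available directly:

from sklearn.metrics import classification_report, precision_score, recall_score

y_true = labels_tr.argmax(axis=1)
y_pred = y_pred_sgd.argmax(axis=1)
print(precision_score(y_true, y_pred, average="macro"))     # unweighted mean over the 3 classes
print(recall_score(y_true, y_pred, average="weighted"))     # mean weighted by class support
print(classification_report(y_true, y_pred))                # per-class precision, recall, F1 and support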