5
votes

I am calculating precision and recall for off-the-shelf algorithms on a dataset I recently prepared.

It is a binary classification problem, and I want to calculate the precision, recall and F-score for each of the classifiers I built.

test_x, test_y, predics, pred_prob,score = CH.buildBinClassifier(data,allAttribs,0.3,50,'logistic')

The buildBinClassifier method builds a classifier, fits it on the training data and returns test_x (the features of the test data), test_y (the ground-truth labels), predics (the predictions made by the classifier) and pred_prob (the prediction probabilities from the LogisticRegression.predict_proba method).

Below is the code for calculating precision-recall:

from sklearn.metrics import precision_recall_curve

pr, re, _ = precision_recall_curve(test_y,pred_prob,pos_label=1)
pr
array([ 0.49852507,  0.49704142,  0.49554896,  0.49702381,  0.49850746,
        0.5       ,  0.5015015 ,  0.50301205,  0.50453172,  0.50606061,
        ...,
        0.875     ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  1.        ,  1.        ])
re
array([ 1.        ,  0.99408284,  0.98816568,  0.98816568,  0.98816568,
        0.98816568,  0.98816568,  0.98816568,  0.98816568,  0.98816568,
        ...,
        0.04142012,  0.04142012,  0.03550296,  0.0295858 ,  0.02366864,
        0.01775148,  0.01183432,  0.00591716,  0.        ])

I do not understand why precision and recall are arrays. Shouldn't they be single numbers?

After all, precision is defined as TP / (TP + FP), and recall analogously as TP / (TP + FN), so shouldn't each be a single value?

I am aware of calculating the average precision and recall with the following piece of code, but seeing arrays instead of single TP, FP, precision and recall values makes me wonder what is going on.

from sklearn.metrics import precision_recall_fscore_support as prf

precision,recall,fscore,_ = prf(test_y,predics,pos_label=1,average='binary')

Edit: But without the average and pos_label parameters it reports the precision for each class. Could someone explain the difference between the outputs of these two methods?
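For reference, this is roughly the comparison I mean (a sketch using the same test_y / predics as above; the expected shapes come from the sklearn documentation, not from my actual output):

from sklearn.metrics import precision_recall_fscore_support as prf

# average='binary': a single number per metric, computed for the positive class only
p_bin, r_bin, f_bin, _ = prf(test_y, predics, pos_label=1, average='binary')

# average=None (the default): one entry per class, so each of these is an
# array of length 2 in a binary problem (index 0 -> class 0, index 1 -> class 1)
p_per_class, r_per_class, f_per_class, support = prf(test_y, predics)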

2
You asked for precision_recall_curve, and a curve is a sequence of points. If you want single values (the actual precision and recall), call the precision_score and recall_score functions instead. The curve is used to visualize the dependency on the threshold parameter. – lejlot
Thank you. But could you clarify the relevance of these values? Do they correspond to each individual prediction? – Sreejith Menon
They correspond to different decision thresholds. – BrenBarn
Thank you @BrenBarn. Could you point me in the right direction on what the different thresholds mean? Some way in which I can relate the numbers I see to the data or to the decision. – Sreejith Menon
See for instance this Wikipedia article. The curve plotted there is not the precision-recall curve, but the concept is similar. Basically, when deciding which category an item falls into, the model produces some sort of "likelihood score" that represents how likely the item is to be in category B. To make the binary decision, you set some threshold value and label all items above that threshold as B. By setting a high threshold you can avoid false positives at the cost of increased false negatives, and so on. – BrenBarn

2 Answers

2
votes

From the sklearn documentation for precision_recall_curve:

Compute precision-recall pairs for different probability thresholds.

Classifier models like logistic regression do not fundamentally output class labels (like "0" or "1"); they output probabilities (like 0.67). These probabilities tell you the likelihood that the input sample belongs to a particular class, such as the positive ("1") class. But you still need to choose a probability threshold so that the algorithm can convert the probability (0.67) into a class label ("1").

If you choose a threshold of 0.5, then all input samples with predicted probabilities greater than 0.5 will be assigned to the positive class. If you choose a different threshold, you get a different number of samples assigned to the positive and negative classes, and therefore different precision and recall scores.
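A minimal, self-contained sketch of that idea (toy data and model, not your CH.buildBinClassifier helper) might look like this:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Toy data and model, just to have some predicted probabilities to threshold.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# The same probabilities, thresholded differently, give different precision/recall.
for threshold in (0.3, 0.5, 0.7):
    labels = (proba >= threshold).astype(int)
    print(threshold, precision_score(y_test, labels), recall_score(y_test, labels))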

1
vote

In a binary classification problem, pred_prob is the probability of the instance belonging to each of the classes, so the predicted class actually depends on this probability and on one more value called the threshold. All instances with pred_prob greater than the threshold are classified into one class, and the rest into the other. The default threshold is 0.5.

So, by varying the threshold we get different prediction results. In many problems a much better result can be obtained by adjusting the threshold. That is what precision_recall_curve gives you.
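To see those thresholds explicitly, you can keep the third value that precision_recall_curve returns (a sketch, assuming the test_y and pred_prob from your question, with pred_prob holding the positive-class probabilities):

from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(test_y, pred_prob, pos_label=1)

# precision and recall have one more element than thresholds: the final
# (precision=1, recall=0) point has no corresponding threshold.
for p, r, t in zip(precision, recall, thresholds):
    print("threshold %.3f -> precision %.3f, recall %.3f" % (t, p, r))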