Thresholds decided when using precision recall-curve

Question

I have been working with precision-recall curves and have trouble understanding how thresholds are decided.

This is my code:

import sklearn
precision, recall, thresholds = sklearn.metrics.precision_recall_curve(y_test,
                             probas_pred[:,1], pos_label=1, sample_weight=None)

which yields

precision = array([ 0.99971396, 1. , 1. , 1. , 1. , 1. , 1. ])
recall = array([ 1. , 0.99885551, 0.99341917, 0.96852647, 0.88898426, 0.70872675, 0. ])
thresholds = array[ 0.5,  0.6,  0.7,  0.8,  0.9,  1. ])

If I do np.unique(probas_pred[:,1]) (Random forest, high class imbalance) I get the following thresholds:

thresholds_probas_pred = array([ 0., 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.])

I thought that the precision recall curve plots Precision and Recall for all unique values in the probas_pred array. In this case the threshold values returned by precision-recall curve seems to ignore values less than 0.5. Could someone please explain ?

Thanks!

Are you sure that you do not have 6 or less unique probas_pred? The size of the threshold array in precision-recall curve follows the following rule: shape = [n_thresholds <= len(np.unique(probas_pred))] which is not consistent with the number of unique probabilities you are reporting in your question. — MhFarahani
n_thresholds = 6 in this case is less than len( unique(probas_pred)) =11, so the equation holds. — PriS

Ben2018 Ben2018 · Accepted Answer · 2018-06-06T15:44:50

The reason is that at threshold=0.5, recall already reach 1. In other word, for probas_pred<0.5, y_test is all zero. Further reducing threshold, recall will remain 1.

Thresholds decided when using precision recall-curve

1 Answers