
I am asking a follow-up question as suggested in my previous post, Good ROC curve but poor precision-recall curve. I am only using the default settings with Python scikit-learn. It seems the optimization targets AUC-ROC, but I am more interested in optimizing precision-recall. The following is my code.

from sklearn.metrics import roc_curve, auc, precision_recall_curve

# Get the ROC curve from the classifier's decision scores
y_score = classifierUsed2.decision_function(X_test)
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(false_positive_rate, true_positive_rate)
print('AUC-' + ethnicity_tar + '=', roc_auc)
# Plot the ROC curve with a chance-level diagonal
ax1.plot(false_positive_rate, true_positive_rate, c=color, label=('AUC-' + ethnicity_tar + '= %0.2f' % roc_auc))
ax1.plot([0, 1], [0, 1], color='lightgrey', linestyle='--')
ax1.legend(loc='lower right', prop={'size': 8})

# Get precision-recall pairs over all thresholds
precision, recall, prThreshold = precision_recall_curve(y_test, y_score)
# Plot the precision-recall curve
ax2.plot(recall, precision, c=color, label=ethnicity_tar)
ax2.legend(loc='upper right', prop={'size': 8})

Where and how do I insert Python code to change the settings so that I can optimize for precision-recall?


1 Answer


There are in fact two questions in yours:

  1. How do you evaluate how good a precision-recall curve is with a single number?
  2. How do you build a model so as to maximize this number?

I will answer them in turn:

1. The standard single-number summary of a precision-recall curve is average precision. Average precision equals the exact area under the non-interpolated (that is, piecewise-constant) precision-recall curve.
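
scikit-learn computes this with average_precision_score; a minimal sketch, reusing the y_test and y_score from your snippet:

from sklearn.metrics import average_precision_score

# Average precision = exact area under the non-interpolated P-R curve
average_precision = average_precision_score(y_test, y_score)
print('Average precision =', average_precision)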

2. To maximize average precision, you can generally only tune the hyperparameters of your algorithm. You can do this with GridSearchCV if you set scoring='average_precision', or you can find optimal hyperparameters manually or with some other tuning technique. A minimal sketch follows.
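
In this sketch, LogisticRegression and the C grid are purely illustrative assumptions (substitute your own estimator and its hyperparameters), and X_train/y_train stand in for your training split:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Illustrative grid; replace with your own estimator's hyperparameters.
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}

# scoring='average_precision' makes the search pick the hyperparameters
# with the best mean average precision across the cross-validation folds.
grid = GridSearchCV(LogisticRegression(), param_grid,
                    scoring='average_precision', cv=5)
grid.fit(X_train, y_train)  # X_train/y_train: your training split (assumed)
print('Best parameters:', grid.best_params_)

# The refitted best estimator exposes decision_function, so it can
# stand in for classifierUsed2 in your plotting code.
y_score = grid.decision_function(X_test)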

It is generally impossible to optimize average precision directly during model fitting, but there are some exceptions. For example, this article describes an SVM that maximizes average precision.