1 vote

I am working on a binary classification task on imbalanced data.

Since accuracy is not very meaningful in this case, I use scikit-learn to compute the Precision-Recall curve and the ROC curve in order to evaluate the model's performance.

But I found that both curves become a horizontal line when I use a Random Forest with many estimators; it also happens when I fit an SGD classifier.

The ROC chart is as follows:

[ROC curve image]

And the Precision-Recall chart:

[Precision-Recall curve image]

Since Random Forest is stochastic, I don't get a horizontal line on every run; sometimes I also get a regular ROC and PR curve. But the horizontal line is much more common.

Is this normal? Or did I make some mistake in my code?

Here is the snippet of my code:

from sklearn.metrics import precision_recall_curve, average_precision_score
import matplotlib.pyplot as plt

classifier.fit(X_train, Y_train)

# Use decision_function scores when available; otherwise fall back to
# the predicted probability of the positive class.
try:
    scores = classifier.decision_function(X_test)
except AttributeError:
    scores = classifier.predict_proba(X_test)[:, 1]

precision, recall, _ = precision_recall_curve(Y_test, scores, pos_label=1)
average_precision = average_precision_score(Y_test, scores)

plt.plot(recall, precision, label='area = %0.2f' % average_precision, color="green")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision Recall Curve')
plt.legend(loc="lower right")
plt.show()
It looks a bit too good to be true. :-) Could you please upload your sample data file via a Dropbox share link or Google Drive? – Jianxun Li
Take the time and think about what the plots actually tell you: you basically performed perfect predictions on the test set. "Is this normal?" No. Problems tackled with machine learning techniques are often much harder, and perfect predictions are usually not possible. "Or did I make some mistakes in my code?" In your code, probably not; in your testing, maybe. We don't know. I would suggest trying cross-validation instead. Maybe your problem is very easy to learn, or maybe your test set is problematic; a cross-validation will show that. – cel
Thank you guys, that really helps! I will try cross-validation, and I will upload the data if I still can't get regular curves. – Jim GB
@cel: It was indeed a problem with the selection of the test data; I happened to choose an easy test set, which is why I got the horizontal line. Thank you! – Jim GB

3 Answers

3 votes

Yes, this can happen. If your scores perfectly separate the data into two piles, the true-positive rate goes vertically from 0 to 1 without any false positives (the vertical line) as your threshold passes over the pile of positives, and then the false-positive rate goes from 0 to 1 (the horizontal line) as your threshold passes over the pile of negatives.
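
To see the right angle concretely, here is a minimal sketch with made-up, perfectly separated scores (synthetic data, not the asker's); scikit-learn's roc_curve traces exactly the shape described above:

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Perfectly separated piles: every positive scores above every negative.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])

fpr, tpr, _ = roc_curve(y_true, scores)
print(list(zip(fpr, tpr)))            # (0,0) -> (0,1) -> (1,1): a right angle
print(roc_auc_score(y_true, scores))  # 1.0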

If you can get the same ROC curve on a held-out test set, you are golden. If you can get the same ROC curve on each of the five test folds of a 5-fold cross-validation, you are platinum.
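
A minimal sketch of that cross-validation check, assuming the classifier from the question (it needs decision_function or predict_proba) and hypothetical X, Y holding the full dataset before splitting; StratifiedKFold keeps the class ratio in every fold, which matters on imbalanced data:

from sklearn.model_selection import cross_val_score, StratifiedKFold

# Average precision on 5 stratified folds; five similar scores suggest the
# result is not an artifact of one lucky train/test split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(classifier, X, Y, cv=cv, scoring='average_precision')
print(fold_scores, fold_scores.mean())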

2 votes

Along with the other answers, it's possible that you have duplicated your label as a feature in the dataset. When Random Forest subsamples features, it doesn't always draw that column as a predictor, so you sometimes get a "normal-looking" ROC curve (the remaining features can't predict the label exactly); when the duplicated label/feature does land in the sample, your model is 100% accurate by definition.
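
Here is a small sketch of that failure mode on synthetic data (make_classification stands in for the asker's dataset): once the label is copied into the feature matrix, the forest scores a near-perfect AUC whenever its feature subsampling picks up the leaked column:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data, then leak the label in as an extra feature.
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
X_leaky = np.column_stack([X, y])

X_tr, X_te, y_tr, y_te = train_test_split(X_leaky, y, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))  # ~1.0 by construction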

SGD can have the same issue, in a situation where plain linear regression would simply fail: with linear regression, the duplicated column gives you a singular/near-singular matrix and the estimation fails outright. With SGD, since you re-estimate based on each arriving point, the math doesn't fail (though your model will still be suspect).
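
To illustrate with toy data (hypothetical, not from the question): a duplicated column makes X'X rank-deficient, which would break a closed-form normal-equations solve, yet SGDClassifier still iterates to a (suspect) fit:

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = np.column_stack([X, X[:, 0]])      # duplicate the first column
y = (X[:, 0] > 0).astype(int)

print(np.linalg.matrix_rank(X.T @ X))  # 3, not 4: X'X is singular
clf = SGDClassifier(random_state=0).fit(X, y)  # no error; SGD never inverts X'X
print(clf.score(X, y))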

0 votes

The other two answers give only sufficient conditions for seeing a horizontal line (i.e. they are possible causes of a horizontal line, but not the only possibilities). Here is a necessary and sufficient condition:

If you see a horizontal line in a PR curve, it must be at the top (precision = 1), and it means the examples in that threshold range are all TPs. And the longer the line, the more TPs (because a longer line covers a larger recall range).

Proof:

Let TP denote the number of true positives and PP the number of predicted positives, so precision = TP/PP.

A horizontal line means recall increases by some amount while precision stays unchanged. Let's discuss these two conditions separately:

  1. Recall increases by some amount ->
  • TP increases by some amount.
  • Suppose TP increases by the smallest amount, 1, and let x be the corresponding increase in PP. Every new TP is also a new PP, so by definition x >= 1; and since the scores are distinct (no ties), each threshold step adds exactly one predicted positive, so along the segment x = 1.
  2. Precision unchanged ->
  • (TP+1)/(PP+x) = TP/PP. Cross-multiplying gives PP*TP + PP = TP*PP + TP*x, so x = PP/TP. With x = 1, this forces PP = TP.

This means both TP and PP increase by 1 at each step, i.e. only positive examples are added; and since PP = TP, the precision TP/PP = 1 as well. QED.
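
As a quick numerical check (tie-free synthetic scores, matching the distinct-scores assumption above): scanning consecutive points of precision_recall_curve, every segment where recall changes while precision stays flat sits at precision 1.

import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
s = rng.normal(size=200) + 2 * y       # continuous scores, no ties

precision, recall, _ = precision_recall_curve(y, s)
for i in range(len(recall) - 1):
    if recall[i] != recall[i + 1] and precision[i] == precision[i + 1]:
        assert precision[i] == 1.0     # horizontal segments occur only at the top
print('check passed')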