I'm plotting ROC curves for several classifiers and am stumped to find that the random forest classifier outputs a perfect ROC curve (see below), even though I'm only getting an accuracy of 85% on class 0 and 41% on class 1 (class 1 is the positive class).
The actual y values are y=[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 0. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 0. 1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 1. 0. 0. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 0. 1. 1. 0. 1. 0. 1. 1. 0. 0. 1. 1. 0. 1. 1. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 0. 0. 1. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 1. 1. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 1.]
and the predicted y values=[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 0. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 0. 1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 1. 0. 0. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 0. 1. 1. 0. 1. 0. 1. 1. 0. 0. 1. 1. 0. 1. 1. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 0. 0. 1. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 1. 1. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 1.]
The predicted probabilities are=[ 0. 0.2 0. 0. 0. 0.2 0.1 0.4 0.2 0.9 0.9 0.4 0.9 0.2 0. 0. 0. 0. 0. 0. 0.1 0. 0.6 0. 0. 0.1 0. 0.1 0.7 0. 0. 0.1 0. 0.8 0.5 0.8 0. 1. 0.2 0. 0.9 0.9 0. 0. 0. 0.7 0.4 0. 0. 0.2 0. 0. 0. 0.6 0.1 0. 0. 0.1 0.2 0. 0. 0.1 0. 0.1 0.1 0. 0.1 0. 0. 0.1 0. 0. 1. 0. 0. 0. 0.4 0. 0. 0. 0. 1. 0.9 0.9 1. 0.9 1. 0.3 0.9 0.7 0.5 0.8 1. 0.9 0.9 1. 0.7 0.9 0. 0.8 0.2 0.2 0.8 0.9 0.3 0.7 0.3 0.1 0.1 0. 0.5 0.7 0. 0.2 0.1 0.7 0. 0.4 0.9 0.2 1. 0.8 0.1 0.1 0.1 0.3 1. 0.2 0.4 0.8 0.8 0.4 0.8 1. 0.9 0.9 0.8 0.7 1. 1. 0.2 0.7 0. 0.8 0.7 0.2 0.7 0.2 0.8 0.9 0.3 0.3 1. 1. 0.2 0.7 1. 0.3 0.2 0.2 0.1 0.8 0.8 0.9 0.9 1. 0.7 0. 0. 0.7 0.4 0.1 0.2 0.7 0.9 1. 1. 0.6 0.9 0.8 0.9 0.8 0.7 0.3 0. 0.2 1. 0.9 0. 0.1 0.6 0.8 0.1 0.1 0. 0.7 0.1 0.4 0. 0.2 0.6 0.1 0. 0.7 1. ]
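For reference, this is roughly how I compute per-class accuracy (i.e. the recall of each class) and the AUC from arrays like the ones above; the names y_true, y_hat and p are just placeholders for the three arrays pasted here, loaded as numpy arrays:

import numpy as np
from sklearn.metrics import recall_score, roc_auc_score

# y_true, y_hat, p = actual labels, predicted labels, predicted probabilities (placeholder names)
acc_class0 = recall_score(y_true, y_hat, pos_label=0)  # per-class accuracy for class 0
acc_class1 = recall_score(y_true, y_hat, pos_label=1)  # per-class accuracy for class 1
auc_score = roc_auc_score(y_true, p)                   # AUC from the predicted probabilities
print(acc_class0, acc_class1, auc_score)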
Finally, the code for creating the ROC is:
# Imports used by this snippet (not shown in my original script)
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics import auc
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Lasso (L1-penalised logistic regression); best_C is defined in an earlier step (not shown)
final_logit = LogisticRegression(class_weight='balanced', penalty='l1', C=best_C,
                                 solver='liblinear')  # liblinear supports the L1 penalty
final_logit.fit(x, y)
y_pred_lass = final_logit.predict_proba(x)  # in-sample predicted probabilities
fpr_lass, tpr_lass, thresholds = metrics.roc_curve(y, y_pred_lass[:, 1], pos_label=1)
roc_auc_lass = auc(fpr_lass, tpr_lass)

# Logistic regression
wlogit = LogisticRegression(class_weight='balanced')
wlogit.fit(x, y)
y_pred_logit = wlogit.predict_proba(x)
fpr_logit, tpr_logit, thresholds = metrics.roc_curve(y, y_pred_logit[:, 1], pos_label=1)
roc_auc_logit = auc(fpr_logit, tpr_logit)

# Random forest (sklearn's class is RandomForestClassifier, not RandomForest)
brf = RandomForestClassifier(class_weight='balanced')
brf.fit(x, y)
y_pred_brf = brf.predict_proba(x)
fpr_brf, tpr_brf, thresholds = metrics.roc_curve(y, y_pred_brf[:, 1], pos_label=1)
roc_auc_brf = auc(fpr_brf, tpr_brf)
plt.figure(figsize=(7,7))
lw = 2
plt.plot(fpr_lass, tpr_lass, color='darkorange',
         lw=lw, label='Lasso (area = %0.2f)' % roc_auc_lass)
plt.plot(fpr_logit, tpr_logit, color='red',
         lw=lw, label='Logistic (area = %0.2f)' % roc_auc_logit)
plt.plot(fpr_brf, tpr_brf, color='green',
         lw=lw, label='Random Forest (area = %0.2f)' % roc_auc_brf)
plt.plot([0, 1], [0, 1], color='black', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()