
I have a data set with multiple discrete labels, say 4, 5 and 6. On this I run the ExtraTreesClassifier (I will also run a multinomial logit afterward on the same data; this is just a short example), as below:

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_curve, auc

# bootstrap=True is needed for the out-of-bag score; also tried criterion='entropy' (information gain)
clf = ExtraTreesClassifier(n_estimators=200, random_state=0, criterion='gini',
                           bootstrap=True, oob_score=True)
clf.fit(x_train, y_train)

# y_test is the test data; y_predicted holds the predictions of the fitted ExtraTreesClassifier
y_predicted = clf.predict(x_test)

fpr, tpr, thresholds = roc_curve(y_test, y_predicted, pos_label=4)  # recall my labels are 4, 5 and 6
roc_auc = auc(fpr, tpr)
print("Area under the ROC curve : %f" % roc_auc)

The question is: is there something like an average ROC curve? Basically, could I add up all the tpr and fpr values separately for EACH label value and then take the means (would that make sense, by the way?), and then just call

# Would this be statistically correct, and would it mean something worth interpreting?
roc_auc_average = auc(fpr_average, tpr_average)
print("Area under the ROC curve : %f" % roc_auc_average)

I am assuming I will get something similar to this afterward (How to plot a ROC curve for a knn model), but how do I interpret the thresholds in this case?

Hence, please also mention whether I can/should get individual thresholds per label in this case, and why one approach would be statistically better than the other.

What I tried so far (besides averaging):

On changing pos_label to 4, then 5 and 6, and plotting the ROC curves, I see very poor performance, even worse than the y = x line (the perfectly random case where tpr = fpr). How should I approach this problem?

Comments:

  • Found this paper that describes multiple use cases for solving a similar problem: ccrma.stanford.edu/workshops/mir2009/references/ROCintro.pdf ("Introduction to ROC graphs", Tom Fawcett). (ekta)
  • Another possibility is a grid search to compute the thresholds that give the maximum tpr and the least fpr; a sketch of this idea follows these comments. More here: stackoverflow.com/questions/13370570/… and docs.scipy.org/doc/scipy/reference/generated/… (ekta)
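
A minimal sketch of that threshold grid-search idea, assuming per-label probability scores from clf.predict_proba (the variable names are mine and this is untested):

import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical sketch: pick the threshold maximizing tpr - fpr (Youden's J) for label 4
scores_4 = clf.predict_proba(x_test)[:, list(clf.classes_).index(4)]
fpr, tpr, thresholds = roc_curve(y_test, scores_4, pos_label=4)

best = np.argmax(tpr - fpr)      # index of the best trade-off point on this curve
print("best threshold = %f (tpr = %f, fpr = %f)" % (thresholds[best], tpr[best], fpr[best]))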

1 Answer


ROC curve averaging has been proposed by Hand & Till in 2001. They basically compute the ROC curves for all comparison pairs (4 vs. 5, 4 vs. 6 and 5 vs. 6) and average the result.
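
A rough sketch of that pairwise scheme, assuming the fitted clf from the question and its predict_proba output (the variable names are mine; recent scikit-learn versions expose essentially this measure as roc_auc_score(y_test, proba, multi_class='ovo')):

import numpy as np
from itertools import combinations
from sklearn.metrics import roc_auc_score

# Hand & Till style pairwise averaging for the labels 4, 5, 6 (sketch)
proba = clf.predict_proba(x_test)
classes = list(clf.classes_)
y_arr = np.asarray(y_test)

pair_aucs = []
for a, b in combinations(classes, 2):            # (4, 5), (4, 6), (5, 6)
    mask = np.isin(y_arr, [a, b])                # keep only the samples from this pair
    y_pair = y_arr[mask]
    # A(a|b) uses the score for class a, A(b|a) the score for class b; average the two
    auc_a = roc_auc_score(y_pair == a, proba[mask, classes.index(a)])
    auc_b = roc_auc_score(y_pair == b, proba[mask, classes.index(b)])
    pair_aucs.append(0.5 * (auc_a + auc_b))

print("Hand & Till style multiclass AUC : %f" % np.mean(pair_aucs))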

When you compute the ROC curve with pos_label=4, you implicitly say that the other labels (5 and 6) are the negatives. Note that this is slightly different from what Hand & Till proposed.
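
For instance (a sketch, using the class-4 probability from predict_proba as the score, which is my assumption rather than the hard predictions from the question), pos_label=4 is equivalent to binarizing the labels first:

import numpy as np
from sklearn.metrics import roc_curve

score_4 = clf.predict_proba(x_test)[:, list(clf.classes_).index(4)]
y_bin = (np.asarray(y_test) == 4).astype(int)    # 1 for class 4, 0 for classes 5 and 6

fpr_a, tpr_a, _ = roc_curve(y_test, score_4, pos_label=4)
fpr_b, tpr_b, _ = roc_curve(y_bin, score_4)      # pos_label defaults to 1 here
assert np.allclose(fpr_a, fpr_b) and np.allclose(tpr_a, tpr_b)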

A few notes:

  • You should make sure that your classifier was trained in a way that is consistent with your ROC analysis. If you pass pos_label=5 to roc_curve but your classifier was trained to treat 5 as intermediate between 4 and 6, you will almost surely get nothing useful here.
  • If you get AUC < 0.5, it means you are looking at the problem the wrong way around, and you should reverse your predictions (see the small example after this list).
  • In general, ROC analysis is most useful for binary classification. Whether it makes sense for a multiclass problem is case-dependent, and it might not be meaningful in your case.
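
A tiny self-contained illustration of the second note (toy data, not from the question): a score that ranks the classes backwards gives AUC < 0.5, and negating the score, i.e. reversing the predictions, flips it to 1 - AUC.

import numpy as np
from sklearn.metrics import roc_curve, auc

y_bin = np.array([0, 0, 1, 1])
scores = np.array([0.9, 0.8, 0.2, 0.1])          # deliberately ranks the positives last

fpr, tpr, _ = roc_curve(y_bin, scores)
print(auc(fpr, tpr))                             # 0.0: worse than random

fpr, tpr, _ = roc_curve(y_bin, -scores)          # reversed predictions
print(auc(fpr, tpr))                             # 1.0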