
Suppose I have formatted the classification results of a model as follows:

actual.class score.actual.class
A            1
A            1
A            0.6
A            0.1
B            0.5
B            0.3
.            .
.            .

1-If I understand correctly, the ROC curve plots the trade-off between true positives and false positives. This implies that I need to be varying the score threshold for just one class(the true class) and not both, right? I mean, if I pick A to be the true class here, then I would use only subset(results, actual.class == "A") to plot the ROC curve?

2-What if I wanted to generate the curve manually (without libraries)? Are the thresholds going to be each possible score from that subset?

3-Are the following points generated correctly from the above data for the purposes of plotting the ROC curve? (I'm using class A as the true class)

threshold fpr tpr
1         1   0   
0.6       1/2 1/2 
0.1       1/4 3/4      
0         0   1

Are these the points that are going to form my ROC?


1 Answer


"This implies that I need to be varying the score threshold for just one class(the true class) and not both, right?"

There seems to be a misunderstanding here: there is no such thing as a separate threshold for the positive or the negative class. ROC curves are used to evaluate binary classification algorithms. In such algorithms, any element that is not assigned to one type (TRUE) is automatically assigned to the other (FALSE).

The choice of threshold only shifts the balance, so that more observations are assigned to one type than to the other. Varying the threshold is what makes it possible to draw an ROC curve; with a single fixed threshold you would get just one point.
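
To make the threshold sweep concrete, here is a minimal Python sketch of generating the points by hand. It is only an illustration under stated assumptions: the function name `roc_points` and the toy labels and scores are made up (they are not the data from the question), each score is assumed to be the model's score for the positive class rather than for the row's actual class, and both classes are assumed to be present. Note that every observation, positive or negative, is compared against the same threshold.

```python
def roc_points(labels, scores, positive):
    """Return a list of (threshold, fpr, tpr) tuples, one per candidate threshold.

    Assumes scores[i] is the score for the *positive* class of observation i,
    and that labels contains at least one positive and one negative.
    """
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos

    # Candidate thresholds: every distinct score over ALL observations,
    # plus +inf so the curve starts at (fpr, tpr) = (0, 0).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)

    points = []
    for t in thresholds:
        # One threshold for everyone: predict positive when score >= t.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == positive)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y != positive)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


# Hypothetical toy data: true class and score for class "A".
labels = ["A", "A", "A", "A", "B", "B"]
scores = [0.9, 0.8, 0.6, 0.1, 0.5, 0.3]

points = roc_points(labels, scores, positive="A")
for threshold, fpr, tpr in points:
    print(threshold, fpr, tpr)
```

Plotting `fpr` on the x-axis against `tpr` on the y-axis gives the curve. The negatives ("B" rows here) are what produce the false positives, which is why they cannot be dropped from the data.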

Concerning your third point: yes, as far as I can tell from your example, this is the kind of data that typically makes up an ROC curve.