I was calculating the true positive rate (tpr) and false positive rate (fpr) and plotting the ROC curve by hand, to check the ROC curve I got from the roc_curve function in sklearn.metrics. But in my plot of fpr (x-axis) vs tpr (y-axis), the axes look as if they have been interchanged. I'm training a binary classifier with gradient descent, with two labels, positive and negative. The relevant portion of the TensorFlow code for the tpr and fpr calculation is given below:

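import numpy as np
import tensorflow as tf

# X, w, b, x_pos and x_neg are defined earlier (TensorFlow 1.x session code)
prediction=tf.nn.softmax(tf.matmul(X,w)+b)
pred_pos= prediction.eval(feed_dict={X: x_pos})
pred_neg= prediction.eval(feed_dict={X: x_neg})
tpr=[]
fpr=[]
for j in range(100):
    pos=0
    neg=0
    n=j/100.  # threshold swept from 0.00 to 0.99
    # count the two sets in separate loops, since they may differ in size
    for i in range(0,len(pred_pos)):
        if(pred_pos[i,1]>=n):
            pos+=1
    for i in range(0,len(pred_neg)):
        if(pred_neg[i,1]>=n):
            neg+=1
    tpr.append(float(pos)/len(x_pos))  # float() avoids integer division under Python 2
    fpr.append(float(neg)/len(x_neg))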

arr=np.array([fpr,tpr])
arr=arr.T  # two columns: fpr, tpr
with open('output.txt','wb') as f:
    np.savetxt(f,arr,fmt=['%e','%e'])

I then plot the text file with gnuplot, with fpr on the x-axis and tpr on the y-axis, and I get the attached plot.

[Image: ROC curve, fpr vs tpr]

This is certainly not right. Why does this happen? What am I doing wrong?

Your code seems reasonable; it really does look like you've swapped positives and negatives somewhere. Try printing tpr and fpr to the terminal to check that tpr is the higher of the two; if it is, the problem must be in your code for drawing the graph. Also note that you have fpr first in arr=np.array([fpr,tpr]). - Stephen

@Stephen Thanks for your comment. Yes, I tried printing them to the terminal, and tpr is lower than fpr, so that explains the graph. But I tried the same code on another network, a CNN, and the ROC curve looks just fine, with tpr greater than fpr. So I wonder what's wrong with this particular problem. - abhih1

1 Answer

I found the problem in the code. Instead of if(pred_pos[i,1]>=n): it should be if(pred_pos[i,0]>=n): (and similarly for pred_neg). With that change the fpr and tpr values come out right, with tpr greater than fpr. The cause was the labelling of the data: [1,0] for positives and [0,1] for negatives. So the positive-class probability sits at index 0 of the prediction array, not index 1.
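
To make the effect of the column index concrete, here is a minimal pure-Python sketch of the same threshold sweep, with the positive-class column passed in as a parameter. The function name roc_points, the pos_col parameter, and the softmax outputs are made up for illustration and are not from the original code. It assumes the labelling described above ([1,0] for positives, [0,1] for negatives).

```python
def roc_points(pred_pos, pred_neg, pos_col, steps=100):
    """Sweep thresholds 0, 1/steps, ..., (steps-1)/steps.

    pred_pos / pred_neg: softmax rows for the true positives / true negatives.
    pos_col: index of the column holding the positive-class probability.
    Returns two lists (fpr, tpr), one entry per threshold.
    """
    fpr, tpr = [], []
    for j in range(steps):
        n = j / steps
        # Each set is counted over its own length, so sizes may differ.
        tp = sum(1 for row in pred_pos if row[pos_col] >= n)
        fp = sum(1 for row in pred_neg if row[pos_col] >= n)
        tpr.append(tp / len(pred_pos))
        fpr.append(fp / len(pred_neg))
    return fpr, tpr


# Hypothetical softmax outputs; column 0 is the positive class here.
pred_pos = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4]]
pred_neg = [[0.3, 0.7], [0.2, 0.8], [0.55, 0.45], [0.1, 0.9]]

fpr_ok, tpr_ok = roc_points(pred_pos, pred_neg, pos_col=0)    # correct column
fpr_bad, tpr_bad = roc_points(pred_pos, pred_neg, pos_col=1)  # wrong column

# Correct column: the curve stays on or above the diagonal (tpr >= fpr).
print(all(t >= f for f, t in zip(fpr_ok, tpr_ok)))
# Wrong column: the curve falls on or below the diagonal (tpr <= fpr).
print(all(t <= f for f, t in zip(fpr_bad, tpr_bad)))
```

With the wrong column, every threshold scores the negative-class probability, so the curve ends up below the diagonal, which is what the plot in the question showed. A curve that sits consistently below the diagonal is a useful hint that the class column or the labels are flipped somewhere.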