
I am working with data to classify handwritten numbers from 0 to 9. I am using PCA to reduce the dimensionality to 6 principal components and KNN to model the data.
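For reference, test_projected below is produced roughly like this (a sketch; X_train and X_test are my flattened image arrays, names assumed here):

from sklearn.decomposition import PCA

# X_train, X_test: flattened pixel vectors (names assumed for this sketch)
# reduce the raw pixel vectors to 6 principal components
pca = PCA(n_components=6)
pca.fit(X_train)
test_projected = pca.transform(X_test)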

When I created the confusion matrix, I got reasonable results. It wasn't perfect, and I wasn't expecting it to be, but it made sense given the accuracy of ~0.8885 for my k value (k = 3).

array([[ 952,    0,    2,    1,    0,    9,    9,    0,    7,    0],
       [   0, 1125,    0,    3,    0,    0,    5,    1,    1,    0],
       [   7,    5,  973,   11,    4,    2,    9,    3,   18,    0],
       [   4,    9,   15,  846,    2,   40,    2,    7,   82,    3],
       [   3,    4,    9,    6,  830,    5,   16,   11,    0,   98],
       [  23,    1,    9,   38,    9,  787,    9,    2,   10,    4],
       [  17,    8,   16,    2,   13,    9,  893,    0,    0,    0],
       [   2,   14,   13,    3,   54,    4,    0,  909,    6,   23],
       [  16,    2,   25,   60,   23,   23,    4,    6,  802,   13],
       [  11,    5,    7,   16,  155,   15,    4,   21,    7,  768]],
      dtype=int64)

However, when I try to plot the ROC curve, I either get only 3 points in fpr and tpr and a curve that looks abnormally high, or, after changing how I compute roc_curve because I was sure I needed more points, I get obscenely low results that don't match my confusion matrix. The AUCs also just seem to increase as I go down the list of classes I check.

I was wondering what I could be doing wrong in my ROC computation.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

accuracy = 0
predicted_class = np.zeros((np.size(y_test), 1))
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(test_projected, y_test)
for i in range(0, np.size(test_projected[:, 0])):
    query_point = test_projected[i, :]
    true_class_of_query_point = y_test[i]

    # classify one test point at a time and count the hits
    predicted_class[i] = knn.predict([query_point])
    if predicted_class[i] == true_class_of_query_point:
        accuracy += 1
print('Accuracy of k = 3 is ', accuracy / np.size(test_projected[:, 0]), '\n')
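As a sanity check, the built-in scorer should give the same number (assuming test_projected and y_test as above):

from sklearn.metrics import accuracy_score

# predict all test points at once and compare with the loop above
pred_all = knn.predict(test_projected)
print('Accuracy of k = 3 is ', accuracy_score(y_test, pred_all), '\n')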

import matplotlib.pyplot as plt
from sklearn import metrics

fig, axs = plt.subplots(5, 2, figsize=(15, 15))
fig.tight_layout()
j = 0
k = 0
y_gnd = np.zeros((10000, 1))
for i in range(0, 10):
    # one-vs-rest ground truth: 1 where the true digit is i, 0 otherwise
    for m in range(0, 10000):
        if y_test[m] == i:
            y_gnd[m] = 1
        else:
            y_gnd[m] = 0
    fpr, tpr, threshold = metrics.roc_curve(y_gnd, predicted_class)
    auc = metrics.roc_auc_score(y_gnd, predicted_class)

    axs[j][k].plot(fpr, tpr)
    axs[j][k].set_title('AUC Score for ' + str(i) + ' is = ' + str(auc) + '.')

    # move to the next cell of the 5x2 grid of subplots
    if k == 1:
        j += 1
    k += 1
    if k > 1:
        k = 0

Also, are the inputs to roc_auc_score supposed to be fpr and tpr? I have seen both the labels and predictions as inputs, as well as fpr and tpr.
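For reference, these are the two calling patterns I keep running into (I am not sure which one is appropriate here):

# pattern 1: pass the ground-truth labels and the scores directly
auc = metrics.roc_auc_score(y_gnd, predicted_class)

# pattern 2: build the curve first, then integrate it with metrics.auc
fpr, tpr, threshold = metrics.roc_curve(y_gnd, predicted_class)
auc = metrics.auc(fpr, tpr)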

[Images: ROC curves from the two attempts described above]

Edit: new ROC curves using predict_proba instead of the predicted class labels

pred = knn.predict_proba(test_projected)   # per-class probabilities, shape (n_samples, 10)
fpr, tpr, threshold = metrics.roc_curve(y_gnd, pred[:, i])
auc = metrics.roc_auc_score(y_gnd, pred[:, i])
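For completeness, the full loop in this edit now looks roughly like this (a sketch; the subplot bookkeeping with j and k is unchanged from above):

pred = knn.predict_proba(test_projected)   # per-class probabilities, shape (n_samples, 10)
for i in range(0, 10):
    # one-vs-rest ground truth; assumes y_test is a NumPy array of digit labels
    y_gnd = (y_test == i).astype(int)
    fpr, tpr, threshold = metrics.roc_curve(y_gnd, pred[:, i])
    auc = metrics.roc_auc_score(y_gnd, pred[:, i])
    axs[j][k].plot(fpr, tpr)
    axs[j][k].set_title('AUC Score for ' + str(i) + ' is = ' + str(auc) + '.')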

[Images: new ROC curves using predict_proba]

You should use predict_proba rather than predict to get the class probabilities, which are then used by both roc_curve and roc_auc_score. Your plots (I believe) treat the predicted class labels as the non-thresholded prediction scores, which they are not. – sim
@sim That helped me get back to a more reasonable curve, but I feel the values are still too high. I am getting 0.9866 as my lowest AUC, which makes me think something is still going awry. I now get 5 points in my fpr and tpr, as opposed to 3. – Chris_skelton

1 Answer


Given your confusion matrix, the ROC plots based on the predicted probabilities make sense to me. For example, they suggest you can identify all true zeros while misclassifying only a small percentage of the other numbers, which fits the confusion matrix's very high accuracy for zeros. The ROC plots also reflect the lower accuracy for threes and nines.
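You can read those per-class accuracies straight off the confusion matrix by normalizing each row (a quick sketch, with cm standing for the NumPy array you posted):

# rows of the confusion matrix are the true classes, so the diagonal
# divided by the row sums gives the per-class recall
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)   # about 0.97 for zeros, but only about 0.84 for threes and 0.76 for nines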

However, I think that ROC might not be the right metric for your problem: an ROC curve essentially shows the trade-off between false negatives and false positives for a single task (e.g. recognizing nines). In your case, I imagine you are not so much interested in recognizing one particular digit as in the overall model accuracy across all digits. So you might be better off looking at a measure such as the categorical cross-entropy loss.
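If you want a single number that summarizes the model over all ten digits, a log loss on the predicted probabilities is one option (a sketch, reusing the predict_proba output from your edit):

from sklearn.metrics import log_loss

# categorical cross-entropy between the true labels and the predicted
# class probabilities; lower is better
pred = knn.predict_proba(test_projected)
print(log_loss(y_test, pred))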

I think, however, that looking at the whole ROC curve can be a bit misleading in your case: you would probably not be willing to misclassify a large fraction of the other digits just to catch every last instance of one class, so only the low-false-positive-rate end of each curve really matters in practice.