Scikit-Learn: Labels don't match in Confusion Matrix

Question

Let's say I have an array with (potentially) 43 different values, e.g.

import pandas as pd
Y_test = pd.Series([4,4,4,42,42,0,1,1,19], dtype=int)
Y_hat = pd.Series([4,4,2,32,42,0,5,5,19], dtype=int)

Whenever I try to plot the confusion matrix with:

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

def create_conf_mat(index, y_test, y_hat):
    # Rows are actual classes, columns are predicted classes
    cm = confusion_matrix(y_test, y_hat)
    fig = plt.figure()
    ax = fig.add_subplot(111)
    cax = ax.matshow(cm)
    plt.title(f'Confusion Matrix ({index} features, 1 outcome)')
    fig.colorbar(cax)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    # Save the same figure in three formats
    plt.savefig(f'confm_{index}.png')
    plt.savefig(f'confm_{index}.svg')
    plt.savefig(f'confm_{index}.pdf')
    return

The axis tick labels are not [0, 1, 2, 4, 5, 19, 32, 42] as I expect, but [0, 1, 2, 3, 4, 5, 6, 7]. I tried to set the labels explicitly by passing the unique values in y_test/y_hat as the labels argument, but that doesn't work either. I even tried converting the integer values to strings, but then sklearn complains that at least one label has to be in y_true. Does anyone know how I can get the actual values in y_test and y_hat shown as labels on the confusion matrix?

Comment by Sheldore: You can add the following two lines before the return statement: plt.xticks(range(len(y_test)), y_test) and plt.yticks(range(len(y_hat)), y_hat)

JohanC · Accepted Answer · 2020-02-14T14:36:00

The documentation for the labels parameter of confusion_matrix hints at the cause:

If None is given, those that appear at least once in y_true or y_pred are used in sorted order.
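The documented default can be checked directly. The following is a minimal sketch using the question's data as plain lists (the `row_of` lookup is a hypothetical helper added for illustration): the matrix has one row and column per label in the sorted union, so positions 0..7 stand for the labels 0, 1, 2, 4, 5, 19, 32, 42.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Data from the question, as plain lists
y_test = [4, 4, 4, 42, 42, 0, 1, 1, 19]
y_hat = [4, 4, 2, 32, 42, 0, 5, 5, 19]

# With labels=None, the sorted union of both inputs defines rows and columns
cm = confusion_matrix(y_test, y_hat)
labels = np.unique(np.concatenate([y_test, y_hat]))

print(labels.tolist())
print(cm.shape)  # (8, 8): one row/column per distinct label

# Hypothetical helper: map each label value to its row/column position
row_of = {int(label): i for i, label in enumerate(labels)}

# Actual class 4 was predicted as 2 exactly once
print(cm[row_of[4], row_of[2]])
```

matshow only sees the 8x8 array, which is why the default ticks are the positions 0 to 7 rather than the label values.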

So we need to combine the values of both series, extract the sorted unique values, and use them as tick labels:

import numpy as np

# Sorted union of both series, the same order confusion_matrix uses by default
labels = np.unique(np.concatenate([y_test.values, y_hat.values]))
plt.xticks(range(len(labels)), labels)
plt.yticks(range(len(labels)), labels)
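Put together, the question's function might look like the sketch below. It is one possible arrangement, not a definitive version: the Agg backend, the use of `np.asarray` (so that lists and pandas Series are both accepted), and returning the figure instead of saving it are assumptions made here to keep the example self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix


def create_conf_mat(index, y_test, y_hat):
    # Sorted union of the labels seen in either input
    labels = np.unique(np.concatenate([np.asarray(y_test), np.asarray(y_hat)]))
    # Pass the labels explicitly so the matrix order and tick order always agree
    cm = confusion_matrix(y_test, y_hat, labels=labels)

    fig, ax = plt.subplots()
    cax = ax.matshow(cm)
    fig.colorbar(cax)

    # One tick per matrix row/column, labelled with the actual class value
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)

    ax.set_title(f"Confusion Matrix ({index} features, 1 outcome)")
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Actual")
    return fig, labels


y_test = [4, 4, 4, 42, 42, 0, 1, 1, 19]
y_hat = [4, 4, 2, 32, 42, 0, 5, 5, 19]
fig, labels = create_conf_mat(3, y_test, y_hat)
print(labels.tolist())
```

The returned figure can then be saved as in the original code, for example with fig.savefig(f'confm_{index}.png').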

Please note that newer versions of scikit-learn include built-in support for plotting a confusion matrix, and the documentation has example code.
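As a rough sketch of that built-in route, assuming scikit-learn 0.22 or later (where ConfusionMatrixDisplay is available) and the question's data as plain lists, the labels can be passed through display_labels so that the ticks show the actual class values. The Agg backend is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

y_test = [4, 4, 4, 42, 42, 0, 1, 1, 19]
y_hat = [4, 4, 2, 32, 42, 0, 5, 5, 19]

# Use the same sorted label list for the matrix and for the display
labels = np.unique(np.concatenate([y_test, y_hat]))
cm = confusion_matrix(y_test, y_hat, labels=labels)

fig, ax = plt.subplots()
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
disp.plot(ax=ax)  # draws the matrix with class values as tick labels
```

Here the display object handles the tick labels and axis titles, so no manual xticks/yticks calls are needed.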
