How to know scikit-learn confusion matrix's label order and change it

Question

I have a multi-class classification problem with 27 classes.

y_predict=[0 0 0 20 26 21 21 26 ....]

y_true=[1 10 10 20 26 21 18 26 ...]

A list named answer_vocabulary stores the word corresponding to each of the 27 indices: answer_vocabulary=[0 1 10 11 2 3 agriculture commercial east living north .....]

cm = confusion_matrix(y_true=y_true, y_pred=y_predict)

I'm confused about the row/column order of the confusion matrix. Is it in ascending index order? And if I want to reorder the matrix to follow the label sequence [0 1 2 3 10 11 agriculture commercial living east north ...], how can I do that?

Here is the function I use to plot the confusion matrix:

import itertools

import matplotlib.pyplot as plt
import numpy as np


def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    `classes` must be in the same order as the rows/columns of `cm`.
    """
    # Normalize before drawing so the image and the cell text agree.
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Write each count (or rate) in its cell; use white text on dark cells.
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()

Iñigo Iñigo · Accepted Answer · 2020-08-29T22:35:11

sklearn.metrics.confusion_matrix returns a plain NumPy array, which does not record how the matrix was created (class order or normalization). Keep the label list next to the matrix, or use the matrix right after creating it; otherwise that information is lost.
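Since the returned array carries no labels, one option (a sketch, not the only approach) is to compute the label order yourself with sklearn.utils.multiclass.unique_labels, which returns the sorted labels that confusion_matrix uses when no labels argument is given, and store it alongside the matrix. The y_true/y_pred values below are the four toy samples from the table in this answer; the result dict is just an illustrative container.

```python
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels

# Toy data: the same four samples used in the tables of this answer.
y_true = ["A", "C", "D", "B"]
y_pred = ["B", "C", "B", "A"]

cm = confusion_matrix(y_true, y_pred)
# Labels in the order of cm's rows and columns (sorted union of both arrays).
order = unique_labels(y_true, y_pred).tolist()

# Keep both together so the ordering is not lost later.
result = {"labels": order, "matrix": cm}
print(result["labels"])  # ['A', 'B', 'C', 'D']
```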

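By default, sklearn.metrics.confusion_matrix(y_true, y_pred) orders the rows and columns by the sorted list of labels that appear at least once in y_true or y_pred. Integer labels are sorted numerically (so your indices 0..26 come out in ascending order), while string labels are sorted lexicographically. A class that never occurs in either array gets no row or column at all, so the matrix can be smaller than 27 x 27, and row i then no longer matches answer_vocabulary[i].

If you pass this data to sklearn.metrics.confusion_matrix:

+--------+--------+
| y_true | y_pred |
+--------+--------+
| A      | B      |
| C      | C      |
| D      | B      |
| B      | A      |
+--------+--------+

Scikit-learn will create this confusion matrix (zeros omitted):

+-----------+---+---+---+---+
| true\pred | A | B | C | D |
+-----------+---+---+---+---+
| A         |   | 1 |   |   |
| B         | 1 |   |   |   |
| C         |   |   | 1 |   |
| D         |   | 1 |   |   |
+-----------+---+---+---+---+

And it will return this NumPy array to you:

+---+---+---+---+
| 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
+---+---+---+---+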
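To check the ordering yourself, run the same four samples through confusion_matrix; the rows and columns come out in sorted label order. The last two lines show why the label type matters: integers sort numerically, while strings sort character by character, so "10" comes before "2".

```python
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels

y_true = ["A", "C", "D", "B"]
y_pred = ["B", "C", "B", "A"]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[0 1 0 0]
#  [1 0 0 0]
#  [0 0 1 0]
#  [0 1 0 0]]

# Integer labels are sorted numerically...
print(unique_labels([10, 2, 1], [1]).tolist())          # [1, 2, 10]
# ...but string labels are sorted lexicographically.
print(unique_labels(["10", "2", "1"], ["1"]).tolist())  # ['1', '10', '2']
```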

To reorder the classes or select a subset of them, pass the labels argument to confusion_matrix(); the rows and columns then follow the order of that list.

For reordering:

labels = ['D', 'C', 'B', 'A']
mat = confusion_matrix(y_true, y_pred, labels=labels)
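Below is a sketch of the reordering on the four toy samples, followed by the setup from the question, where y_true and y_predict hold integer indices into answer_vocabulary. The shortened answer_vocabulary, desired_order and sample values are hypothetical stand-ins for the real 27-class data; the idea is to translate the desired word order into indices and pass those indices as labels.

```python
from sklearn.metrics import confusion_matrix

# Reordering the toy example: rows and columns follow the labels list.
toy_true = ["A", "C", "D", "B"]
toy_pred = ["B", "C", "B", "A"]
mat = confusion_matrix(toy_true, toy_pred, labels=["D", "C", "B", "A"])
print(mat)
# [[0 0 1 0]
#  [0 1 0 0]
#  [0 0 0 1]
#  [0 0 1 0]]

# Question's setup (hypothetical, shortened): labels are integer indices
# into answer_vocabulary, and we want the matrix in a custom word order.
answer_vocabulary = ["0", "1", "10", "11", "2", "3",
                     "agriculture", "commercial", "east", "living", "north"]
desired_order = ["0", "1", "2", "3", "10", "11",
                 "agriculture", "commercial", "living", "east", "north"]

# Translate each word into its index. Because every index is listed,
# every class gets a row and column, even one that never occurs.
labels = [answer_vocabulary.index(word) for word in desired_order]

y_true = [1, 2, 2, 6, 10, 9]      # hypothetical sample data
y_predict = [0, 2, 4, 6, 10, 8]
cm = confusion_matrix(y_true, y_predict, labels=labels)
# cm[i, j] counts samples whose true word is desired_order[i]
# and whose predicted word is desired_order[j].
```

Passing desired_order as the classes argument of the plot_confusion_matrix function from the question keeps the tick labels aligned with the reordered matrix.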

Or, if you just want to focus on some labels (useful if you have a lot of labels):

labels = ['A', 'D']
mat = confusion_matrix(y_true, y_pred, labels=labels)
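A sketch of the subset case on the same four samples. One detail worth knowing: a sample is counted only if both its true and its predicted label are in labels, so with labels=['A', 'B'] the (D, B) sample is dropped even though B is kept.

```python
from sklearn.metrics import confusion_matrix

y_true = ["A", "C", "D", "B"]
y_pred = ["B", "C", "B", "A"]

# Only samples whose true AND predicted labels are both listed are counted.
mat_ab = confusion_matrix(y_true, y_pred, labels=["A", "B"])
print(mat_ab)
# [[0 1]
#  [1 0]]

# In this toy data every sample involving A or D has its other label
# outside the list, so the A/D sub-matrix is all zeros.
mat_ad = confusion_matrix(y_true, y_pred, labels=["A", "D"])
print(mat_ad)
# [[0 0]
#  [0 0]]
```

So a sub-matrix shows only the confusions among the listed classes; errors into or out of unlisted classes do not appear in it.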

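Also, take a look at sklearn.metrics.plot_confusion_matrix (deprecated in scikit-learn 1.0 and removed in 1.2; its replacements are ConfusionMatrixDisplay.from_predictions and ConfusionMatrixDisplay.from_estimator). It works well when there are only a few classes (fewer than about 100).

With more than 100 classes, plotting the matrix takes a while.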
