2
votes

This issue has been mentioned a few times here on Stack Overflow, but none of those posts solved the error I'm currently facing.

The y column of my dataset, which I use as labels, had to be transformed with one-hot encoding so that my deep learning model could be trained with categorical_crossentropy.

The problem now is that evaluating the model requires the original labels again, so they can be compared against the predicted y.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

keypoints = pd.read_csv('keypoints.csv')

X = keypoints.iloc[:,1:76]
y = keypoints.iloc[:,-1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)

Here y holds 3 different labels, say A, B and C, and has a shape of (63564, 1).

So using the One-Hot encoding I was able to split it up:

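from sklearn.preprocessing import LabelEncoder, OneHotEncoder

le = LabelEncoder()
y = le.fit_transform(y)
# categorical_features was removed in scikit-learn 0.22; drop the argument on newer versions
ohe = OneHotEncoder(categorical_features = [0])
y = ohe.fit_transform(y[:,None]).toarray()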

The new y here has a shape of (63564, 3) and looks like:

[[0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 ...
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]]

After running my Deep Learning network I want to evaluate it by using:

......
#Evaluation and such
y_pred = model.predict(X_test, verbose=0)
y_classes = model.predict_classes(X_test, verbose=0)

#Reduce to 1D
y_pred = y_pred[:, 0]
y_classes = y_classes[:, 0]

#Confusion Matrix
print(confusion_matrix(y_test, y_classes))

#Accuracy: (tp + tn) / (p + n)
accuracy = accuracy_score(y_test, y_classes)
print('Accuracy: %f' % accuracy)
#Precision tp / (tp + fp)
precision = precision_score(y_test, y_classes)
print('Precision: %f' % precision)
#Recall: tp / (tp + fn)
recall = recall_score(y_test, y_classes)
print('Recall: %f' % recall)
#F1: 2 tp / (2 tp + fp + fn)
f1 = f1_score(y_test, y_classes)
print('F1 score: %f' % f1)

But of course this won't accept the 0 and 1 as labels:

ValueError: Classification metrics can't handle a mix of unknown and continuous-multioutput targets

So my question is

How do I reverse the one-hot encoded labels so that I can run the evaluation of my DL model?

3 Answers

2
votes

You will probably need inverse_transform, as documented in the examples section of sklearn.preprocessing.OneHotEncoder:

>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder(handle_unknown='ignore')
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> enc = enc.fit(X)
>>> enc.transform([['Female', 1], ['Male', 4]]).toarray()
array([[1., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.]])
>>> enc.inverse_transform([[0, 1, 1, 0, 0], [0, 0, 0, 1, 0]])
array([['Male', 1],
       [None, 2]], dtype=object)
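
Applied to the labels in the question, the round trip might look like the sketch below. The `labels` array is a made-up stand-in for the last column of keypoints.csv, and the encoder is fitted directly on the string labels (an assumption, the question goes through LabelEncoder first).

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the label column of keypoints.csv
labels = np.array(["A", "C", "B", "C", "A"]).reshape(-1, 1)

# OneHotEncoder accepts string categories directly
ohe = OneHotEncoder()
onehot = ohe.fit_transform(labels).toarray()  # shape (5, 3), one column per class

# Reverse the encoding: one-hot rows -> original string labels
recovered = ohe.inverse_transform(onehot).ravel()
print(recovered)
```

The same fitted `ohe` object has to be kept around, since it stores which column corresponds to which label.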
0
votes

You can use argmax to convert probabilities to categorical decisions:

y_test_classes = y_test.argmax(1)
y_pred_classes = y_pred.argmax(1)

# after argmax the classes are the integer column indices, not 'A', 'B', 'C'
print(confusion_matrix(y_true=y_test_classes, y_pred=y_pred_classes, labels=[0, 1, 2]))
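
If the original label names are wanted rather than integer indices, the indices can be mapped back through an array of class names. This is only a sketch: the small arrays are invented for illustration, and `class_names` assumes the one-hot columns are ordered A, B, C (which is how LabelEncoder would sort them).

```python
import numpy as np

# Assumed column order of the one-hot encoding
class_names = np.array(["A", "B", "C"])

# Made-up one-hot test labels and predicted probabilities
y_test_onehot = np.array([[0., 0., 1.],
                          [1., 0., 0.],
                          [0., 1., 0.]])
y_pred_proba = np.array([[0.1, 0.2, 0.7],
                         [0.6, 0.3, 0.1],
                         [0.2, 0.3, 0.5]])

# argmax along axis 1 picks the column index of the largest value per row
y_test_idx = y_test_onehot.argmax(axis=1)
y_pred_idx = y_pred_proba.argmax(axis=1)

# Index into class_names to get the original labels back
y_test_labels = class_names[y_test_idx]
y_pred_labels = class_names[y_pred_idx]
print(y_test_labels, y_pred_labels)
```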
0
votes
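import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# one-hot encoded test labels -> class indices
y_test_classes_4 = np.argmax(y_test_4, axis=1)

# predicted class indices; predict_classes was removed in TensorFlow 2.6,
# where np.argmax(model_4.predict(x_test_4), axis=1) does the same for a softmax output
y_classes = model_4.predict_classes(x_test_4, verbose=0)
y_classes = np.reshape(y_classes, (-1, 1))

y_classes = y_classes[:, 0]

# accuracy: (tp + tn) / (p + n)
accuracy_4 = accuracy_score(y_test_classes_4, y_classes)
print('Accuracy: %f' % accuracy_4)
# precision tp / (tp + fp); with 3 classes an average must be chosen, 'macro' is one option
precision_4 = precision_score(y_test_classes_4, y_classes, average='macro')
print('Precision: %f' % precision_4)
# recall: tp / (tp + fn)
recall_4 = recall_score(y_test_classes_4, y_classes, average='macro')
print('Recall: %f' % recall_4)
# f1: 2 tp / (2 tp + fp + fn)
f1_4 = f1_score(y_test_classes_4, y_classes, average='macro')
print('F1 score: %f' % f1_4)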

Use this. First convert your one-hot encoded labels back to class indices with argmax so they can be compared with your predictions. For the predictions, reshape the 1D array into a 2D array with np.reshape, then take its first column.
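
With three classes, precision and recall are computed per class and then averaged. The sketch below works this out by hand with NumPy on made-up index arrays, as an illustration of what a macro average means. It is not the scikit-learn implementation, and it assumes every class is predicted at least once (otherwise the division hits zero).

```python
import numpy as np

# Made-up class indices, e.g. the result of argmax on one-hot labels and predictions
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
n_classes = 3

# Confusion matrix: rows are true classes, columns are predicted classes
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

tp = np.diag(cm)                  # correct predictions per class
precision = tp / cm.sum(axis=0)   # tp / (tp + fp), per class
recall = tp / cm.sum(axis=1)      # tp / (tp + fn), per class

accuracy = tp.sum() / cm.sum()
macro_precision = precision.mean()  # unweighted mean over classes
macro_recall = recall.mean()

print(cm)
print(accuracy, macro_precision, macro_recall)
```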