I'm using Keras to solve a multi-class problem. My data is very unbalanced, so I'm trying to create something similar to a confusion matrix. My dataset is very large and saved as HDF5, so I use HDF5Matrix to fetch X and Y, which (as far as I know) rules out scikit-learn's confusion matrix. I've seen that it is possible to save the predictions and true labels, or to output the error per label, but a more elegant solution would be a multi-dimensional metric that accumulates the (predicted, true) label pairs (much like a confusion matrix). I have used the following callback to try to peek into what's going on per batch / epoch:

from keras.callbacks import LambdaCallback

batch_print_callback = LambdaCallback(
    on_batch_end=lambda batch, logs: print(logs),
    on_epoch_end=lambda epoch, logs: print(logs))

but it only accumulates a single value per metric (usually an average of some sort).

I've also tried to return y_pred / y_true directly from a metric, as follows (to see if I can get a multi-dimensional value into the logs):

def pred(y_true, y_pred):
    return y_pred

def true(y_true, y_pred):
    return y_true

However, it doesn't return a multi-dimensional value as I expected. So basically, my question is: can I use Keras to accumulate a multi-dimensional metric?
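For illustration, the value that ends up in the logs for such a metric is a scalar reduction of the returned tensor (effectively a mean over all of its entries). A minimal numpy analogue of that reduction:

```python
import numpy as np

# A metric that returns y_pred hands back a (batch, n_classes) array,
# but only a scalar reduction of it (a mean) is reported in the logs.
y_pred = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
logged_value = np.mean(y_pred)  # single number, not the full array
```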

1 Answer

Well, to the best of my knowledge, it is not possible: before a metric's value is returned, K.mean is applied, so only a scalar ever reaches the logs. I posted an issue about this on the Keras GitHub. The best design I came up with is one metric per cell of the confusion matrix, plus a callback that collects them, inspired by the thread mentioned in the question. A sort-of working solution can be found here
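As a sketch of the accumulation step itself (the part a collecting callback would do once predictions are available outside the graph, e.g. via model.predict on a validation batch), assuming one-hot targets; update_confusion is a hypothetical helper, not a Keras API:

```python
import numpy as np

def update_confusion(cm, y_true, y_pred):
    """Accumulate (true, predicted) label pairs into a confusion matrix.

    cm      -- (n_classes, n_classes) integer array, rows = true labels
    y_true  -- one-hot or probability targets, shape (batch, n_classes)
    y_pred  -- predicted probabilities, shape (batch, n_classes)
    """
    t = np.argmax(y_true, axis=-1)
    p = np.argmax(y_pred, axis=-1)
    # Unbuffered add, so repeated (t, p) pairs in one batch all count.
    np.add.at(cm, (t, p), 1)
    return cm
```

Calling this from an on_batch_end- or on_epoch_end-style hook builds up the full matrix across the run, which sidesteps the scalar-only restriction on metrics.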