
Let's say I have 3 classes, and each sample can belong to any combination of those classes. The labels look like this:

[
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]

I set my output as Dense(3, activation="sigmoid") and compiled with optimizer="adam", loss="binary_crossentropy". I get 0.05 for loss and 0.98 for accuracy, according to the Keras output.
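Here is a minimal sketch of my setup (the hidden layer and the random features are just stand-ins; my real data is different):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy features; the real feature dimension is not shown in this post.
X = np.random.rand(7, 10).astype("float32")
y = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
], dtype="float32")

model = keras.Sequential([
    layers.Dense(16, activation="relu", input_shape=(10,)),  # hidden layer size is a stand-in
    layers.Dense(3, activation="sigmoid"),                   # one independent sigmoid per label
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, verbose=0)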

I thought I would get only 1s and 0s for the prediction values if I used sigmoid and binary_crossentropy. However, model.predict(training_features) gave me values between 0 and 1, like 0.0026. I have tried all four combinations of categorical_crossentropy and binary_crossentropy with sigmoid and softmax; model.predict always returns values between 0 and 1 with shape n_samples by n_classes, which would be 7x3 in the example above.

Then I thresholded the values at 0.5 as below and checked accuracy_score(training_labels, preds). The score dropped to 0.1.

# threshold the sigmoid outputs at 0.5 to get hard 0/1 labels
preds[preds >= 0.5] = 1
preds[preds < 0.5] = 0
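The same thresholding can be written in one step, assuming preds is a NumPy array as returned by model.predict:

preds = (preds >= 0.5).astype(int)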

I'd appreciate it if someone could give me some guidance on how to approach this problem.

Thanks!


1 Answer


Per your description, this is a multi-label classification problem, and therefore you should use sigmoid as the activation function of the last layer and binary_crossentropy as the loss function. That's because the classification of each label is considered independent of all the other labels, so using softmax or categorical_crossentropy is wrong in this scenario.

The discrepancy between the accuracy reported by Keras and the accuracy computed with sklearn.metrics.accuracy_score() is not due to the thresholding; Keras actually applies the same 0.5 thresholding you have done when computing its accuracy. Rather, the difference comes from the fact that in multi-label classification, accuracy_score only counts a sample as correctly classified when all the true and predicted labels for that sample match. This is clearly stated in the documentation:

In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

In Keras, however, the binary_accuracy function reports the average fraction of correctly classified labels (i.e. a partial match gets partial credit). To better understand this, consider the following example:

True labels  | Predictions | Keras binary acc | accuracy_score
-----------------------------------------------------------------
  [1 0 0]    |   [1 0 1]   | 2 correct = 0.66 | no match = 0.00
  [0 1 1]    |   [0 1 1]   | 3 correct = 1.00 | match    = 1.00
  [1 0 1]    |   [0 0 1]   | 2 correct = 0.66 | no match = 0.00
=================================================================
      average reported acc |             0.77 |             0.33
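
You can reproduce both numbers directly (the arrays below are the rows of the table):

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 1]])
y_pred = np.array([[1, 0, 1], [0, 1, 1], [0, 0, 1]])

# Keras-style binary accuracy: fraction of individual labels that match
print(np.mean(y_true == y_pred))       # 7/9 = 0.77...

# Subset accuracy: a sample only counts if every one of its labels matches
print(accuracy_score(y_true, y_pred))  # 1/3 = 0.33...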