Multi-label classification loss function

Question

I have seen in many places that, for multi-label classification with neural networks, a useful loss function is the binary cross-entropy applied to each of the output nodes.

In TensorFlow it looks like this:

cost = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)

This returns a tensor with the same shape as the logits, i.e. one loss value per output node.

My question is: should this cost be averaged over the number of output nodes? In TensorFlow that would look like:

cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))

Or should each loss be treated independently?

Thanks

Andrey Tyukin · Accepted Answer · 2018-05-17T15:35:44

For N labels in multi-label classification, it doesn't really matter whether you sum the per-class losses or average them with tf.reduce_mean: the two gradients point in the same direction and differ only by the constant factor 1/N.

However, dividing the sum by N (which is all that averaging does) scales the gradient by 1/N and therefore changes the effective learning rate. If you are not sure how many labels the multi-label task will have, it may be easier to use tf.reduce_mean: you won't have to readjust the weight of this loss component relative to other components of the loss, and you won't have to retune the learning rate if the number N of labels changes.
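To make the scaling argument concrete, here is a minimal pure-Python sketch (no TensorFlow needed) for a single example with three labels. The logits, labels, learning rate and helper names are made up for illustration. The loss uses the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)) given in the TensorFlow documentation for sigmoid cross-entropy, and the gradient with respect to each logit is sigmoid(x) - z.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def per_label_loss(logits, labels):
    # Stable sigmoid cross-entropy, one value per output node.
    return [max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
            for x, z in zip(logits, labels)]

def per_label_grad(logits, labels):
    # d(loss_i)/d(logit_i) = sigmoid(logit_i) - label_i
    return [sigmoid(x) - z for x, z in zip(logits, labels)]

# Hypothetical example: one sample, N = 3 labels.
logits = [2.0, -1.0, 0.5]
labels = [1.0, 0.0, 1.0]
n = len(logits)

losses = per_label_loss(logits, labels)
loss_sum = sum(losses)
loss_mean = loss_sum / n

# Gradient of the summed loss vs. gradient of the averaged loss.
grad_sum = per_label_grad(logits, labels)
grad_mean = [g / n for g in grad_sum]

# Every component differs by the same factor N, so the direction is identical.
ratios = [gs / gm for gs, gm in zip(grad_sum, grad_mean)]

# A gradient step on the sum with rate lr matches a step on the mean with rate lr * N.
lr = 0.1
step_sum = [lr * g for g in grad_sum]
step_mean_rescaled = [(lr * n) * g for g in grad_mean]

print("ratios:", ratios)
print("step (sum, lr):       ", step_sum)
print("step (mean, lr * N):  ", step_mean_rescaled)
```

With plain gradient descent, switching from the sum to the mean is therefore equivalent to dividing the learning rate by N, which is why averaging keeps the step size stable when the number of labels changes.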
