I have seen in many places that, for multi-label classification with neural networks, a useful loss function is the binary cross-entropy applied to each output node.
In TensorFlow it looks like this:
cost = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
This returns a tensor with the same shape as the logits, i.e. one loss value per output node (for each example in the batch).
My question is: should this cost be averaged over the output nodes? In TensorFlow that would look like:
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
Or should each loss be treated independently?
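To make the two options concrete, here is a minimal pure-Python sketch (not TensorFlow itself) of what I understand the op to compute per element, using the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)) given in the TensorFlow documentation. The helper names and the example logits/labels are made up purely for illustration.

```python
import math

def sigmoid_cross_entropy(logit, label):
    # Numerically stable form: max(x, 0) - x*z + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def per_output_losses(logits, labels):
    # One loss per (example, output node), same shape as the logits.
    return [[sigmoid_cross_entropy(x, z) for x, z in zip(row_x, row_z)]
            for row_x, row_z in zip(logits, labels)]

def mean_over_all(losses):
    # Average over every element (batch and output nodes).
    flat = [v for row in losses for v in row]
    return sum(flat) / len(flat)

def sum_over_outputs_mean_over_batch(losses):
    # Sum the losses of the output nodes, then average over the batch only.
    return sum(sum(row) for row in losses) / len(losses)

# Hypothetical batch: 2 examples, 3 output nodes (labels are multi-hot).
logits = [[2.0, -1.0, 0.5], [-0.3, 1.2, -2.0]]
labels = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

losses = per_output_losses(logits, labels)
averaged = mean_over_all(losses)
summed = sum_over_outputs_mean_over_batch(losses)
num_outputs = len(logits[0])
```

If I read this right, the two reductions differ only by the constant factor num_outputs, so the choice would rescale the gradients (effectively the learning rate) rather than move the minimum.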
Thanks