2 votes

I have implemented a neural network in TensorFlow where the last layer is a convolution layer. I feed the output of this convolution layer into a softmax activation function and then into the cross-entropy loss function defined below, along with the labels. The problem is that I get NaN as the output of my loss function, and I figured out it is because the softmax output contains a 1. So my question is: what should I do in this case? My input is a 16 by 16 image where each pixel has a value of 0 or 1 (binary classification).

My loss function:

#Loss function
import tensorflow as tf

def loss(prediction, label):
    # prediction: softmax output in [0, 1]; label: 0/1 ground-truth pixels
    log_pred = tf.log(prediction, name='Prediction_Log')
    log_pred_2 = tf.log(1 - prediction, name='1-Prediction_Log')
    cross_entropy = -tf.multiply(label, log_pred) - tf.multiply((1 - label), log_pred_2)

    return cross_entropy
You should use tf.nn.softmax_cross_entropy_with_logits_v2 or tf.losses.softmax_cross_entropy for that, using the outputs of the last layer before the softmax activation (the "logits"). Those functions are designed to handle extreme cases correctly. - jdehesa
@jdehesa Good point! :-) I should really have included a pointer to the out-of-the-box functions in my answer. I assumed the OP's question was about implementing her own loss fn. - Stewart_R
Updated answer now with a note about the out-of-the-box functions handling this nicely. - Stewart_R
@jdehesa, I have already tried those (without the softmax, as the documentation says), but the problem is that my loss is zero, so my model does not learn. - MRM

1 Answer

2 votes

Note that log(0) is undefined, so whenever prediction == 0 or prediction == 1 you will get a NaN.

To get around this, it is commonplace to add a very small value, epsilon, to the argument passed to tf.log in any loss function (we do something similar when dividing, to avoid division by zero). This keeps the loss function numerically stable, and epsilon is small enough that the inaccuracy it introduces to the loss is negligible.

Perhaps try something like:

#Loss function
import tensorflow as tf

def loss(prediction, label):
    # Small constant keeps tf.log finite when prediction is exactly 0 or 1
    epsilon = tf.constant(0.000001)
    log_pred = tf.log(prediction + epsilon, name='Prediction_Log')
    log_pred_2 = tf.log(1 - prediction + epsilon, name='1-Prediction_Log')

    cross_entropy = -tf.multiply(label, log_pred) - tf.multiply((1 - label), log_pred_2)
    return cross_entropy
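
A quick sanity check (a minimal sketch, assuming TensorFlow 1.x graph mode and the loss function above; the input values are made up just for illustration) showing that the epsilon keeps the loss finite even when the prediction is exactly 0 or 1:

import tensorflow as tf

# Without the epsilon term, the first two entries would evaluate to NaN
prediction = tf.constant([[0.0, 1.0, 0.5]])
label = tf.constant([[0.0, 1.0, 1.0]])
cross_entropy = loss(prediction, label)

with tf.Session() as sess:
    print(sess.run(cross_entropy))   # finite values, no NaN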

UPDATE:

As jdehesa points out in the comments, though, the 'out of the box' loss functions already handle this numerical stability issue nicely.
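
For reference, a minimal sketch of how that might look (assuming TensorFlow 1.x, per-pixel one-hot labels, and a hypothetical builtin_loss helper fed with the raw output of the last convolution layer, before any softmax):

import tensorflow as tf

def builtin_loss(logits, labels):
    # logits: raw output of the last conv layer, e.g. shape [batch, 16, 16, 2]
    # labels: one-hot labels with the same shape as logits
    # The op applies the softmax internally, so do NOT apply it yourself first
    per_pixel = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels,
                                                           logits=logits)
    return tf.reduce_mean(per_pixel)

If the network instead emits a single channel per pixel for this binary problem, tf.nn.sigmoid_cross_entropy_with_logits is the analogous built-in (a softmax over a single channel is always 1, which would give a loss of zero).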