
I've written an LSTM model in Keras that uses the LeakyReLU advanced activation:

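    from keras import optimizers
    from keras.models import Sequential
    from keras.layers import LSTM, Dense
    import keras_metrics

    # ADAM Optimizer with learning rate decay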
    opt = optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0001)

    # build the model
    model = Sequential()

    num_features = data.shape[2]
    num_samples = data.shape[1]

    model.add(LSTM(16, batch_input_shape=(None, num_samples, num_features), return_sequences=True, activation='linear'))
    model.add(LSTM(8, return_sequences=True, activation='linear'))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer=opt,
                  metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall(), f1])

My data is a balanced binary-labeled set, i.e. 50% labeled 1 and 50% labeled 0. I've used activation='linear' for the LSTM layers preceding the LeakyReLU activation, similar to this example I found on GitHub.

In that configuration the model throws a "Nan in summary histogram" error. Changing the LSTM activations to activation='sigmoid' makes the error go away, but that seems like the wrong thing to do.
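One possible reason the activation matters is that a linear activation leaves the recurrent state unbounded, while a squashing activation keeps it in a fixed range. The toy below is only a sketch of that idea: it is a scalar recurrence, not an LSTM, and the function name `run_recurrence`, the weight 1.5, and the constant input are all assumptions chosen for illustration.

```python
import math

def run_recurrence(activation, steps=50, weight=1.5, x=1.0):
    """Iterate h_t = activation(weight * h_{t-1} + x) and return the final h."""
    h = 0.0
    for _ in range(steps):
        h = activation(weight * h + x)
    return h

# Identity ("linear") activation: nothing bounds h, so it keeps growing.
linear_h = run_recurrence(lambda z: z)

# tanh squashes h into [-1, 1] at every step, so it cannot blow up.
tanh_h = run_recurrence(math.tanh)

print(f"linear: {linear_h:.3e}")
print(f"tanh:   {tanh_h:.3f}")
```

With a recurrent weight above 1 the linear version grows geometrically with the number of steps, which is the kind of growth that can eventually overflow to inf or nan in a real network.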

This StackOverflow question suggests "introducing a small value when computing the loss", but I'm not sure how to do that with a built-in loss function.
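To make the "small value" idea concrete, here is a minimal pure-Python sketch of binary cross-entropy in which each prediction is clipped to [eps, 1 - eps] before the logarithm is taken. The function and the default eps=1e-7 are assumptions for illustration; this is not the Keras implementation.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy with predictions clipped to [eps, 1 - eps]."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)

# A saturated sigmoid can output exactly 0.0 or 1.0; without the clipping
# above, log(0) would be undefined for the first two samples.
loss = binary_crossentropy([1.0, 0.0, 1.0], [0.0, 1.0, 0.9])
print(loss)
```

Note that clipping only protects against predictions of exactly 0 or 1. If a prediction is already nan, the loss in this sketch is still nan, so the epsilon cannot help when the nan originates earlier in the network.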

Any help/explanation would be appreciated.

Update: I can see that the loss is nan on the first epoch

    260/260 [==============================] - 6s 23ms/step - loss: nan - acc: 0.5000 - precision: 0.5217 - recall: 0.6512 - f1: nan - val_loss: nan - val_acc: 0.0000e+00 - val_precision: -2147483648.0000 - val_recall: -49941480.1860 - val_f1: nan

Update 2: I've upgraded TensorFlow and Keras to versions 1.12.0 and 2.2.4 respectively. It had no effect.

I also tried adding a loss to the first LSTM layer, as suggested by @Oluwafemi Sule. It looks like a step in the right direction: the loss is no longer nan on the first epoch. However, I still get the same error, probably because of other nan values such as val_loss and val_f1.

    [==============================] - 7s 26ms/step - loss: 1.9099 - acc: 0.5077 - precision: 0.5235 - recall: 0.6544 - f1: 0.5817 - val_loss: nan - val_acc: 0.5172 - val_precision: 35.0000 - val_recall: 0.9722 - val_f1: nan

Update 3: I tried compiling the network with only the accuracy metric, with no success:

    Epoch 1/300
    260/260 [==============================] - 8s 29ms/step - loss: nan - acc: 0.5538 - val_loss: nan - val_acc: 0.0000e+00

I had a similar issue once, but mine was due to NaN values in the dataset. - kerastf

I'm not really sure whether your gradients are exploding, because LeakyReLU on its own is not enough to make it converge. But there is an option called 'clipnorm' or 'clipvalue' that you can pass to any of the optimizers. It clips the gradients, which is the usual remedy for exploding gradients. You could try that here and see if it makes any difference. - kvish

What version of Keras and TensorFlow are you using? - today

keras 2.2.2, tf 1.5.0 - Shlomi Schwartz

@ShlomiSchwartz Just pass the clipnorm=1.0 argument to the optimizer, e.g. Adam(..., clipnorm=1.0). - today
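
For readers unfamiliar with the two options mentioned in the comments, the sketch below shows conceptually what clipping by norm and clipping by value do to a gradient vector. It is plain Python written for illustration: the helper names `clip_by_norm` and `clip_by_value` and the example gradient are assumptions, and this is not how Keras implements the optimizer arguments internally.

```python
import math

def clip_by_norm(grad, clipnorm=1.0):
    """Rescale grad so its L2 norm is at most clipnorm; direction is unchanged."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clipnorm:
        return list(grad)
    scale = clipnorm / norm
    return [g * scale for g in grad]

def clip_by_value(grad, clipvalue=0.5):
    """Clamp every component of grad to [-clipvalue, clipvalue]."""
    return [min(max(g, -clipvalue), clipvalue) for g in grad]

exploding = [300.0, -400.0]              # L2 norm is 500
by_norm = clip_by_norm(exploding)        # same direction, norm scaled down to 1.0
by_value = clip_by_value(exploding)      # each component clamped independently

print(by_norm)
print(by_value)
```

Clipping by norm preserves the direction of the update and only limits its size, whereas clipping by value can change the direction because each component is clamped on its own.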

1 Answer


This answer starts from the suggestion to introduce a small value when computing the loss.

keras.layers.LSTM, like every layer that is a direct or indirect subclass of keras.engine.base_layer.Layer, has an add_loss method. It adds an extra term to the model's total loss, so it can be used to introduce that small value.

I suggest doing this for the LSTM layer and seeing whether it makes any difference to your results.

    lstm_layer = LSTM(8, return_sequences=True, activation='linear')
