
A Google Colab notebook to reproduce the error: None_for_gradient.ipynb

I need a custom loss function whose value is computed from the model inputs rather than from the default (y_true, y_pred) arguments. The predict method works for the generated architecture, but when I call train_on_batch, the following error appears:

ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

My custom loss function (below) is based on this example: image_ocr.py#L475. The Colab link also contains a second variant based on Custom loss function y_true y_pred shape mismatch #4781, which produces the same error:

import numpy as np
import keras
from keras import backend as K
from keras.layers import TimeDistributed, Dense, Dropout, LSTM

def my_loss(args):
    input_y, input_y_pred, y_pred = args
    return keras.losses.binary_crossentropy(input_y, input_y_pred)

def generator2():
    input_noise = keras.Input(name='input_noise', shape=(40, 38), dtype='float32')
    input_y = keras.Input(name='input_y', shape=(1,), dtype='float32')
    input_y_pred = keras.Input(name='input_y_pred', shape=(1,), dtype='float32')
    lstm1 = LSTM(256, return_sequences=True)(input_noise)
    drop = Dropout(0.2)(lstm1)
    lstm2 = LSTM(256, return_sequences=True)(drop)
    y_pred = TimeDistributed(Dense(38, activation='softmax'))(lstm2)

    loss_out = keras.layers.Lambda(my_loss, output_shape=(1,), name='my_loss')([input_y, input_y_pred, y_pred])

    model = keras.models.Model(inputs=[input_noise, input_y, input_y_pred], outputs=[y_pred, loss_out])
    model.compile(loss={'my_loss': lambda y_true, y_pred: y_pred}, optimizer='adam')

    return model

g2 = generator2()
noise = np.random.uniform(0,1,size=[10,40,38])
g2.train_on_batch([noise, np.ones(10), np.zeros(10)], noise)

I need help identifying which operation is generating this error, because as far as I know keras.losses.binary_crossentropy is differentiable.


1 Answer


I think the reason is that input_y and input_y_pred are both Keras Input tensors. Your loss is computed from these two tensors alone, so it is not connected to the model's trainable parameters, and the loss function gives no gradient to your model. Note that y_pred (the actual model output) is passed into the Lambda layer but never used inside my_loss.
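A minimal numeric sketch of this point, using plain NumPy instead of Keras (the single weight `w` and the sigmoid "model" are illustrative assumptions, not the question's architecture): if the loss is computed only from the Input tensors, perturbing the model's weights does not change it, so the finite-difference gradient is exactly zero; as soon as the loss depends on the model output, a nonzero gradient appears.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean binary cross-entropy, with clipping to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Values fed as Keras Inputs in the question -- independent of any weight.
input_y = np.ones(10)
input_y_pred = np.zeros(10)

def loss_inputs_only(w):
    # Like my_loss in the question: uses only the two Input tensors,
    # so the value does not depend on the model weight w at all.
    return binary_crossentropy(input_y, input_y_pred)

def loss_uses_model(w):
    # Loss computed from the model output, which does depend on w.
    model_out = 1.0 / (1.0 + np.exp(-w * np.ones(10)))  # sigmoid(w * x)
    return binary_crossentropy(input_y, model_out)

def grad(loss, w, h=1e-5):
    # Central finite-difference estimate of d(loss)/dw.
    return (loss(w + h) - loss(w - h)) / (2 * h)

print(grad(loss_inputs_only, 0.5))  # 0.0 -- no gradient reaches the weight
print(grad(loss_uses_model, 0.5))   # nonzero -- gradient flows through the model
```

This is why Keras reports "None for gradient": the fix is to make the loss depend on y_pred (or some other tensor downstream of the trainable layers), not only on Input tensors.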