
A Google Colab notebook to reproduce the error: None_for_gradient.ipynb

I need a custom loss function whose value is computed from the model inputs rather than from the default (y_true, y_pred) arguments. The predict method works for the generated architecture, but when I call train_on_batch, the following error appears:

ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

My custom loss function (below) is based on this example: image_ocr.py#L475. The Colab link also contains a second variant based on Custom loss function y_true y_pred shape mismatch #4781, which produces the same error:

import numpy as np
import keras
from keras import backend as K
from keras.layers import TimeDistributed, Dense, Dropout, LSTM

def my_loss(args):
    input_y, input_y_pred, y_pred = args
    return keras.losses.binary_crossentropy(input_y, input_y_pred)

def generator2():
    input_noise = keras.Input(name='input_noise', shape=(40, 38), dtype='float32')
    input_y = keras.Input(name='input_y', shape=(1,), dtype='float32')
    input_y_pred = keras.Input(name='input_y_pred', shape=(1,), dtype='float32')
    lstm1 = LSTM(256, return_sequences=True)(input_noise)
    drop = Dropout(0.2)(lstm1)
    lstm2 = LSTM(256, return_sequences=True)(drop)
    y_pred = TimeDistributed(Dense(38, activation='softmax'))(lstm2)

    loss_out = keras.layers.Lambda(my_loss, output_shape=(1,), name='my_loss')([input_y, input_y_pred, y_pred])

    model = keras.models.Model(inputs=[input_noise, input_y, input_y_pred], outputs=[y_pred, loss_out])
    model.compile(loss={'my_loss': lambda y_true, y_pred: y_pred}, optimizer='adam')

    return model

g2 = generator2()
noise = np.random.uniform(0,1,size=[10,40,38])
g2.train_on_batch([noise, np.ones(10), np.zeros(10)], noise)

I need help identifying which operation is generating this error, because as far as I know keras.losses.binary_crossentropy is differentiable.


1 Answer


I think the reason is that input_y and input_y_pred are both Keras Input tensors. Your loss is computed from these two tensors alone, so it is not connected to the model's trainable parameters, and the loss function gives no gradient to your model. Note that y_pred (the actual model output) is passed into the Lambda layer but never used inside my_loss.
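A minimal numeric sketch of this point, using plain NumPy instead of Keras (the single weight `w` and the sigmoid "model" are illustrative assumptions, not the question's architecture): if the loss is computed only from the Input tensors, perturbing the model's weights does not change it, so the finite-difference gradient is exactly zero; as soon as the loss depends on the model output, a nonzero gradient appears.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean binary cross-entropy, with clipping to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Values fed as Keras Inputs in the question -- independent of any weight.
input_y = np.ones(10)
input_y_pred = np.zeros(10)

def loss_inputs_only(w):
    # Like my_loss in the question: uses only the two Input tensors,
    # so the value does not depend on the model weight w at all.
    return binary_crossentropy(input_y, input_y_pred)

def loss_uses_model(w):
    # Loss computed from the model output, which does depend on w.
    model_out = 1.0 / (1.0 + np.exp(-w * np.ones(10)))  # sigmoid(w * x)
    return binary_crossentropy(input_y, model_out)

def grad(loss, w, h=1e-5):
    # Central finite-difference estimate of d(loss)/dw.
    return (loss(w + h) - loss(w - h)) / (2 * h)

print(grad(loss_inputs_only, 0.5))  # 0.0 -- no gradient reaches the weight
print(grad(loss_uses_model, 0.5))   # nonzero -- gradient flows through the model
```

This is why Keras reports "None for gradient": the fix is to make the loss depend on y_pred (or some other tensor downstream of the trainable layers), not only on Input tensors.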