Tensorflow GradientTape “Gradients does not exist for variables” intermittently

Question

When training my network I am occasionally met with the warning:

W0722 11:47:35.101842 140641577297728 optimizer_v2.py:928] Gradients does not exist for variables ['model/conv1d_x/Variable:0'] when minimizing the loss.

This happens sporadically at infrequent intervals (maybe once in every 20 successful steps). My model basically has two paths which join together with concatenations at various positions in the network. To illustrate this, here is a simplified example of what I mean.

class myModel(tf.keras.Model):

  def __init__(self):

    self.conv1 = Conv2D(32)
    self.conv2 = Conv2D(32)
    self.conv3 = Conv2D(16)

  def call(self, inputs):

    net1 = self.conv1(inputs)
    net2 = self.conv2(inputs)
    net = tf.concat([net1, net2], axis=2)
    net = self.conv3(net)
    end_points = tf.nn.softmax(net)

model = myModel()

with tf.GradientTape() as tape:

  predicition = model(image)
  loss = myloss(labels, prediction)

gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))

In reality my network is much larger, but the variables that generally don't have gradients tend to be the ones at the top of the network. Before each Conv2D layer I also have a custom gradient. Sometimes when I the error appears I can notice that the gradient function for that layer has not been called.

My question is how can the gradient tape sometimes take what appears to be different paths when propagating backwards through my network. My secondary question, is this caused by having two separate routes through my network (i.e. conv1 AND conv2). Is there a fundamental flaw in this network architecture?

Ideally, could I define to the GradientTape() that it must find the gradients for each of the top layers?

Hi D.Griffiths, With the code specified by you and the explanation given by you, we couldn't reproduce your error. Can you please share complete code (Dummy code, which represents your architecture and Custom Gradients should suffice), so that we can try reproducing the error and work towards its resolution. Thanks. — Tensorflow Support
Hi, the code to reproduce the issue would be my whole network which is rather large and contains custom ops and specific data etc. I don't believe this is a bug with the TF code. I was more wandering in what scenarios would gradients not exists for certain variables intermittently causing the error to be thrown. For example is it normal that maybe a division by zero occurs, or is this something I should be worried about (and even a bug from my end). — D.Griffiths
I just had a similar problem where I was always getting the warning. My code was different. Basically, in the build function of the custom class, I was adding a weight and overwriting my reference to it. By avoiding that the warning went away. But your case doesn't seem to be similar. — Eduardo Reis

gkennos gkennos · Accepted Answer · 2019-12-11T05:12:49

I had an issue that seems similar - may be helpful or not sure depending on what your network actually looks like, but basically, I had a multi-output network and I realised that as I was applying gradients that corresponded to the outputs separately, so for each separate loss there was a branch of the network for which the gradient was zero, but this was totally valid and corresponded to the terminal layers immediately prior to the non-targeted outputs each time. For this reason, I ended up replacing any None gradients with tf.zeros_like and it was possible to proceed with training. Could you have the same problem with multiple input heads to your network, if it's always at the top of the graph?

(ETA solution by Nguyễn Thu below is the code version of what I'm describing in above - exactly the same way that I dealt with it)

I've seen other answers where gradients weren't calculating because tensors aren't watched by default - you have to add them, but looks like that's not your issue as you should be only dealing with model.trainable_variables, or perhaps your myLoss function is getting a NaN result or casting to a numpy array occasionally depending on your batch composition, which would explain the sporadic nature (e.g. perhaps it's on batches that have no instances of a minority class if your data is very imbalanced?)

Tensorflow GradientTape “Gradients does not exist for variables” intermittently

5 Answers