When training my network I am occasionally met with the warning:
W0722 11:47:35.101842 140641577297728 optimizer_v2.py:928] Gradients does not exist for variables ['model/conv1d_x/Variable:0'] when minimizing the loss.
This happens sporadically at infrequent intervals (maybe once in every 20 successful steps). My model basically has two paths which join together with concatenations at various positions in the network. To illustrate this, here is a simplified example of what I mean.
class myModel(tf.keras.Model):
def __init__(self):
self.conv1 = Conv2D(32)
self.conv2 = Conv2D(32)
self.conv3 = Conv2D(16)
def call(self, inputs):
net1 = self.conv1(inputs)
net2 = self.conv2(inputs)
net = tf.concat([net1, net2], axis=2)
net = self.conv3(net)
end_points = tf.nn.softmax(net)
model = myModel()
with tf.GradientTape() as tape:
predicition = model(image)
loss = myloss(labels, prediction)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
In reality my network is much larger, but the variables that generally don't have gradients tend to be the ones at the top of the network. Before each Conv2D
layer I also have a custom gradient. Sometimes when I the error appears I can notice that the gradient function for that layer has not been called.
My question is how can the gradient tape sometimes take what appears to be different paths when propagating backwards through my network. My secondary question, is this caused by having two separate routes through my network (i.e. conv1 AND conv2). Is there a fundamental flaw in this network architecture?
Ideally, could I define to the GradientTape()
that it must find the gradients for each of the top layers?