I'd like to compute the gradient of the loss wrt all the network parameters. The problem arises when I reshape each weight matrix to be 1-dimensional (this is useful for computations I do later with the gradients). At this point TensorFlow outputs a list of None values (which means there is no path from the loss to those tensors, while there should be, since they are just the model parameters reshaped).
Here is the code:
all_tensors = list()
for dir in ["fw", "bw"]:
    for mtype in ["kernel"]:
        t = tf.get_default_graph().get_tensor_by_name("encoder/bidirectional_rnn/%s/lstm_cell/%s:0" % (dir, mtype))
        all_tensors.append(t)
# classifier tensors:
for mtype in ["kernel", "bias"]:
    t = tf.get_default_graph().get_tensor_by_name("encoder/dense/%s:0" % (mtype))
    all_tensors.append(t)
all_tensors = [tf.reshape(x, [-1]) for x in all_tensors]
tf.gradients(self.loss, all_tensors)
all_tensors at the end of the for loops is a list of 4 tensors (matrices of different shapes). This code outputs [None, None, None, None].
If I remove the reshape line (all_tensors = [tf.reshape(x, [-1]) for x in all_tensors]), the code works fine and returns 4 tensors containing the gradients wrt each parameter.
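For completeness, this is roughly what that working variant looks like, with the flattening moved to the returned gradients instead (just a sketch of what I assume would give me the same flat shape I need later, same tensor names as above):

grads = tf.gradients(self.loss, all_tensors)       # all_tensors still holds the original 2-D tensors here
flat_grads = [tf.reshape(g, [-1]) for g in grads]  # flatten the gradients themselves afterwards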
Why does this happen? I'm pretty sure that reshape doesn't break any dependencies in the graph, otherwise it couldn't be used inside any network at all.
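To isolate the behavior, here is a minimal self-contained sketch (TF 1.x graph mode, toy made-up variables, not my actual model) that reproduces the same [None]:

import tensorflow as tf

w = tf.Variable(tf.ones([2, 3]))     # toy "weight matrix"
loss = tf.reduce_sum(tf.square(w))   # loss built directly from w

flat_w = tf.reshape(w, [-1])         # flattened view of w, created separately from the loss

print(tf.gradients(loss, [w]))       # prints a gradient tensor
print(tf.gradients(loss, [flat_w]))  # prints [None], same behavior as in my code above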