
I'd like to compute the gradient of the loss with respect to all the network parameters. The problem arises when I try to reshape each weight matrix so that it is one-dimensional (this is useful for computations I do later with the gradients).

At this point TensorFlow outputs a list of None values, which means there is no path from the loss to those tensors, even though there should be, since they are the model parameters reshaped.

Here is the code:

all_tensors = list()
# LSTM kernels of the bidirectional encoder:
for direction in ["fw", "bw"]:
    for mtype in ["kernel"]:
        t = tf.get_default_graph().get_tensor_by_name(
            "encoder/bidirectional_rnn/%s/lstm_cell/%s:0" % (direction, mtype))
        all_tensors.append(t)
# classifier tensors:
for mtype in ["kernel", "bias"]:
    t = tf.get_default_graph().get_tensor_by_name("encoder/dense/%s:0" % mtype)
    all_tensors.append(t)
all_tensors = [tf.reshape(x, [-1]) for x in all_tensors]
tf.gradients(self.loss, all_tensors)

all_tensors at the end of the for loops is a list of 4 tensors with matrices of different shapes. This code outputs [None, None, None, None]. If I remove the reshape line all_tensors = [tf.reshape(x, [-1]) for x in all_tensors], the code works fine and returns 4 tensors containing the gradients with respect to each parameter.

Why does this happen? I'm pretty sure that reshape doesn't break any dependency in the graph; otherwise it couldn't be used in any network at all.


1 Answer


Well, the fact is that there is no path from your tensors to the loss. If you think of the computation graph in TensorFlow, self.loss is defined through a series of operations that at some point use the tensors you are interested in. However, when you do:

all_tensors = [tf.reshape(x, [-1]) for x in all_tensors]

you are creating new nodes in the graph and new tensors that nothing else uses. Yes, there is a relationship between those tensors and the loss value, but from TensorFlow's point of view that reshaping is an independent computation: the reshaped tensors are produced from the variables, they are not inputs to the loss, so there is no path from the loss back to them and their gradients are None.
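A minimal standalone sketch of this behavior, using the TF 1.x API as in the question (a toy variable and loss, not your actual model):

import tensorflow as tf

w = tf.Variable([[1.0, 2.0], [3.0, 4.0]], name="w")
loss = tf.reduce_sum(w * w)          # the loss is built directly from w

w_flat = tf.reshape(w, [-1])         # a new node; nothing downstream uses it

print(tf.gradients(loss, [w]))       # [<gradient tensor>]  -- path exists
print(tf.gradients(loss, [w_flat]))  # [None]  -- loss does not depend on w_flat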

If you want to do something like that, you would have to do the reshaping first and then build the loss from the reshaped tensors. Alternatively, you can compute the gradients with respect to the original tensors and then reshape the result, as sketched below.
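For the code in the question, that second option could look like this (a sketch reusing the names from your snippet, assuming the line that reshapes all_tensors is removed so the list still holds the original graph tensors):

grads = tf.gradients(self.loss, all_tensors)           # gradients wrt the original tensors
flat_grads = [tf.reshape(g, [-1]) for g in grads]      # flatten each gradient afterwards

Reshaping the gradients is just another operation applied to the result, so it does not interfere with how the gradients themselves are computed.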