Suppose I multiply a vector by a scalar, e.g.:
import tensorflow as tf

a = tf.Variable(3.)
b = tf.Variable([1., 0., 1.])
with tf.GradientTape() as tape:
    c = a * b
grad = tape.gradient(c, a)
The resulting gradient I get is a scalar,
<tf.Tensor: shape=(), dtype=float32, numpy=2.0>
whereas I would expect the vector:
<tf.Variable 'Variable:0' shape=(3,) dtype=float32, numpy=array([1., 0., 1.], dtype=float32)>
Looking at other examples, it appears that TensorFlow sums the elements of the expected gradient vector; the same happens for scalar-matrix multiplication and so on.
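For instance, this is a minimal scalar-matrix example I put together (the names a and m are just for illustration), and it shows the same summing behaviour:

import tensorflow as tf

a = tf.Variable(3.)
m = tf.Variable([[1., 2.], [3., 4.]])
with tf.GradientTape() as tape:
    c = a * m
# The gradient w.r.t. a is again a scalar: the sum of all entries of m,
# i.e. 1 + 2 + 3 + 4 = 10.0, rather than the matrix itself.
grad = tape.gradient(c, a)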
Why does TensorFlow do this? It can probably be avoided using @tf.custom_gradient (see the sketch below), but is there another, less cumbersome way to get the unsummed, element-wise gradient?
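For reference, this is roughly the kind of @tf.custom_gradient workaround I have in mind (an untested sketch; the wrapper name scale is mine, and I use constants with tape.watch instead of Variables to keep it simple). Having to define such a wrapper for every op is what feels cumbersome:

import tensorflow as tf

@tf.custom_gradient
def scale(a, b):
    # forward pass: the usual broadcasted product a * b
    def grad(upstream):
        # skip the sum over the broadcast axis of the scalar a and
        # return a per-element gradient instead
        return upstream * b, upstream * a
    return a * b, grad

a = tf.constant(3.)
b = tf.constant([1., 0., 1.])
with tf.GradientTape() as tape:
    tape.watch(a)            # a is a constant here, so watch it explicitly
    c = scale(a, b)
grad = tape.gradient(c, a)   # should give [1., 0., 1.] instead of 2.0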
There appear to be some related questions, but these all seem to consider the gradient of a loss function that aggregates over a training batch. No loss function or aggregation is used here, so I think the issue is something else?