Suppose I multiply a vector by a scalar, e.g.:
import tensorflow as tf

a = tf.Variable(3.)
b = tf.Variable([1., 0., 1.])
with tf.GradientTape() as tape:
    c = a * b
grad = tape.gradient(c, a)
The resulting gradient I get is a scalar,
<tf.Tensor: shape=(), dtype=float32, numpy=2.0>
whereas I would expect the vector:
<tf.Variable 'Variable:0' shape=(3,) dtype=float32, numpy=array([1., 0., 1.], dtype=float32)>
Looking at other examples, it appears that TensorFlow sums the elements of the expected gradient vector; the same happens for scalar-matrix multiplication and so on.
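For instance, this is a minimal scalar-matrix example I put together (the names a and m are just for illustration), and it shows the same summing behaviour:

import tensorflow as tf

a = tf.Variable(3.)
m = tf.Variable([[1., 2.], [3., 4.]])
with tf.GradientTape() as tape:
    c = a * m
# The gradient w.r.t. a is again a scalar: the sum of all entries of m,
# i.e. 1 + 2 + 3 + 4 = 10.0, rather than the matrix itself.
grad = tape.gradient(c, a)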
Why does TensorFlow do this? It can probably be avoided using @tf.custom_gradient (see the sketch below), but is there another, less cumbersome way to get the unsummed, element-wise gradient?
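For reference, this is roughly the kind of @tf.custom_gradient workaround I have in mind (an untested sketch; the wrapper name scale is mine, and I use constants with tape.watch instead of Variables to keep it simple). Having to define such a wrapper for every op is what feels cumbersome:

import tensorflow as tf

@tf.custom_gradient
def scale(a, b):
    # forward pass: the usual broadcasted product a * b
    def grad(upstream):
        # skip the sum over the broadcast axis of the scalar a and
        # return a per-element gradient instead
        return upstream * b, upstream * a
    return a * b, grad

a = tf.constant(3.)
b = tf.constant([1., 0., 1.])
with tf.GradientTape() as tape:
    tape.watch(a)            # a is a constant here, so watch it explicitly
    c = scale(a, b)
grad = tape.gradient(c, a)   # should give [1., 0., 1.] instead of 2.0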
There appear to be some related questions, but these all seem to consider the gradient of a loss function that aggregates over a training batch. No loss function or aggregation is used here, so I think the issue is something else?