
The TensorFlow documentation explains the function

tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None,
    stop_gradients=None
)

saying:

  • [it] constructs symbolic derivatives of sum of ys w.r.t. x in xs.
  • ys and xs are each a Tensor or a list of tensors.
  • gradients() adds ops to the graph to output the derivatives of ys with respect to xs.
  • ys: A Tensor or list of tensors to be differentiated

I find it difficult to relate this to the mathematical definition of the gradient. For example, according to Wikipedia, the gradient of a scalar function f(x1, x2, x3, ..., xn) is a vector field (i.e. a function grad f : R^n -> R^n) with certain properties involving the dot product of vectors. You can also speak about the gradient of f at a certain point: (grad f)(x1, x2, x3, ..., xn).
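In coordinates, the definition I have in mind is simply the standard one (nothing TensorFlow-specific):

\operatorname{grad} f = \nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)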

The TensorFlow documentation speaks about tensors instead of vectors: can the definition of gradient be generalized from functions that map vectors to scalars to functions that map tensors to scalars? Is there a dot product between tensors?

Even if the definition of the gradient can be applied to functions f that map tensors to scalars (with the dot product in the definition working on tensors), the documentation speaks about differentiating the tensors themselves: the parameter ys is a "Tensor or list of tensors to be differentiated". According to the documentation, "Tensor is a multi-dimensional array used for computation". A tensor is therefore not a function, so how can it be differentiated?

So, how exactly is this concept of gradient in TensorFlow related to the definition from Wikipedia?

Comments:

  • Of course there is a dot (inner) product between tensors: en.wikipedia.org/wiki/Dot_product#Tensors – desertnaut
  • @desertnaut: Thanks for the link. – Giorgio

1 Answer


One would expect the TensorFlow gradient to be simply the Jacobian, i.e. the derivative of a rank-m tensor Y with respect to a rank-n tensor X would be the rank-(m + n) tensor consisting of each individual partial derivative ∂Y_{j1...jm} / ∂X_{i1...in}.

However, the result isn't actually a rank-(m + n) tensor: it always has the rank (and shape) of the tensor X. Indeed, TensorFlow gives you the gradient of the scalar sum(Y) with respect to X, exactly as the quoted documentation states ("symbolic derivatives of sum of ys w.r.t. x in xs").
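For example, here is a minimal sketch (assuming the TF 1.x graph-mode API, available as tensorflow.compat.v1 in TF 2): Y is rank 1 and its full Jacobian against X would be a 3x3 matrix, yet tf.gradients returns a tensor with the shape of X, equal to the gradient of sum(Y):

# Minimal sketch: tf.gradients(ys, xs) returns the gradient of sum(ys) w.r.t. xs
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.constant([1.0, 2.0, 3.0])   # rank-1 tensor X, shape (3,)
y = x * x                          # rank-1 tensor Y, shape (3,); its full Jacobian would be 3x3

grad = tf.gradients(y, x)[0]       # shape (3,), same as x -- not a 3x3 Jacobian

with tf.Session() as sess:
    print(sess.run(grad))                                    # [2. 4. 6.]
    print(sess.run(tf.gradients(tf.reduce_sum(y), x)[0]))    # also [2. 4. 6.]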

Of course, the underlying per-op Jacobians are still applied internally when the chain rule is used to propagate gradients through the graph.
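One way to see this connection through the public API is the grad_ys argument from the signature quoted in the question: it weights the components of ys, so tf.gradients effectively computes a vector-Jacobian product, which is the building block the chain rule uses. A hedged sketch, under the same TF 1.x graph-mode assumptions as above:

# grad_ys weights the components of y, turning tf.gradients into a vector-Jacobian product
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.constant([1.0, 2.0, 3.0])
y = x * x                                # Jacobian dy/dx = diag(2 * x)

v = tf.constant([1.0, 0.0, 0.0])         # weight vector selecting y[0]
row0 = tf.gradients(y, x, grad_ys=v)[0]  # v^T J = first row of the Jacobian

with tf.Session() as sess:
    print(sess.run(row0))                # [2. 0. 0.]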