The TensorFlow documentation explains the function
tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None,
    stop_gradients=None
)
saying:
- [it] constructs symbolic derivatives of sum of ys w.r.t. x in xs.
- ys and xs are each a Tensor or a list of tensors.
- gradients() adds ops to the graph to output the derivatives of ys with respect to xs.
- ys: A Tensor or list of tensors to be differentiated
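To make the question concrete, here is a minimal call as I understand it (a sketch, assuming TensorFlow 1.x graph mode; the variable names are mine):

import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])
y = tf.reduce_sum(x ** 2)      # scalar: x1^2 + x2^2 + x3^2
grads = tf.gradients(y, x)     # a list of length len(xs), here one tensor shaped like x

with tf.Session() as sess:
    print(sess.run(grads))     # the analytic gradient 2*x, i.e. [2., 4., 6.]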
I find it difficult to relate this to the mathematical definition of the gradient. For example, according to Wikipedia, the gradient of a scalar function f(x1, x2, x3, ..., xn) is a vector field (i.e. a function grad f : R^n -> R^n) with certain properties involving the dot product of vectors. One can also speak of the gradient of f at a particular point: (grad f)(x1, x2, x3, ..., xn).
The TensorFlow documentation speaks of tensors rather than vectors: can the definition of the gradient be generalized from functions that map vectors to scalars to functions that map tensors to scalars? Is there a dot product between tensors?
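For example, I can pass a 2-D tensor where the mathematical definition would have a vector (again a TensorFlow 1.x sketch, with names of my own choosing):

import tensorflow as tf

W = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])
loss = tf.reduce_sum(W ** 2)   # a scalar function of all entries of W
dW = tf.gradients(loss, W)[0]  # a tensor with the same shape as W; mathematically 2*W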
Even if the definition of the gradient can be applied to functions f that map tensors to scalars (with the dot product in the definition working on tensors), the documentation speaks of differentiating the tensors themselves: the parameter ys is "A Tensor or list of tensors to be differentiated". According to the documentation, a "Tensor is a multi-dimensional array used for computation", so a tensor is not a function; how can it be differentiated?
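For instance, nothing stops me from passing a non-scalar tensor as ys; my reading of the phrase "sum of ys" is that the following (TensorFlow 1.x sketch) differentiates reduce_sum(ys) rather than each element separately:

import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])
ys = x ** 2                    # a tensor, not a scalar-valued function
g = tf.gradients(ys, x)[0]     # per the docs, the derivative of sum(ys) w.r.t. x, i.e. 2*x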
So, how exactly is this concept of a gradient in TensorFlow related to the mathematical definition from Wikipedia?