12 votes

I have implemented the following Jacobian function in PyTorch. Unless I have made a mistake, it computes the Jacobian of any tensor with respect to inputs of any dimension:

import torch
import torch.autograd as ag

def nd_range(stop, dims=None):
    # Yield every index tuple in the n-dimensional range given by `stop`.
    if dims is None:
        dims = len(stop)
    if not dims:
        yield ()
        return
    for outer in nd_range(stop, dims - 1):
        for inner in range(stop[dims - 1]):
            yield outer + (inner,)
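
For example, nd_range over a shape enumerates every coordinate of a tensor of that shape:

list(nd_range((2, 2)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]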


def full_jacobian(f, wrt):
    f_shape = list(f.size())
    wrt_shape = list(wrt.size())
    fs = []

    f_range = nd_range(f_shape)
    wrt_range = nd_range(wrt_shape)

    # Differentiate each scalar element of f with respect to wrt, keeping
    # the graph (create_graph=True) so the result can be differentiated again.
    for f_ind in f_range:
        grad = ag.grad(f[tuple(f_ind)], wrt, retain_graph=True, create_graph=True)[0]
        for i in range(len(f_shape)):
            grad = grad.unsqueeze(0)
        fs.append(grad)

    # Stack the per-element gradients and reshape to f_shape + wrt_shape.
    fj = torch.cat(fs, dim=0)
    fj = fj.view(f_shape + wrt_shape)
    return fj
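
For example, applied to an elementwise function, the result has shape f_shape + wrt_shape:

x = torch.rand(2, 3, requires_grad=True)
y = x * 2
J = full_jacobian(y, x)
print(J.shape)  # torch.Size([2, 3, 2, 3]); J[i, j, k, l] = dy[i, j] / dx[k, l]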

On top of this, I have tried to implement a recursive function to calculate nth-order derivatives:

def nth_derivative(f, wrt, n):
    if n == 1:
        return full_jacobian(f, wrt)
    else:
        deriv = nth_derivative(f, wrt, n - 1)
        return full_jacobian(deriv, wrt)

I ran a simple test:

s = torch.rand(3, requires_grad=True)  # any 1-D tensor with requires_grad works here
op = torch.ger(s, s)
deep_deriv = nth_derivative(op, s, 5)

Unfortunately, this succeeds in getting me the Hessian... but no higher-order derivatives: the ag.grad call errors out once an element of the derivative no longer depends on wrt. I'm aware many of the higher-order derivatives should be 0, but I'd prefer that PyTorch compute them analytically.

One fix has been to change the gradient calculation to:

try:
    grad = ag.grad(f[tuple(f_ind)], wrt, retain_graph=True, create_graph=True)[0]
except:
    grad = torch.zeros_like(wrt)
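
(A related option: torch.autograd.grad accepts an allow_unused=True flag, which returns None rather than raising when the output does not depend on wrt, so the fallback can be made explicit instead of relying on a bare except:)

grad = ag.grad(f[tuple(f_ind)], wrt, retain_graph=True,
               create_graph=True, allow_unused=True)[0]
if grad is None:
    grad = torch.zeros_like(wrt)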

Is this the accepted way to handle this? Or is there a better option? Or have I misunderstood the cause of my issue to begin with?

Oops, sorry, forgot to add that function. Will do so now. - user650261
you are never using wrt_range in your jacobian - lurscher

2 Answers

19 votes

You can just call the grad function iteratively:

import torch
from torch.autograd import grad

def nth_derivative(f, wrt, n):
    for i in range(n):
        # create_graph=True keeps the graph of this gradient so that it
        # can itself be differentiated on the next iteration.
        grads = grad(f, wrt, create_graph=True)[0]
        # grad expects a scalar output, so sum the gradient before
        # differentiating it again.
        f = grads.sum()
    return grads

# arange returns an integer tensor by default in recent PyTorch versions,
# and integer tensors cannot require gradients, so ask for a float dtype.
x = torch.arange(4, dtype=torch.float, requires_grad=True).reshape(2, 2)
loss = (x ** 4).sum()

print(nth_derivative(f=loss, wrt=x, n=3))

outputs

tensor([[  0.,  24.],
        [ 48.,  72.]])
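
Note that summing the gradient before the next grad call works here because x ** 4 acts elementwise: the mixed partial derivatives vanish, so differentiating the summed gradient reproduces the elementwise third derivative, d³(x⁴)/dx³ = 24x, which matches the output above. For a function with cross-terms, this would instead give the derivative of the summed gradient.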
1 vote

For the second-order derivative, you can use PyTorch's built-in Hessian function:

torch.autograd.functional.hessian()

For higher-order derivatives, you can repeatedly call torch.autograd.functional.jacobian or torch.autograd.grad while keeping the computational graph alive via the create_graph flag (see the sketch after the quoted docs):

create_graph (bool, optional) – If True, graph of the derivative will be constructed, allowing to compute higher order derivative products.
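
For instance, a minimal sketch of both approaches (the function f and the nested helpers here are illustrative, using the same loss as the other answer):

import torch
from torch.autograd.functional import hessian, jacobian

def f(x):
    return (x ** 4).sum()

x = torch.arange(4, dtype=torch.float).reshape(2, 2)

# Second order directly:
H = hessian(f, x)  # shape (2, 2, 2, 2)

# Third order by nesting jacobian calls; create_graph=True keeps the
# inner derivatives differentiable:
def grad_f(y):
    return jacobian(f, y, create_graph=True)

def hess_f(y):
    return jacobian(grad_f, y, create_graph=True)

third = jacobian(hess_f, x)  # shape (2, 2, 2, 2, 2, 2)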