Can we activate the outputs of a NN to gain insight into how the neurons are connected to input features?
Take a basic NN example from the PyTorch tutorials. Here is an example that learns a mapping from x to y given (x, y) training pairs:
import torch

# Batch size, input dim, hidden dim, output dim
N, D_in, H, D_out = 64, 1000, 100, 10

# Random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Two-layer fully connected network with a ReLU in between
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-4

for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    model.zero_grad()
    loss.backward()

    # Manual SGD update on the raw parameters
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
After I've finished training the network to predict y from x inputs, is it possible to reverse the trained NN so that it can now predict x from y inputs?

I don't expect the reconstructed inputs to match the original x that produced the y outputs. Rather, I expect to see what features the model activates on to match x and y.
If it is possible, then how do I rearrange the Sequential model without breaking all the weights and connections?
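One common way to get at "which x produces this y" without literally reversing the network is to freeze the trained weights and optimize the input itself by gradient descent. Below is a minimal sketch of that idea; the names x_guess and y_target are illustrative, and it assumes the model, loss_fn, and dimensions from the training code above:

# Sketch: recover *an* input for a given target output by optimizing x itself.
for p in model.parameters():
    p.requires_grad_(False)                # freeze the trained weights

y_target = y[0:1]                          # one target output, shape (1, D_out)
x_guess = torch.randn(1, D_in, requires_grad=True)

optimizer = torch.optim.Adam([x_guess], lr=1e-2)

for step in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(model(x_guess), y_target)
    loss.backward()                        # gradients flow only to x_guess
    optimizer.step()

# x_guess is now an input that the trained network maps close to y_target;
# inspecting it shows which input features the network is sensitive to.

This is essentially the activation-maximization / feature-visualization trick: the optimized input reveals what the network responds to, rather than recovering the true original x, which is generally not unique.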
You could create two Sequential networks that share their layers in reverse order. Though this only reverses the order of layers, not the order of computational steps (since each layer performs activation(W*x + b)). But for that to be meaningful you'd need to be able to reverse the activation function, which in your example, using ReLU, is not possible since it doesn't have an inverse on (-inf, inf). So you need to be more precise about what you actually mean by saying "reverse a neural network". – a_guest
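To make the comment concrete, here is a hedged sketch of a "reversed" pass that reuses the trained Linear weights via their Moore-Penrose pseudoinverse and simply skips the ReLU (precisely because, as noted, ReLU has no inverse). This is my own illustration of the idea, an approximation rather than a true inversion:

import torch

def reverse_linear(layer: torch.nn.Linear, y: torch.Tensor) -> torch.Tensor:
    # Forward pass is y = x @ W.T + b, so approximately undo it with the
    # pseudoinverse of W.T (exact only if W were square and invertible).
    return (y - layer.bias) @ torch.linalg.pinv(layer.weight.T)

def reverse_model(model: torch.nn.Sequential, y: torch.Tensor) -> torch.Tensor:
    # Walk the trained layers in reverse order, sharing their weights.
    for layer in list(model)[::-1]:
        if isinstance(layer, torch.nn.Linear):
            y = reverse_linear(layer, y)
        elif isinstance(layer, torch.nn.ReLU):
            # ReLU has no inverse on (-inf, inf); skipping it loses the
            # information that was discarded for negative pre-activations.
            pass
    return y

# x_rec = reverse_model(model, y)  # shape (N, D_in), a rough approximation at best

As the comment says, this only reverses the order of the layers; because ReLU discards all negative pre-activations, the result can only approximate an input, which is why "reverse a neural network" has to be pinned down before it can be answered properly.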