Can we activate the outputs of a NN to gain insight into how the neurons are connected to input features?
Take a basic NN example from the PyTorch tutorials. Here is an example that learns a mapping from x to y given (x, y) training pairs:
import torch

# Batch size, input dim, hidden dim, output dim
N, D_in, H, D_out = 64, 1000, 100, 10

# Random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Two-layer fully connected network with a ReLU in between
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-4

for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    model.zero_grad()
    loss.backward()

    # Manual SGD update on the raw parameters
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
After I've finished training the network to predict y from x inputs, is it possible to reverse the trained NN so that it can now predict x from y inputs?

I don't expect the reconstructed inputs to match the original x that produced the y outputs. Rather, I expect to see what features the model activates on to match x and y.
If it is possible, then how do I rearrange the Sequential model without breaking all the weights and connections?
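One common way to get at "which x produces this y" without literally reversing the network is to freeze the trained weights and optimize the input itself by gradient descent. Below is a minimal sketch of that idea; the names x_guess and y_target are illustrative, and it assumes the model, loss_fn, and dimensions from the training code above:

# Sketch: recover *an* input for a given target output by optimizing x itself.
for p in model.parameters():
    p.requires_grad_(False)                # freeze the trained weights

y_target = y[0:1]                          # one target output, shape (1, D_out)
x_guess = torch.randn(1, D_in, requires_grad=True)

optimizer = torch.optim.Adam([x_guess], lr=1e-2)

for step in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(model(x_guess), y_target)
    loss.backward()                        # gradients flow only to x_guess
    optimizer.step()

# x_guess is now an input that the trained network maps close to y_target;
# inspecting it shows which input features the network is sensitive to.

This is essentially the activation-maximization / feature-visualization trick: the optimized input reveals what the network responds to, rather than recovering the true original x, which is generally not unique.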
You could create two Sequential networks that share their layers in reverse order. Though this only reverses the order of layers, not the order of computational steps (since each layer performs activation(W*x + b)). But for that to be meaningful you'd need to be able to reverse the activation function, which in your example, using ReLU, is not possible since it doesn't have an inverse on (-inf, inf). So you need to be more precise about what you actually mean by saying "reverse a neural network". – a_guest
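To make the comment concrete, here is a hedged sketch of a "reversed" pass that reuses the trained Linear weights via their Moore-Penrose pseudoinverse and simply skips the ReLU (precisely because, as noted, ReLU has no inverse). This is my own illustration of the idea, an approximation rather than a true inversion:

import torch

def reverse_linear(layer: torch.nn.Linear, y: torch.Tensor) -> torch.Tensor:
    # Forward pass is y = x @ W.T + b, so approximately undo it with the
    # pseudoinverse of W.T (exact only if W were square and invertible).
    return (y - layer.bias) @ torch.linalg.pinv(layer.weight.T)

def reverse_model(model: torch.nn.Sequential, y: torch.Tensor) -> torch.Tensor:
    # Walk the trained layers in reverse order, sharing their weights.
    for layer in list(model)[::-1]:
        if isinstance(layer, torch.nn.Linear):
            y = reverse_linear(layer, y)
        elif isinstance(layer, torch.nn.ReLU):
            # ReLU has no inverse on (-inf, inf); skipping it loses the
            # information that was discarded for negative pre-activations.
            pass
    return y

# x_rec = reverse_model(model, y)  # shape (N, D_in), a rough approximation at best

As the comment says, this only reverses the order of the layers; because ReLU discards all negative pre-activations, the result can only approximate an input, which is why "reverse a neural network" has to be pinned down before it can be answered properly.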