I'm learning PyTorch and tried to train a network as an XOR gate. Everything runs without errors, but the network just does not learn. Its weights do change, yet for every input it converges to an output that is far from the expected result.
I have tried many learning rates and weight initializations.
The inputs are the two gate inputs A and B, and the network should return 1 if both are equal and 0 otherwise (strictly speaking this is the complement of XOR, i.e. XNOR), like this:
    [0,0] => 1
    [0,1] => 0
    [1,0] => 0
    [1,1] => 1
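The same mapping written as plain Python data, as a minimal sketch. The names `inputs`, `targets`, and `expected_output` are illustrative only and do not appear in my training code:

```python
# Inputs (A, B) in the same order as the table above.
inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

def expected_output(a, b):
    """Return 1.0 when both inputs are equal, 0.0 otherwise."""
    return 1.0 if a == b else 0.0

# One single-element target per input pair, matching the network's one output.
targets = [[expected_output(a, b)] for a, b in inputs]

for pair, target in zip(inputs, targets):
    print(pair, "=>", target)
```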
This is my attempt at defining and training the model:
    import torch as torch
    import torch.nn as nn

    class Network(nn.Module):

        def __init__(self):
            super(Network, self).__init__()
            self.x1 = nn.Linear(2, 4)
            self.s1 = nn.Sigmoid()
            self.x2 = nn.Linear(4, 1)
            self.s2 = nn.Sigmoid()

        def init(self):
            nn.init.uniform_(self.x1.weight)
            nn.init.uniform_(self.x2.weight)

        def forward(self, feats):
            f1 = torch.tensor(feats).float()
            xr1 = self.x1(f1)
            xs1 = self.s1(xr1)
            xr2 = self.x2(xs1)
            out = self.s2(xr2)
            return out

        def train(self, val_expected, feats_next):
            val_expected_tensor = torch.tensor(val_expected)
            criterion = nn.MSELoss()
            optimizer = torch.optim.SGD(self.parameters(), lr=0.01)
            def closure():
                optimizer.zero_grad()
                resp = self.forward(feats_next)
                error = criterion(resp, val_expected_tensor)
                error.backward()
                return error
            optimizer.step(closure)

    net = Network()
    net.init()

    for input in ([0., 0.], [0., 1.], [1., 0.], [1., 1.]):
        response = net.forward(input)
        print(response)

    print("--TRAIN START-")
    for i in range(1000):
        net.train([1.], [0., 0.])
        net.train([0.], [1., 0.])
        net.train([0.], [0., 1.])
        net.train([1.], [1., 1.])
    print("---TRAIN END---")

    for input in ([0., 0.], [0., 1.], [1., 0.], [1., 1.]):
        response = net.forward(input)
        print(response)
This is the output of a run with 100000 iterations at a learning rate of 0.001:
    tensor([0.7726], grad_fn=<SigmoidBackward>)
    tensor([0.7954], grad_fn=<SigmoidBackward>)
    tensor([0.8229], grad_fn=<SigmoidBackward>)
    tensor([0.8410], grad_fn=<SigmoidBackward>)
    --TRAIN START-
    *.........*.........*.........*.........*.........*.........*.........*.........*.........*.........
    ---TRAIN END---
    tensor([0.6311], grad_fn=<SigmoidBackward>)
    tensor([0.6459], grad_fn=<SigmoidBackward>)
    tensor([0.6770], grad_fn=<SigmoidBackward>)
    tensor([0.6906], grad_fn=<SigmoidBackward>)
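To put a number on how far off these outputs are, here is a small sketch in plain Python, separate from the training code. It computes the mean squared error of the four final outputs against the expected values and compares it with a constant guess of 0.5. The names `outputs`, `expected`, and `mse` are illustrative only:

```python
# Final outputs reported in the run above, in input order [0,0], [0,1], [1,0], [1,1].
outputs = [0.6311, 0.6459, 0.6770, 0.6906]
# Expected values for the same inputs.
expected = [1.0, 0.0, 0.0, 1.0]

def mse(predictions, targets):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

network_mse = mse(outputs, expected)
baseline_mse = mse([0.5] * len(expected), expected)  # always guessing 0.5

print(f"network MSE:  {network_mse:.4f}")   # prints 0.2768
print(f"baseline MSE: {baseline_mse:.4f}")  # prints 0.2500
```

The error after training is slightly higher than that of always answering 0.5, which is what I mean by the network not learning.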
I'm really lost here. Shouldn't this work?