
I'm learning PyTorch and tried to train a network as an XOR gate. Everything runs smoothly, but it just does not learn. It does change its weights, yet for every input it converges to an output that is far from the expected result.

I have tried many learning rates and weight initializations.

So the inputs are gates A and B, and the network should return 1 if both are equal or 0 otherwise (strictly speaking that is XNOR rather than XOR), like this:


    [0,0] => 1
    [0,1] => 0
    [1,0] => 0
    [1,1] => 1
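In plain Python the target function is a one-liner; the helper xnor below is hypothetical, just for sanity-checking the expected outputs, and is not part of the model:

    def xnor(a, b):
        # 1 when both inputs are equal, 0 otherwise
        return int(a == b)

    print([xnor(a, b) for a, b in ((0, 0), (0, 1), (1, 0), (1, 1))])  # [1, 0, 0, 1]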

This is my attempt at modeling and training the network:


    import torch
    import torch.nn as nn
    
    class Network(nn.Module):
        
        def __init__(self):
            super(Network, self).__init__()
            self.x1 = nn.Linear(2,4)
            self.s1 = nn.Sigmoid()
            self.x2 = nn.Linear(4,1)
            self.s2 = nn.Sigmoid()
        
        def init(self):
            nn.init.uniform_(self.x1.weight)
            nn.init.uniform_(self.x2.weight)
    
        def forward(self, feats):
            f1 = torch.tensor(feats).float()
            xr1= self.x1(f1)
            xs1= self.s1(xr1)
            xr2= self.x2(xs1)
            out= self.s2(xr2)        
            return out  
    
        def train(self,val_expected,feats_next):
            val_expected_tensor = torch.tensor(val_expected)
            criterion = nn.MSELoss()
            optimizer = torch.optim.SGD(self.parameters(), lr=0.01)
            def closure():
                optimizer.zero_grad()
                resp = self.forward(feats_next)
                error = criterion(resp,val_expected_tensor)
                error.backward()
                return error
            optimizer.step(closure)
    
    net = Network()
    net.init()
    
    for input in ([0.,0.],[0.,1.],[1.,0.],[1.,1.]):
        response=net.forward(input)
        print(response)
    
    print ("--TRAIN START-")
    for i in range(1000):
        net.train([1.],[0.,0.])
        net.train([0.],[1.,0.])
        net.train([0.],[0.,1.])
        net.train([1.],[1.,1.])
    print ("---TRAIN END---")
    
    for input in ([0.,0.],[0.,1.],[1.,0.],[1.,1.]):
        response=net.forward(input)
        print(response)

This is a run with 100,000 iterations at a learning rate of 0.001:


    tensor([0.7726], grad_fn=<SigmoidBackward>)
    tensor([0.7954], grad_fn=<SigmoidBackward>)
    tensor([0.8229], grad_fn=<SigmoidBackward>)
    tensor([0.8410], grad_fn=<SigmoidBackward>)
    --TRAIN START-
    *.........*.........*.........*.........*.........*.........*.........*.........*.........*.........
    ---TRAIN END---
    tensor([0.6311], grad_fn=<SigmoidBackward>)
    tensor([0.6459], grad_fn=<SigmoidBackward>)
    tensor([0.6770], grad_fn=<SigmoidBackward>)
    tensor([0.6906], grad_fn=<SigmoidBackward>)

I'm really lost here. Shouldn't this work?

Can you initialize your loss and optimizer outside of the train function? – Anurag Reddy
I did that, with the same results. – Samy Garib

1 Answer


So, in your case, keep the train method outside of the network class. The code would be as follows:

    net = Network()
    net.init()
    criterion = nn.MSELoss()                               # loss created once, outside the training loop
    optimizer = torch.optim.SGD(net.parameters(), lr=0.1)  # higher learning rate

    for input in ([0., 0.], [0., 1.], [1., 0.], [1., 1.]):
        response = net.forward(input)
        print(response)

    def train(val_expected, feats_next, criterion, optimizer):
        # plain training step: zero gradients, forward, loss, backward, update
        val_expected_tensor = torch.tensor(val_expected)
        optimizer.zero_grad()
        resp = net.forward(feats_next)
        error = criterion(resp, val_expected_tensor)
        error.backward()
        optimizer.step()

    print("--TRAIN START-")
    for i in range(10000):
        train([1.], [0., 0.], criterion, optimizer)
        train([0.], [1., 0.], criterion, optimizer)
        train([0.], [0., 1.], criterion, optimizer)
        train([1.], [1., 1.], criterion, optimizer)
    print("---TRAIN END---")

    for input in ([0., 0.], [0., 1.], [1., 0.], [1., 1.]):
        response = net.forward(input)
        print(response)

The results are as follows:

    tensor([0.9571], grad_fn=<SigmoidBackward>)
    tensor([0.0414], grad_fn=<SigmoidBackward>)
    tensor([0.0459], grad_fn=<SigmoidBackward>)
    tensor([0.9621], grad_fn=<SigmoidBackward>)

I just increased the learning rate. Also, nn.Module already has a train method, so it is not a good idea to define a trainer method with the same name on the model itself: your train shadows the built-in one. A sketch of that collision follows.
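To make the collision concrete, here is a minimal sketch; the class name Demo is hypothetical. nn.Module's built-in train(mode=True) toggles the training flag that layers such as Dropout and BatchNorm depend on, and eval() simply calls train(False):

    import torch.nn as nn

    class Demo(nn.Module):  # hypothetical module without a custom train()
        def __init__(self):
            super(Demo, self).__init__()
            self.drop = nn.Dropout(p=0.5)

    net = Demo()
    net.train()          # built-in: sets net.training = True
    print(net.training)  # True
    net.eval()           # built-in: calls net.train(False)
    print(net.training)  # False

With the question's override in place, net.train() would need the extra arguments, and net.eval() would even raise a TypeError, because it internally calls self.train(False) with only one argument.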