7 votes

I've implemented the following neural network to solve the XOR problem in Python. My neural network consists of an input layer of 2 neurons, 1 hidden layer of 2 neurons and an output layer of 1 neuron. I am using the Sigmoid function as the activation function for the hidden layer and the linear (identity) function as the activation function for the output layer:

import numpy as np

def sigmoid(z):
    return 1/(1+np.exp(-z))

def s_prime(z):
    return np.multiply(sigmoid(z), sigmoid(1.0-z))

def init_weights(layers, epsilon):
    weights = []
    for i in range(len(layers)-1):
        w = np.random.rand(layers[i+1], layers[i]+1)
        w = w * 2*epsilon - epsilon
        weights.append(np.mat(w))
    return weights

def fit(X, Y, w, predict=False, x=None):
    w_grad = ([np.mat(np.zeros(np.shape(w[i]))) 
              for i in range(len(w))])
    for i in range(len(X)):
        x = x if predict else X[0]
        y = Y[0,i]
        # forward propagate
        a = x
        a_s = []
        for j in range(len(w)):
            a = np.mat(np.append(1, a)).T
            a_s.append(a)
            z = w[j] * a
            a = sigmoid(z)
        if predict: return a
        # backpropagate
        delta = a - y.T
        w_grad[-1] += delta * a_s[-1].T
        for j in reversed(range(1, len(w))):
            delta = np.multiply(w[j].T*delta, s_prime(a_s[j]))
            w_grad[j-1] += (delta[1:] * a_s[j-1].T)
    return [w_grad[i]/len(X) for i in range(len(w))]

def predict(x):
    return fit(X, Y, w, True, x)

####

X = np.mat([[0,0],
            [0,1],
            [1,0],
            [1,1]])
Y = np.mat([0,1,1,0])
layers = [2,2,1]
epochs = 10000
alpha = 0.5
w = init_weights(layers, 1)

for i in range(epochs):
    w_grad = fit(X, Y, w)
    print w_grad
    for j in range(len(w)):
        w[j] -= alpha * w_grad[j]

for i in range(len(X)):
    x = X[i]
    guess = predict(x)
    print x, ":", guess

The backpropagation all seems to be correct; the only issue that comes to mind would be some problem with my implementation of the bias units. Either way, all predictions for each input converge to approximately 0.5 each time I run the code. I've scoured the code and can't seem to find what's wrong. Can anyone point out what's wrong with my implementation? I appreciate any feedback.

If for any reason it might help, here's the kind of output I'm getting:

[[0 0]] : [[ 0.5]]
[[0 1]] : [[ 0.49483673]]
[[1 0]] : [[ 0.52006739]]
[[1 1]] : [[ 0.51610963]]
Actually your code for calculating the sigmoid function's derivative has a small problem: since g'(z) = a*(1-a), where g is the sigmoid function and a = sigmoid(z), and you are passing a_s[j] to s_prime(), your s_prime() should be return np.multiply(z, 1.0-z) instead of return np.multiply(sigmoid(z), sigmoid(1.0-z)). - Belter

@Sam, are you using a linear neuron in the output layer? Otherwise, I can't think of why you would do "delta = a - y.T" after the line "# backpropagate". - Kun Hu

1 Answer

4 votes

Your implementation of forward and backpropagation is more or less correct. However, where you're going wrong is quite simple. The first small error is inside your fit function - specifically the first statement inside your for loop:

x = x if predict else X[0]

You are saying that if you aren't predicting (i.e. you are training), the input example chosen during each iteration of gradient descent must always be the first example, which is [0 0] (i.e. X[0]). This is why all of your predictions come out at roughly 0.5: you are only ever training on the first input. You need to change this so that it reads the correct example, which is example i:

x = x if predict else X[i]
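
To see why everything collapses to roughly 0.5 (a small illustrative snippet, not from your original code): even though the input is stuck at [0 0], the target y = Y[0,i] still cycles through 0, 1, 1, 0, so the best constant output the network can learn is the mean of those targets.

import numpy as np

Y = np.array([0, 1, 1, 0])
# A network that only ever sees one input can at best produce a constant output;
# the constant minimizing the mean squared error over these targets is their mean.
print(Y.mean())   # 0.5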

The last change you need to make is to your s_prime function. The derivative of the sigmoid is g'(z) = sigmoid(z) * (1 - sigmoid(z)), which is not quite what you have there:

def s_prime(z):
    return np.multiply(sigmoid(z), sigmoid(1.0-z))

More importantly, during the forward propagation you have already stored the output activations of each neuron in a_s, so when you compute the local derivative at these neurons you are passing those activations, a = sigmoid(z), directly into s_prime. There is no need to apply the sigmoid again; the function just needs to compute a * (1 - a) on the values it receives.

Therefore:

def s_prime(z):
    return np.multiply(z, 1.0-z)
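
As a quick sanity check (a small standalone snippet, not part of your original code), you can verify that a * (1 - a) with a = sigmoid(z) matches a numerical derivative of the sigmoid:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = 0.7                       # an arbitrary pre-activation value
a = sigmoid(z)                # the output activation, as stored in a_s

analytic = a * (1 - a)        # what the corrected s_prime computes
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print("analytic: %.6f, numeric: %.6f" % (analytic, numeric))   # both are ~0.221713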

Once these two changes are made, we get this output:

[[0 0]] : [[ 0.00239857]]
[[0 1]] : [[ 0.99816778]]
[[1 0]] : [[ 0.99816596]]
[[1 1]] : [[ 0.0021052]]

You can see that this more or less agrees with the expected output of an XOR gate. One last thing I can recommend is that 10000 iterations is computationally far too long given your current code structure. I noticed that with the above corrections, we are able to reach the expected output in fewer iterations. I've decreased the iterations to 1000 and bumped the learning rate alpha up to 0.75. With these two changes, we now get:

[[0 0]] : [[ 0.03029435]]
[[0 1]] : [[ 0.95397528]]
[[1 0]] : [[ 0.95371525]]
[[1 1]] : [[ 0.04796917]]
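
For reference, here is a rough sketch of just the changed pieces (assuming the rest of your script stays exactly as posted) - treat it as a summary of the corrections above rather than a full rewrite:

def s_prime(a):
    # a is already sigmoid(z), so the derivative is simply a * (1 - a)
    return np.multiply(a, 1.0 - a)

# inside fit(), use the i-th training example instead of always the first:
#     x = x if predict else X[i]

epochs = 1000   # fewer iterations suffice once the bugs are fixed
alpha = 0.75    # a slightly larger learning rate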