I've implemented the following neural network to solve the XOR problem in Python. My neural network consists of an input layer of 2 neurons, 1 hidden layer of 2 neurons and an output layer of 1 neuron. I am using the Sigmoid function as the activation function for the hidden layer and the linear (identity) function as the activation function for the output layer:
import numpy as np

def sigmoid(z):
    return 1/(1+np.exp(-z))

def s_prime(z):
    return np.multiply(sigmoid(z), sigmoid(1.0-z))

def init_weights(layers, epsilon):
    weights = []
    for i in range(len(layers)-1):
        w = np.random.rand(layers[i+1], layers[i]+1)
        w = w * 2*epsilon - epsilon
        weights.append(np.mat(w))
    return weights

def fit(X, Y, w, predict=False, x=None):
    w_grad = ([np.mat(np.zeros(np.shape(w[i])))
              for i in range(len(w))])
    for i in range(len(X)):
        x = x if predict else X[0]
        y = Y[0,i]
        # forward propagate
        a = x
        a_s = []
        for j in range(len(w)):
            a = np.mat(np.append(1, a)).T
            a_s.append(a)
            z = w[j] * a
            a = sigmoid(z)
        if predict: return a
        # backpropagate
        delta = a - y.T
        w_grad[-1] += delta * a_s[-1].T
        for j in reversed(range(1, len(w))):
            delta = np.multiply(w[j].T*delta, s_prime(a_s[j]))
            w_grad[j-1] += (delta[1:] * a_s[j-1].T)
    return [w_grad[i]/len(X) for i in range(len(w))]

def predict(x):
    return fit(X, Y, w, True, x)

X = np.mat([[0,0],
            [0,1],
            [1,0],
            [1,1]])
Y = np.mat([0,1,1,0])
layers = [2,2,1]
epochs = 10000
alpha = 0.5
w = init_weights(layers, 1)

for i in range(epochs):
    w_grad = fit(X, Y, w)
    print(w_grad)
    for j in range(len(w)):
        w[j] -= alpha * w_grad[j]

for i in range(len(X)):
    x = X[i]
    guess = predict(x)
    print(x, ":", guess)
The backpropagation all seems to be correct; the only issue that comes to mind is some problem with my implementation of the bias units. Either way, the predictions for every input converge to approximately 0.5 each time I run the code. I've scoured the code and can't find what's wrong. Can anyone point out what's wrong with my implementation? I appreciate any feedback.
If for any reason it might help, here's the kind of output I'm getting:
[[0 0]] : [[ 0.5]]
[[0 1]] : [[ 0.49483673]]
[[1 0]] : [[ 0.52006739]]
[[1 1]] : [[ 0.51610963]]
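One way to test the belief that the backpropagation is correct is a numerical gradient check: compare the backprop gradients against finite differences of the loss. The sketch below is not the code from the question. It is a minimal ndarray-based 2-2-1 network with a sigmoid on every layer, and the names `forward`, `loss`, `backprop_grads`, and `numeric_grads` are made up for illustration. It also assumes a mean cross-entropy loss, since that is the loss for which `a - y` is the output delta of a sigmoid output unit; the question does not state its loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, x):
    """Return the bias-augmented input of every layer and the network output."""
    a = x
    inputs = []
    for W in weights:
        a = np.concatenate(([1.0], a))  # prepend the bias unit
        inputs.append(a)
        a = sigmoid(W @ a)
    return inputs, a

def loss(weights, X, Y):
    # Assumed loss: mean cross-entropy over the samples.
    total = 0.0
    for x, y in zip(X, Y):
        _, out = forward(weights, x)
        total += -(y * np.log(out[0]) + (1.0 - y) * np.log(1.0 - out[0]))
    return total / len(X)

def backprop_grads(weights, X, Y):
    grads = [np.zeros_like(W) for W in weights]
    for x, y in zip(X, Y):  # each sample is paired with its own target
        inputs, out = forward(weights, x)
        delta = out - y  # output delta for sigmoid + cross-entropy
        for j in reversed(range(len(weights))):
            grads[j] += np.outer(delta, inputs[j])
            if j > 0:
                a = inputs[j][1:]  # activations feeding layer j, bias dropped
                # sigmoid derivative written in terms of the activation
                delta = (weights[j][:, 1:].T @ delta) * a * (1.0 - a)
    return [g / len(X) for g in grads]

def numeric_grads(weights, X, Y, h=1e-5):
    # Central differences, perturbing one weight at a time.
    grads = []
    for W in weights:
        g = np.zeros_like(W)
        for idx in np.ndindex(*W.shape):
            old = W[idx]
            W[idx] = old + h
            up = loss(weights, X, Y)
            W[idx] = old - h
            down = loss(weights, X, Y)
            W[idx] = old
            g[idx] = (up - down) / (2.0 * h)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([0.0, 1.0, 1.0, 0.0])
weights = [rng.uniform(-1.0, 1.0, size=(2, 3)),
           rng.uniform(-1.0, 1.0, size=(1, 3))]

analytic = backprop_grads(weights, X, Y)
numeric = numeric_grads(weights, X, Y)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print("largest gap between backprop and finite differences:", max_diff)
```

If the two gradients agree to many decimal places, the backward pass is probably fine and the problem lies elsewhere (data handling, learning rate, initialisation). A gap that is not tiny usually points at the derivative term or at which sample is being fed through the network.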
g'(z) = a*(1-a), where g is the sigmoid function and a = sigmoid(z), and you pass a_s[j], so your s_prime() should be return np.multiply(z, 1.0-z) instead of return np.multiply(sigmoid(z), sigmoid(1.0-z)). – Belter
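The point in this comment can be checked numerically. The sketch below compares both expressions against a central finite difference of the sigmoid. The helper names `s_prime_from_activation` and `s_prime_original` are made up for illustration, and the second one is evaluated on activations because that is what the question's code passes in.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def s_prime_from_activation(a):
    # Assumes a is already sigmoid(z), as a_s[j] is in the question.
    return a * (1.0 - a)

def s_prime_original(v):
    # The expression from the question, kept only for comparison.
    return sigmoid(v) * sigmoid(1.0 - v)

z = np.linspace(-3.0, 3.0, 13)
a = sigmoid(z)

# Central finite difference of the sigmoid as a reference derivative.
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)

err_fixed = np.max(np.abs(s_prime_from_activation(a) - numeric))
err_original = np.max(np.abs(s_prime_original(a) - numeric))
print("max error of a*(1-a):", err_fixed)
print("max error of sigmoid(a)*sigmoid(1-a):", err_original)
```

The first error should sit at the level of floating-point noise, while the second is clearly nonzero, because sigmoid(1-z) is not the same as 1-sigmoid(z) and the function is also being applied to a value that has already been through the sigmoid.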