I created an Octave script for training a neural network with one hidden layer using backpropagation, but it cannot seem to fit an XOR function.
x: Input, 4x2 matrix [0 0; 0 1; 1 0; 1 1]
y: Output, 4x1 matrix [0; 1; 1; 0]
theta: Hidden / output layer weights
z: Weighted sums
a: Activation function applied to weighted sums
m: Sample count (4 here)
My weights are initialized as follows:
epsilon_init = 0.12;
theta1 = rand(hiddenCount, inputCount + 1) * 2 * epsilon_init * epsilon_init;
theta2 = rand(outputCount, hiddenCount + 1) * 2 * epsilon_init * epsilon_init;
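For reference, a minimal sketch of the value range the expression above produces, next to the symmetric form that is commonly used for this kind of initialization. The layer sizes are hypothetical (the post does not state them), and whether the difference is related to the problem is not established here.

```octave
% Hypothetical layer sizes for the XOR example (not stated in the post).
inputCount = 2;
hiddenCount = 2;
epsilon_init = 0.12;

% Expression as written in the post: rand() is in (0, 1), so every
% entry lies between 0 and 2 * epsilon_init^2 = 0.0288.
theta1_post = rand(hiddenCount, inputCount + 1) * 2 * epsilon_init * epsilon_init;

% Commonly used symmetric variant (an assumption, not the post's code):
% every entry lies between -epsilon_init and +epsilon_init.
theta1_sym = rand(hiddenCount, inputCount + 1) * 2 * epsilon_init - epsilon_init;

printf("as posted: min %.4f, max %.4f\n", min(theta1_post(:)), max(theta1_post(:)));
printf("symmetric: min %.4f, max %.4f\n", min(theta1_sym(:)), max(theta1_sym(:)));
```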
Feed forward:
a1 = x;
a1_with_bias = [ones(m, 1) a1];
z2 = a1_with_bias * theta1';
a2 = sigmoid(z2);
a2_with_bias = [ones(size(a2, 1), 1) a2];
z3 = a2_with_bias * theta2';
a3 = sigmoid(z3);
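The forward pass above can be run on its own with a sketch like the following. The `sigmoid` definition, the hidden layer size, and the all-zero weights are assumptions made only so the example is self-contained and its output is predictable; the post does not show them.

```octave
% Assumed definition; the post does not show its sigmoid.
sigmoid = @(z) 1 ./ (1 + exp(-z));

x = [0 0; 0 1; 1 0; 1 1];
m = size(x, 1);
hiddenCount = 2;                              % hypothetical hidden layer size

% All-zero weights, chosen only to make the result easy to verify.
theta1 = zeros(hiddenCount, size(x, 2) + 1);
theta2 = zeros(1, hiddenCount + 1);

a1_with_bias = [ones(m, 1) x];                % 4 x 3
z2 = a1_with_bias * theta1';                  % 4 x hiddenCount
a2 = sigmoid(z2);
a2_with_bias = [ones(m, 1) a2];               % 4 x (hiddenCount + 1)
z3 = a2_with_bias * theta2';                  % 4 x 1
a3 = sigmoid(z3);

% With zero weights every z is 0, so every output is sigmoid(0) = 0.5.
disp(a3');
```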
Then I compute the logistic cost function:
j = -sum((y .* log(a3) + (1 - y) .* log(1 - a3))(:)) / m;
Backpropagation:
delta2 = (a3 - y);
gradient2 = delta2' * a2_with_bias / m;
delta1 = (delta2 * theta2(:, 2:end)) .* sigmoidGradient(z2);
gradient1 = delta1' * a1_with_bias / m;
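The backpropagation step relies on a `sigmoidGradient` helper that the post does not show. A minimal sketch of what it presumably computes, the derivative of the sigmoid, checked against a central difference (both definitions are assumptions):

```octave
% Assumed definitions; neither helper appears in the post.
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

% Compare with a central-difference estimate at a few points.
z = [-2 -0.5 0 0.5 2];
h = 1e-5;
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h);

disp(sigmoidGradient(0));                     % derivative at 0 is 0.25
disp(max(abs(numeric - sigmoidGradient(z)))); % should be very small
```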
The gradients were verified to be correct using gradient checking.
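The post does not show how the check was done. One way such a gradient check could look, as a sketch: the backpropagation gradients from above are compared with central differences of the logistic cost. The `sigmoid` definition, the hidden layer size, the unrolling helpers, and the fixed test weights are all hypothetical.

```octave
% Assumed helpers and sizes; none of these appear in the post.
sigmoid = @(z) 1 ./ (1 + exp(-z));
x = [0 0; 0 1; 1 0; 1 1];
y = [0; 1; 1; 0];
m = size(x, 1);
hiddenCount = 2;
n1 = hiddenCount * 3;                          % number of entries in theta1

% Unrolled parameter vector t = [theta1(:); theta2(:)].
unpack1 = @(t) reshape(t(1:n1), hiddenCount, 3);
unpack2 = @(t) reshape(t(n1 + 1:end), 1, hiddenCount + 1);
forward = @(t) sigmoid([ones(m, 1) sigmoid([ones(m, 1) x] * unpack1(t)')] * unpack2(t)');
cost = @(t) -sum(y .* log(forward(t)) + (1 - y) .* log(1 - forward(t))) / m;

% Fixed, deterministic test weights so the check is reproducible.
theta = 0.1 * sin((1:(n1 + hiddenCount + 1))');
theta1 = unpack1(theta);
theta2 = unpack2(theta);

% Analytic gradients, following the post's backpropagation code.
a1_with_bias = [ones(m, 1) x];
z2 = a1_with_bias * theta1';
a2 = sigmoid(z2);
a2_with_bias = [ones(m, 1) a2];
a3 = sigmoid(a2_with_bias * theta2');
delta2 = a3 - y;
gradient2 = delta2' * a2_with_bias / m;
delta1 = (delta2 * theta2(:, 2:end)) .* (a2 .* (1 - a2));  % sigmoid derivative at z2
gradient1 = delta1' * a1_with_bias / m;
grad = [gradient1(:); gradient2(:)];

% Numerical gradients by central differences.
h = 1e-4;
numgrad = zeros(size(theta));
for i = 1:numel(theta)
  e = zeros(size(theta));
  e(i) = h;
  numgrad(i) = (cost(theta + e) - cost(theta - e)) / (2 * h);
end

printf("max abs difference: %g\n", max(abs(numgrad - grad)));
```

A small difference here only confirms that the gradients match the cost function; it says nothing about whether the starting weights or the optimizer settings are suitable.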
I then use these gradients to find the optimal values for theta using gradient descent, though using Octave's fminunc function yields the same results. The cost function converges to ln(2) (or 0.5 for a squared errors cost function) because the network outputs 0.5 for all four inputs no matter how many hidden units I use.
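The two plateau values follow directly from a constant output of 0.5, which a short sketch can confirm. The squared-error form used here (sum of squared errors divided by 2) is an assumption, since the post does not give that formula.

```octave
y = [0; 1; 1; 0];
m = numel(y);
a3 = 0.5 * ones(m, 1);                        % network output stuck at 0.5

% Logistic cost as in the post: every term is -log(0.5), so the mean is log(2).
j_logistic = -sum(y .* log(a3) + (1 - y) .* log(1 - a3)) / m;

% Assumed squared-error cost: four errors of 0.5 give 4 * 0.25 / 2 = 0.5.
j_squared = sum((a3 - y) .^ 2) / 2;

printf("logistic: %.6f (log(2) = %.6f)\n", j_logistic, log(2));
printf("squared:  %.6f\n", j_squared);
```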
Does anyone know where my mistake is?
theta). At a guess, that could be your problem. I'll explain if so. - Neil Slater
epsilon_init = 0.12; theta1 = rand(hiddenCount, inputCount + 1) * 2 * epsilon_init * epsilon_init; theta2 = rand(outputCount, hiddenCount + 1) * 2 * epsilon_init * epsilon_init; Don't know how to format it correctly in a comment, sorry about that! - Torax