I've been reading up quite a bit on neural networks and training them with backpropagation, primarily this Coursera course, with additional reading from here and here. I thought I had a pretty solid grasp of the core algorithm, but my attempt to build a backpropagation-trained neural net hasn't quite worked out and I'm not sure why.
The code is in C++ with no vectorisation as of yet.
I wanted to build a simple network (2 input neurons, 1 hidden neuron, 1 output neuron) to model the AND function, just to understand how the concepts worked before moving on to a more complex example. My forward propagation code worked when I hand-coded the values for the weights and biases.
float NeuralNetwork::ForwardPropagte(const float *dataInput)
{
    // Write the input data into the input layer
    int number = 0;
    for (auto & node : m_Network[0])
    {
        node->input = dataInput[number++];
    }

    // For each layer in the network
    int layerIndex = 0;
    for (auto & layer : m_Network)
    {
        // For each neuron in the layer
        for (auto & neuron : layer)
        {
            float activation;
            if (layerIndex != 0)
            {
                neuron->input += neuron->bias;
                activation = Sigmoid(neuron->input);
            }
            else
            {
                activation = neuron->input;
            }

            // Pass the activation forward along each weighted connection
            for (auto & pair : neuron->outputNeuron)
            {
                pair.first->input += static_cast<float>(pair.second)*activation;
            }
        }
        ++layerIndex;
    }

    return Sigmoid(m_Network[m_Network.size()-1][0]->input);
}
Some of these variables are fairly poorly named, but basically neuron->outputNeuron is a vector of pairs: the first element is a pointer to the next neuron, the second is the weight on that connection. neuron->input is the "z" value in the neural network equations, i.e. the sum of all the weights*activations plus the bias (the neuron structure itself is sketched below, after the Sigmoid code). Sigmoid is given by:
float NeuralNetwork::Sigmoid(float value) const
{
return 1.0f/(1.0f + exp(-value));
}
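For reference, here is roughly what each neuron holds. I haven't posted the actual class, so treat this as a sketch of the structure the code above assumes:

#include <utility>
#include <vector>

struct Neuron
{
    float input = 0.0f;  // the "z" value: accumulated sum of weight*activation (plus bias for non-input layers)
    float bias  = 0.0f;
    float error = 0.0f;  // the delta stored during backpropagation
    // Connections to the next layer: pointer to the target neuron and the weight on that connection
    std::vector<std::pair<Neuron*, float>> outputNeuron;
};

// m_Network is indexed as m_Network[layer][neuron] and holds pointers to neurons
// (raw pointers in this sketch; it could equally be smart pointers):
std::vector<std::vector<Neuron*>> m_Network;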
These two appear to work as intended. After a pass over the network (and again after backpropagation), all the 'z' values, i.e. neuron->input, are reset to zero.
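That reset is what the ResetActivations() call in the training code below does. I haven't shown it, but it is essentially just this:

void NeuralNetwork::ResetActivations()
{
    // Clear the accumulated 'z' values so the next forward pass starts from zero
    for (auto & layer : m_Network)
    {
        for (auto & neuron : layer)
        {
            neuron->input = 0.0f;
        }
    }
}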
I then train the network following the pseudo-code below. The training code is run multiple times.

for trainingExample = 0 to m   // m = number of training examples
    perform forward propagation to calculate hyp(x)
    calculate the cost delta of the last layer:
        delta = y - hyp(x)
    use the delta of the output to calculate the delta for all layers
    move over the network, adjusting the weights based on this value
    reset the network
The actual code is here:
void NeuralNetwork::TrainNetwork(const std::vector<std::pair<std::pair<float,float>,float>> & trainingData)
{
    for (int i = 0; i < 100; ++i)
    {
        for (auto & trainingSet : trainingData)
        {
            float x[2] = { trainingSet.first.first, trainingSet.first.second };
            float y = trainingSet.second;
            float estimatedY = ForwardPropagte(x);

            m_Network[m_Network.size()-1][0]->error = estimatedY - y;

            CalculateError();
            RunBackpropagation();
            ResetActivations();
        }
    }
}
With the backpropagation function given by:
void NeuralNetwork::RunBackpropagation()
{
    for (int index = m_Network.size()-1; index >= 0; --index)
    {
        for (auto & node : m_Network[index])
        {
            // Again, "outputNeuron" is the list of next-layer neurons and their associated weights
            for (auto & weight : node->outputNeuron)
            {
                weight.second += weight.first->error*Sigmoid(node->input);
            }
            node->bias = node->error; // I'm not sure how to adjust the bias; some of the formulas seemed to point to this. Is it correct?
        }
    }
}
and the cost calculated by:
void NeuralNetwork::CalculateError()
{
    for (int index = m_Network.size()-2; index > 0; --index)
    {
        for (auto & node : m_Network[index])
        {
            node->error = 0.0f;
            float sigmoidPrime = Sigmoid(node->input)*(1 - Sigmoid(node->input));
            for (auto & weight : node->outputNeuron)
            {
                node->error += (weight.first->error*weight.second)*sigmoidPrime;
            }
        }
    }
}
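For reference, the hidden-layer delta formula I'm trying to implement there (as I understand it from the course) is delta_j = sigmoidPrime(z_j) * sum over the next-layer neurons k of (w_jk * delta_k), with the delta of the output neuron being just estimatedY - y, as set in TrainNetwork above.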
I randomize the weights and run it on the data set:
x = {0.0f, 0.0f}   y = 0.0f
x = {1.0f, 0.0f}   y = 0.0f
x = {0.0f, 1.0f}   y = 0.0f
x = {1.0f, 1.0f}   y = 1.0f
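For completeness, the training call looks roughly like this; the construction of the vector and the network object are just illustrative, since I haven't shown my setup/randomisation code:

std::vector<std::pair<std::pair<float, float>, float>> trainingData =
{
    { {0.0f, 0.0f}, 0.0f },
    { {1.0f, 0.0f}, 0.0f },
    { {0.0f, 1.0f}, 0.0f },
    { {1.0f, 1.0f}, 1.0f }
};

NeuralNetwork network;              // weights/biases randomised before training (setup not shown)
network.TrainNetwork(trainingData); // 100 passes over the four AND examples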
Of course I shouldn't be training and testing with the same data set, but I just wanted to get the basic backpropagation algorithm up and running. When I run this code, I see the initial (randomised) weights/biases are as follows:
Layer 0
Bias 0.111129
NeuronWeight 0.058659
Bias -0.037814
NeuronWeight -0.018420
Layer 1
Bias 0.016230
NeuronWeight -0.104935
Layer 2
Bias 0.080982
The training set runs and the mean squared error of delta[outputLayer] looks something like:
Error: 0.156954
Error: 0.152529
Error: 0.213887
Error: 0.305257
Error: 0.359612
Error: 0.373494
Error: 0.374910
Error: 0.374995
Error: 0.375000
... remains at this value for ever...
And the final weights look like this (they always end up at roughly these values):
Layer 0
Bias 0.000000
NeuronWeight 15.385233
Bias 0.000000
NeuronWeight 16.492933
Layer 1
Bias 0.000000
NeuronWeight 293.518585
Layer 2
Bias 0.000000
I accept that this may seem like quite a roundabout way of learning neural networks, and the implementation is (at the moment) far from optimal. But can anyone spot a point where I'm making an invalid assumption, or where either the implementation or the formula is wrong?
EDIT
Thanks for the feedback on the bias values. I stopped them being applied to the input layer and stopped passing the input layer through the sigmoid function. Additionally, my sigmoid prime function was invalid. But the network still isn't working; I've updated the error and output above with what happens now.
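The derivative is now inlined in CalculateError above; written out as a helper (a hypothetical name, it isn't a separate function in my code) it would be:

float NeuralNetwork::SigmoidPrime(float value) const
{
    float s = Sigmoid(value);
    return s * (1.0f - s); // sigma'(z) = sigma(z) * (1 - sigma(z))
}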