I've been reading up quite a bit on neural networks and training them with backpropagation, primarily this Coursera course, with additional reading from here and here. I thought I had a pretty solid grasp of the core algorithm, but my attempt to build a backpropagation-trained neural net hasn't quite worked out and I'm not sure why.
The code is in C++ with no vectorisation as of yet.
I wanted to build a simple network (2 input neurons, 1 hidden neuron, 1 output neuron) to model the AND function, just to understand how the concepts worked before moving on to a more complex example. My forward propagation code worked when I hand-coded the values for the weights and biases.
float NeuralNetwork::ForwardPropagte(const float *dataInput)
{
    // Write the input data into the input layer
    int number = 0;
    for (auto & node : m_Network[0])
    {
        node->input = dataInput[number++];
    }

    // For each layer in the network
    int layerIndex = 0;
    for (auto & layer : m_Network)
    {
        // For each neuron in the layer
        for (auto & neuron : layer)
        {
            float activation;
            if (layerIndex != 0)
            {
                neuron->input += neuron->bias;
                activation = Sigmoid(neuron->input);
            }
            else
            {
                activation = neuron->input;
            }

            // Pass the activation forward along each weighted connection
            for (auto & pair : neuron->outputNeuron)
            {
                pair.first->input += static_cast<float>(pair.second)*activation;
            }
        }
        ++layerIndex;
    }

    return Sigmoid(m_Network[m_Network.size()-1][0]->input);
}
Some of these variables are fairly poorly named, but basically neuron->outputNeuron is a vector of pairs: the first element is a pointer to the next neuron, the second is the weight on that connection. neuron->input is the "z" value in the neural network equations, i.e. the sum of all the weights*activations plus the bias (the neuron structure itself is sketched below, after the Sigmoid code). Sigmoid is given by:
float NeuralNetwork::Sigmoid(float value) const
{
return 1.0f/(1.0f + exp(-value));
}
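For reference, here is roughly what each neuron holds. I haven't posted the actual class, so treat this as a sketch of the structure the code above assumes:

#include <utility>
#include <vector>

struct Neuron
{
    float input = 0.0f;  // the "z" value: accumulated sum of weight*activation (plus bias for non-input layers)
    float bias  = 0.0f;
    float error = 0.0f;  // the delta stored during backpropagation
    // Connections to the next layer: pointer to the target neuron and the weight on that connection
    std::vector<std::pair<Neuron*, float>> outputNeuron;
};

// m_Network is indexed as m_Network[layer][neuron] and holds pointers to neurons
// (raw pointers in this sketch; it could equally be smart pointers):
std::vector<std::vector<Neuron*>> m_Network;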
These two appear to work as intended. After a pass over the network (and again after backpropagation), all the 'z' values, i.e. neuron->input, are reset to zero.
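That reset is what the ResetActivations() call in the training code below does. I haven't shown it, but it is essentially just this:

void NeuralNetwork::ResetActivations()
{
    // Clear the accumulated 'z' values so the next forward pass starts from zero
    for (auto & layer : m_Network)
    {
        for (auto & neuron : layer)
        {
            neuron->input = 0.0f;
        }
    }
}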
I then train the network following the pseudo-code below. The training code is run multiple times.

for trainingExample = 0 to m   // m = number of training examples
    perform forward propagation to calculate hyp(x)
    calculate the cost delta of the last layer:
        delta = y - hyp(x)
    use the delta of the output to calculate the delta for all layers
    move over the network, adjusting the weights based on this value
    reset the network
The actual code is here:
void NeuralNetwork::TrainNetwork(const std::vector<std::pair<std::pair<float,float>,float>> & trainingData)
{
    for (int i = 0; i < 100; ++i)
    {
        for (auto & trainingSet : trainingData)
        {
            float x[2] = { trainingSet.first.first, trainingSet.first.second };
            float y = trainingSet.second;
            float estimatedY = ForwardPropagte(x);

            m_Network[m_Network.size()-1][0]->error = estimatedY - y;

            CalculateError();
            RunBackpropagation();
            ResetActivations();
        }
    }
}
With the backpropagation function given by:
void NeuralNetwork::RunBackpropagation()
{
    for (int index = m_Network.size()-1; index >= 0; --index)
    {
        for (auto & node : m_Network[index])
        {
            // Again, "outputNeuron" is the list of next-layer neurons and their associated weights
            for (auto & weight : node->outputNeuron)
            {
                weight.second += weight.first->error*Sigmoid(node->input);
            }
            node->bias = node->error; // I'm not sure how to adjust the bias; some of the formulas seemed to point to this. Is it correct?
        }
    }
}
and the cost calculated by:
void NeuralNetwork::CalculateError()
{
    for (int index = m_Network.size()-2; index > 0; --index)
    {
        for (auto & node : m_Network[index])
        {
            node->error = 0.0f;
            float sigmoidPrime = Sigmoid(node->input)*(1 - Sigmoid(node->input));
            for (auto & weight : node->outputNeuron)
            {
                node->error += (weight.first->error*weight.second)*sigmoidPrime;
            }
        }
    }
}
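For reference, the hidden-layer delta formula I'm trying to implement there (as I understand it from the course) is delta_j = sigmoidPrime(z_j) * sum over the next-layer neurons k of (w_jk * delta_k), with the delta of the output neuron being just estimatedY - y, as set in TrainNetwork above.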
I randomize the weights and run it on the data set:
x = {0.0f, 0.0f}   y = 0.0f
x = {1.0f, 0.0f}   y = 0.0f
x = {0.0f, 1.0f}   y = 0.0f
x = {1.0f, 1.0f}   y = 1.0f
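For completeness, the training call looks roughly like this; the construction of the vector and the network object are just illustrative, since I haven't shown my setup/randomisation code:

std::vector<std::pair<std::pair<float, float>, float>> trainingData =
{
    { {0.0f, 0.0f}, 0.0f },
    { {1.0f, 0.0f}, 0.0f },
    { {0.0f, 1.0f}, 0.0f },
    { {1.0f, 1.0f}, 1.0f }
};

NeuralNetwork network;              // weights/biases randomised before training (setup not shown)
network.TrainNetwork(trainingData); // 100 passes over the four AND examples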
Of course I shouldn't be training and testing with the same data set, but I just wanted to get the basic backpropagation algorithm up and running. When I run this code, I see the initial (randomised) weights/biases are as follows:
Layer 0
Bias 0.111129
NeuronWeight 0.058659
Bias -0.037814
NeuronWeight -0.018420
Layer 1
Bias 0.016230
NeuronWeight -0.104935
Layer 2
Bias 0.080982
The training set runs and the mean squared error of delta[outputLayer] looks something like:
Error: 0.156954
Error: 0.152529
Error: 0.213887
Error: 0.305257
Error: 0.359612
Error: 0.373494
Error: 0.374910
Error: 0.374995
Error: 0.375000
... remains at this value for ever...
And the final weights look like this (they always end up at roughly these values):
Layer 0
Bias 0.000000
NeuronWeight 15.385233
Bias 0.000000
NeuronWeight 16.492933
Layer 1
Bias 0.000000
NeuronWeight 293.518585
Layer 2
Bias 0.000000
I accept that this may seem like quite a roundabout way of learning neural networks, and the implementation is (at the moment) far from optimal. But can anyone spot a point where I'm making an invalid assumption, or where either the implementation or the formula is wrong?
EDIT
Thanks for the feedback on the bias values. I stopped them being applied to the input layer and stopped passing the input layer through the sigmoid function. Additionally, my sigmoid prime function was invalid. But the network still isn't working; I've updated the error and output above with what happens now.
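The derivative is now inlined in CalculateError above; written out as a helper (a hypothetical name, it isn't a separate function in my code) it would be:

float NeuralNetwork::SigmoidPrime(float value) const
{
    float s = Sigmoid(value);
    return s * (1.0f - s); // sigma'(z) = sigma(z) * (1 - sigma(z))
}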