
I am using a fully connected neural network for image recognition on the MNIST dataset.

My network has 784 input neurons, one hidden layer of 1569 neurons, and an output layer of 10 neurons.

I have two questions:

  1. I use the sigmoid activation and compute the error as error = output * (1 - output) * (target - output). The problem is that if an output neuron produces 1 and the required value is 0, then error = 0. That is wrong, isn't it?

  2. Is it right to use sigmoid if the weighted sum of the neurons in the hidden layer becomes so large that the result is 1? What values should I use to initialize the weights?


2 Answers


In my experience, initializing the weights randomly in a range of roughly 0.01 to 0.5 gives good results.

To 1: As far as I know, the local error for the output layer is normally expectedOutput - currentOutput, because this simplified expression does not vanish when the sigmoid saturates and is accurate enough. After that, for fully connected layers, you use backpropagation to adjust the weights of the hidden layers. See Yann LeCun's work on efficient training: Efficient BackProp.
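To make the difference concrete, here is a minimal sketch comparing the formula from the question with the simplified error. The function names `delta_with_derivative` and `delta_plain` are made up for this illustration. As far as I know, the simplified form is also what you get when a sigmoid output is paired with a cross-entropy loss, but treat that as a side note rather than part of the recipe above.

```python
def delta_with_derivative(output, target):
    # Formula from the question: sigmoid derivative times the raw error.
    return output * (1.0 - output) * (target - output)


def delta_plain(output, target):
    # Simplified output-layer error: expectedOutput - currentOutput.
    return target - output


# The case from the question: the neuron outputs 1 but the target is 0.
saturated_output, target = 1.0, 0.0
stalled = delta_with_derivative(saturated_output, target)
useful = delta_plain(saturated_output, target)

# The derivative term (1 - output) is zero here, so the first delta
# carries no learning signal, while the plain difference is still -1.
print("with derivative:", stalled)
print("plain difference:", useful)
```

With the first formula a fully saturated neuron cannot recover, no matter how wrong it is; the plain difference keeps pushing the output toward the target.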

To 2: If the weighted sum coming from the hidden layer is too large, the sigmoid of every output neuron returns 1 for a large number of epochs. A simple and efficient hack prevents this: always divide the input of each output-layer neuron by the number of neurons in the parent (hidden) layer. As long as the weights themselves lie in [-1.0, 1.0], the input then stays in the interval [-1.0, 1.0] before the sigmoid transfer function is applied. In most cases this trick drastically reduces the number of epochs needed to train the network.
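A rough sketch of this scaling trick, using the 1569 hidden neurons from the question. The helper `output_activation`, the random hidden activations, and the weight range (taken from the 0.01 to 0.5 suggestion above) are assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def output_activation(hidden, weights, scale_by_fan_in):
    # Weighted sum of the hidden activations feeding one output neuron.
    total = sum(h * w for h, w in zip(hidden, weights))
    if scale_by_fan_in:
        # The hack: divide by the number of neurons in the hidden layer.
        total /= len(hidden)
    return total, sigmoid(total)


random.seed(0)
n_hidden = 1569
# Stand-ins for sigmoid outputs of the hidden layer, each in [0, 1).
hidden = [random.random() for _ in range(n_hidden)]
# All-positive weights, which is what makes the raw sum explode.
weights = [random.uniform(0.01, 0.5) for _ in range(n_hidden)]

raw_sum, raw_out = output_activation(hidden, weights, scale_by_fan_in=False)
scaled_sum, scaled_out = output_activation(hidden, weights, scale_by_fan_in=True)

print("raw:   ", raw_sum, raw_out)
print("scaled:", scaled_sum, scaled_out)
```

Without scaling, the sum is in the hundreds and the sigmoid is pinned at 1; with scaling, the input lands well inside [-1.0, 1.0] and the sigmoid stays in its responsive range.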


It is generally suggested that you initialize your weights randomly. Typically, the initial weights of a neural network are drawn from the range (−1/√d, 1/√d), where d is the number of inputs to a given neuron.
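As a minimal sketch of that rule for the layer sizes in the question (784 inputs, 1569 hidden neurons, 10 outputs): the helper name `init_weights` and the choice of a uniform distribution are my assumptions, and biases are left out.

```python
import math
import random


def init_weights(n_inputs, n_neurons, rng):
    # Each neuron has n_inputs incoming weights, so d = n_inputs.
    limit = 1.0 / math.sqrt(n_inputs)
    return [
        [rng.uniform(-limit, limit) for _ in range(n_inputs)]
        for _ in range(n_neurons)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hidden layer: d = 784, so weights fall within +/- 1/28.
hidden_weights = init_weights(784, 1569, rng)
# Output layer: d = 1569, so the range is narrower still.
output_weights = init_weights(1569, 10, rng)

print("hidden limit:", 1.0 / math.sqrt(784))
print("output limit:", 1.0 / math.sqrt(1569))
```

Because the range shrinks as the number of inputs grows, the weighted sum into each neuron starts small, which keeps the sigmoid away from saturation at the beginning of training.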

The error is always desired output - current output. The formula you mentioned belongs to one of the steps of the backpropagation (BPN) algorithm, the weight adjustment in the hidden layer. I would also suggest reducing the number of hidden nodes in your model. The general advice is to have fewer hidden nodes than inputs.

The sigmoid function is fine for your task.