I am using a fully connected neural network for MNIST image recognition.
My network has 784 input neurons, one hidden layer with 1569 neurons, and an output layer with 10 neurons.
I have two questions:
I use the sigmoid activation and the error formula error = output * (1 - output) * (target - output). The problem is that if the output neuron produces 1 while the target value is 0, then error = 0, but that can't be right, can it?
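To illustrate what I mean numerically, here is a minimal sketch of that formula (the values are made up):

```python
# delta = output * (1 - output) * (target - output)
def delta(output, target):
    return output * (1 - output) * (target - output)

print(delta(0.5, 0.0))    # -0.125   -> a reasonable error signal
print(delta(0.99, 0.0))   # ~-0.0098 -> the signal is already tiny
print(delta(1.0, 0.0))    # 0.0      -> no signal at all, even though the answer is maximally wrong
```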
Is it right to use the sigmoid if the weighted sum in the hidden layer becomes so large that the output saturates at 1? What values should I initialize the weights with?
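For reference, here is a sketch of the kind of initialization I am asking about; the 1 / sqrt(fan_in) scaling is an assumption on my part, not something I have verified for this network:

```python
import numpy as np

rng = np.random.default_rng(0)

fan_in = 784       # inputs feeding each hidden neuron
n_hidden = 1569

# Naive init: weights uniform in [-1, 1]. A weighted sum over 784 terms easily
# reaches double digits, and sigmoid(10) is already ~0.99995 (saturated).
w_naive = rng.uniform(-1.0, 1.0, size=(fan_in, n_hidden))

# Scaled init (assumption): dividing by sqrt(fan_in) keeps typical weighted
# sums near 0, where the sigmoid is still sensitive to changes.
w_scaled = rng.uniform(-1.0, 1.0, size=(fan_in, n_hidden)) / np.sqrt(fan_in)

x = rng.uniform(0.0, 1.0, size=fan_in)   # a fake normalized input image
print(np.abs(x @ w_naive).mean())        # large pre-activations -> saturation
print(np.abs(x @ w_scaled).mean())       # much smaller pre-activations
```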