0 votes

I am currently working on MNIST handwritten digit classification.

I built a single feedforward network with the following structure:

  • Inputs: 28x28 = 784 inputs
  • Hidden Layers: A single hidden layer with 1000 neurons
  • Output Layer: 10 neurons

All the neurons use the sigmoid activation function.

The reported class is the one corresponding to the output neuron with the maximum output value.
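
In code, the prediction step looks roughly like this (a NumPy sketch; `W1`, `b1`, `W2`, `b2` stand in for the trained weights and biases):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, W1, b1, W2, b2):
    # x: a flattened 28x28 image, shape (784,)
    h = sigmoid(W1 @ x + b1)  # hidden layer, shape (1000,)
    o = sigmoid(W2 @ h + b2)  # output layer, shape (10,)
    return np.argmax(o)       # reported class = neuron with maximum output
```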

My questions are:

  • Is it a good approach to create a single network with multiple outputs? I.e., should I instead create a separate network for each digit?

I ask because the network is currently stuck at a ~75% success rate. Since the "10 classifiers" effectively share the same hidden-layer neurons, I am not sure whether this reduces the network's learning capability.

**EDIT:**

Since other people may refer to this thread, I want to be honest and note that the 75% success rate was after ~1500 epochs. I am now at nearly 3000 epochs and the success rate is ~85%, so it works pretty well.

You should use a softmax layer for the output, not a sigmoid. – Andreas Mueller
What is a "softmax" layer? Do you mean linear when you say "softmax"? – SomethingSomething
No, I mean softmax when I say softmax: en.wikipedia.org/wiki/Softmax_function – Andreas Mueller
Look at the link I posted. You should use neither a linear layer nor a sigmoid; you should use a softmax, which is what is used for multi-class classification. – Andreas Mueller
@AndreasMueller If he/she is not doing multi-class classification, is there any benefit to using softmax over just selecting the greatest sigmoid output value? – bogatron
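
For reference, a minimal NumPy sketch of the softmax function discussed in the comments (using the usual max-subtraction trick for numerical stability):

```python
import numpy as np

def softmax(z):
    # Subtracting the max improves numerical stability without changing the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # -> approximately [0.09, 0.245, 0.665], sums to 1
```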

2 Answers

2 votes

In short, yes, it is a good approach to use a single network with multiple outputs. The first hidden layer describes decision boundaries (hyperplanes) in your feature space, and multiple digits can benefit from some of the same hyperplanes. While you could create one ANN for each digit, that kind of one-vs-rest approach doesn't necessarily yield better results and requires training 10 times as many ANNs (each of which might be trained multiple times to avoid local minima). If you had hundreds or thousands of classes, then it might make more sense.

1000 neurons in a single hidden layer seems like a lot for this problem. I think you would probably achieve better results for handwritten digits by reducing that number and adding a second hidden layer. That would let you model more complex combinations of boundaries in the input feature space. For example, perhaps try something like a 784x20x20x10 network, as sketched below.
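
Here is a minimal sketch of such a 784x20x20x10 network, assuming a Keras-style API (the optimizer, loss, and activations are illustrative choices, not the only reasonable ones):

```python
# Minimal sketch of a 784x20x20x10 network (assumes TensorFlow/Keras is installed).
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(20, activation="sigmoid"),  # first hidden layer
    keras.layers.Dense(20, activation="sigmoid"),  # second hidden layer
    keras.layers.Dense(10, activation="softmax"),  # one output per digit
])
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```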

If you do experiment with different network structures, it is usually better to start with a smaller number of layers and neurons and then increase complexity. That not only reduces training time but also helps avoid overfitting the data right away (you didn't mention whether your accuracy was for a training or validation set).

1 vote

Yes, you can surely use a single network with multiple outputs. Creating separate networks is not required, and your approach will in no way reduce the network's learning capability. MNIST is a handwritten digit database well suited to deep learning, so adding multiple layers is a good solution provided you are using deep learning algorithms. Otherwise, adding multiple layers to simple BPN-based models is not advisable, as you will run into local minima. You can look at Theano for a deep learning tutorial. That said, you can also try simple logistic regression (deeplearning.net/tutorial/logreg.html), which achieves quite good accuracy.
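
A minimal sketch of that logistic regression baseline, using scikit-learn here instead of Theano for brevity (the library choice, split size, and iteration count are assumptions on my part):

```python
# Softmax (multinomial) logistic regression on MNIST, sketched with scikit-learn.
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixel values to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10000, random_state=0)

clf = LogisticRegression(max_iter=100)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))  # typically around 0.92
```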