33
votes

What does the number of hidden layers in a multilayer perceptron neural network do to the way the network behaves? Same question for the number of nodes in the hidden layers?

Let's say I want to use a neural network for handwritten character recognition. In this case I use pixel colour intensity values as the input nodes and character classes as the output nodes.

How would I choose the number of hidden layers and nodes to solve such a problem?

5
Just to make sure we are starting in the right place: do you know what you need a hidden layer for? By the way, I do not think you can get a perfect answer to this question. – Tim
From what I understand, hidden layers generally allow for more complex relationships. I am aware that there might be no perfect answer, but what should I look for when deciding on the number of layers/nodes? – gintas
You should start by understanding why you even need hidden layers (XOR). – Tim
Possible duplicates: [What is the criteria for choosing number of hidden layers and nodes in hidden layer?][1] [Estimating the number of neurons and number of layers of an artificial neural network][2] [1]: stackoverflow.com/questions/10565868/… [2]: stackoverflow.com/questions/3345079/… – eric-haibin-lin

5 Answers

19
votes

Note: this answer was correct at the time it was made, but has since become outdated.


It is rare to have more than two hidden layers in a neural network. The number of layers is usually not a parameter of your network that you will worry much about.

Although multi-layer neural networks with many layers can represent deep circuits, training deep networks has always been seen as somewhat of a challenge. Until very recently, empirical studies often found that deep networks generally performed no better, and often worse, than neural networks with one or two hidden layers.

Bengio, Y. & LeCun, Y., 2007. Scaling learning algorithms towards AI. Large-Scale Kernel Machines, (1), pp.1-41.

The cited paper is a good reference for learning about the effect of network depth, recent progress in teaching deep networks, and deep learning in general.

2
votes

All of the above answers are of course correct, but just to add some more ideas: some general rules of thumb, based on the paper 'Approximating Number of Hidden layer neurons in Multiple Hidden Layer BPNN Architecture' by Saurabh Karsoliya, are the following (a small numeric illustration follows the list).


In general:

  • The number of hidden layer neurons should be roughly 2/3 (or 70% to 90%) of the size of the input layer. If this is insufficient, the number of output layer neurons can be added later on.
  • The number of hidden layer neurons should be less than twice the number of neurons in the input layer.
  • The size of the hidden layer should be between the input layer size and the output layer size.
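
As a rough numeric illustration of these rules of thumb, here is a small sketch for the handwritten-character example in the question. The input/output sizes (64 pixels for an 8x8 image, 10 classes) are my own assumed example values, not numbers from the paper:

```python
# Numeric illustration of the rules of thumb above; the input/output sizes
# (64 pixels, 10 classes) are assumed example values.
n_inputs, n_outputs = 64, 10

two_thirds_rule = round(2 / 3 * n_inputs)   # ~2/3 of the input layer size -> 43 neurons
upper_bound = 2 * n_inputs                  # should stay below twice the input size -> < 128
size_range = (n_outputs, n_inputs)          # somewhere between output and input size -> 10..64

print(two_thirds_rule, upper_bound, size_range)
```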

Always keep in mind that you need to explore and try a lot of different combinations. Also, using GridSearch you can find the "best" model and parameters.

E.g. we can do a GridSearch to determine the "best" size of the hidden layer.
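
A minimal sketch of such a grid search, assuming scikit-learn's MLPClassifier and GridSearchCV, with the built-in digits dataset standing in for the character-recognition task; the candidate layer sizes are arbitrary examples:

```python
# Grid search over a few hidden-layer topologies (sketch, assuming scikit-learn).
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)  # 8x8 pixel intensities, 10 digit classes

param_grid = {
    # candidate topologies: one or two hidden layers of various widths
    "hidden_layer_sizes": [(32,), (64,), (32, 32), (64, 32)],
}

search = GridSearchCV(
    MLPClassifier(max_iter=1000, random_state=0),
    param_grid,
    cv=5,  # 5-fold cross-validation for each candidate topology
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```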

5
votes

Most of the problems I have seen were solved with 1-2 hidden layers. It has been proven that MLPs with only one hidden layer are universal function approximators (Hornik et al.). More hidden layers can make the problem easier or harder. You usually have to try different topologies. I have heard that you cannot add an arbitrary number of hidden layers if you want to train your MLP with backprop, because the gradient becomes too small in the first layers (I have no reference for that). But there are some applications where people have used up to nine layers. Maybe you are interested in a standard benchmark problem which is solved by different classifiers and MLP topologies.

8
votes

The general answer for picking hyperparameters is to cross-validate. Hold out some data, train networks with different configurations, and use the one that performs best on the held-out set.
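
A minimal sketch of that hold-out procedure, assuming scikit-learn with the built-in digits dataset as a stand-in for the character-recognition task; the candidate configurations are arbitrary examples:

```python
# Hold out some data and compare a few network configurations on it (sketch).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

configs = [(32,), (64,), (64, 64)]  # candidate hidden-layer configurations
scores = {}
for hidden in configs:
    clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=1000, random_state=0)
    clf.fit(X_train, y_train)
    scores[hidden] = clf.score(X_val, y_val)  # accuracy on the held-out set

best = max(scores, key=scores.get)
print(best, scores[best])
```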

3
votes

Besides the fact that cross-validation on different model configurations (number of hidden layers or number of neurons per layer) will lead you to a better configuration:

One approach is to train a model that is as big and deep as possible, and to use dropout regularization to turn off some neurons and reduce overfitting.

The reference for this approach can be found in this paper: https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
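
A minimal sketch of that idea, assuming PyTorch; the layer widths, dropout rate, and the 64-pixel/10-class sizes are illustrative assumptions, not values from the paper:

```python
# A deliberately large MLP with dropout after each hidden layer (sketch).
# Sizes and dropout rate are assumed for illustration.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 512), nn.ReLU(), nn.Dropout(p=0.5),   # wide hidden layer; half the units dropped during training
    nn.Linear(512, 512), nn.ReLU(), nn.Dropout(p=0.5),  # second wide hidden layer with dropout
    nn.Linear(512, 10),                                  # 10 output classes
)
```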