I have this multilayer network with ReLU hidden layer activations and Sigmoid output layer activations. I want to implement dropout (where each neuron has a chance to just output zero during training).
I was thinking I could just introduce this noise as part of the ReLU activation routine during training and be done with it, but I wasn't sure if, in principle, dropout extends to the visible/output layer or not.
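For concreteness, this is roughly what I had in mind for the hidden layer: a minimal NumPy sketch using inverted dropout, where `keep_prob` and the function name are just placeholders for illustration, not my actual code.

```python
import numpy as np

def relu_with_dropout(z, keep_prob=0.8, training=True):
    """ReLU activation with (inverted) dropout folded into the same routine.

    z:         pre-activation values for the hidden layer
    keep_prob: probability that a neuron is kept (not zeroed out)
    training:  only apply the dropout noise during training
    """
    a = np.maximum(0.0, z)  # standard ReLU
    if training:
        # Bernoulli mask: each neuron survives with probability keep_prob
        mask = np.random.rand(*a.shape) < keep_prob
        # Scale the survivors by 1/keep_prob so the expected activation
        # is unchanged, so no rescaling is needed at test time
        a = a * mask / keep_prob
    return a
```

At test time I would just call it with `training=False` and use the full network, relying on the scaling above instead of rescaling the weights.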
(In my mind, dropout reduces over-fitting because it effectively trains an ensemble of many smaller networks and averages them at test time. I'm just not sure whether that reasoning applies to the output layer.)