5
votes

I have a multilayer network with ReLU activations in the hidden layers and a sigmoid activation at the output layer. I want to implement dropout (where each neuron has some probability of outputting zero during training).

I was thinking I could just introduce this noise as part of the ReLU activation routine during training and be done with it, but I wasn't sure if, in principle, dropout extends to the visible/output layer or not.
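Roughly, I mean something like this (just a NumPy sketch of my idea; the `relu_dropout` name and the inverted-dropout scaling by `1/(1 - p_drop)` are my own choices, not from any particular library):

```python
import numpy as np

def relu_dropout(x, p_drop=0.5, training=True):
    """ReLU followed by (inverted) dropout, folded into one step."""
    out = np.maximum(0.0, x)                # ReLU
    if training and p_drop > 0.0:
        keep = 1.0 - p_drop
        # Bernoulli mask; scaling by 1/keep keeps the expected activation
        # the same, so no rescaling is needed at test time.
        mask = (np.random.rand(*out.shape) < keep) / keep
        out = out * mask
    return out
```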


(In my mind, dropout reduces over-fitting because it effectively trains an average of many smaller networks. I'm just not sure whether that reasoning applies to the output layer.)


1 Answer

4
votes

Yes, you are right - you should not apply dropout to output layer. Intuitively - introduction of such noise makes the output of your network pretty likely independent of the structure of your network. No matter what kind of computations were made in hidden layers - with some probability output might be independent of them. This is exactly opposite to the philosophy of a modeling.