I have this multilayer network with ReLU hidden layer activations and Sigmoid output layer activations. I want to implement dropout (where each neuron has a chance to just output zero during training).
I was thinking I could just introduce this noise as part of the ReLU activation routine during training and be done with it, but I wasn't sure if, in principle, dropout extends to the visible/output layer or not.
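For concreteness, this is roughly what I had in mind for the hidden layer: a minimal NumPy sketch using inverted dropout, where `keep_prob` and the function name are just placeholders for illustration, not my actual code.

```python
import numpy as np

def relu_with_dropout(z, keep_prob=0.8, training=True):
    """ReLU activation with (inverted) dropout folded into the same routine.

    z:         pre-activation values for the hidden layer
    keep_prob: probability that a neuron is kept (not zeroed out)
    training:  only apply the dropout noise during training
    """
    a = np.maximum(0.0, z)  # standard ReLU
    if training:
        # Bernoulli mask: each neuron survives with probability keep_prob
        mask = np.random.rand(*a.shape) < keep_prob
        # Scale the survivors by 1/keep_prob so the expected activation
        # is unchanged, so no rescaling is needed at test time
        a = a * mask / keep_prob
    return a
```

At test time I would just call it with `training=False` and use the full network, relying on the scaling above instead of rescaling the weights.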
(In my mind, dropout reduces over-fitting because it effectively trains an ensemble of many smaller networks and averages them at test time. I'm just not sure whether that reasoning applies to the output layer.)