A fairly basic question, but it's been driving me crazy.
When applying dropout to regularize my neural network, where should it be applied?
For example, imagine 2 convolutional layers followed by 1 fully connected layer, where "A2" denotes the activations of the second conv layer. Should I apply dropout to those activations, or should I apply it to the weights of the following fully connected layer? Or does it not really matter?
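To make the two options concrete, here's a minimal NumPy sketch of what I mean (the shapes and values are made up purely for illustration; dropping activations is what's usually called "dropout", while dropping individual weights is the DropConnect variant):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5  # drop probability

# Hypothetical flattened activations of the second conv layer: (batch, features)
A2 = rng.standard_normal((4, 8))
# Weights of the following fully connected layer: (features, units)
W = rng.standard_normal((8, 3))

# Option 1: standard (inverted) dropout -- zero out activations,
# then rescale by 1/(1-p) so the expected value is unchanged
mask_a = rng.random(A2.shape) >= p
A2_dropped = A2 * mask_a / (1.0 - p)
out_dropout = A2_dropped @ W

# Option 2: DropConnect -- zero out individual weights instead
mask_w = rng.random(W.shape) >= p
W_dropped = W * mask_w / (1.0 - p)
out_dropconnect = A2 @ W_dropped

print(out_dropout.shape, out_dropconnect.shape)  # both (4, 3)
```

Both masks are resampled every training step, and neither is applied at test time.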
My intuition tells me that the right thing is to apply dropout to the weights of the fully connected layer and not to the activations of the second conv layer, but I have seen the opposite in many places.
I have seen two similar questions, but neither of them has a satisfying answer.