A fairly easy question, but it's driving me crazy right now.

When applying dropout to regularize my neural network, where should it be applied?

For this example, imagine 2 convolutional layers followed by 1 fully connected layer, and let "A2" be the activations of the second conv layer. Should I apply dropout to those activations, or to the weights of the following fully connected layer? Or does it not really matter?

My intuition tells me that the right thing is to apply dropout to the weights of the fully connected layer rather than to the activations of the second conv layer, but I have seen the opposite in many places.

I have seen two similar questions, but neither of them has a satisfying answer.

1 Answer


Both are valid. Dropping the activations is called dropout, and dropping the weights is called DropConnect. DropConnect is a generalized version of dropout. [Figure from the DropConnect paper illustrating the difference.]
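To make the distinction concrete, here is a minimal NumPy sketch of one forward pass through a fully connected layer under each scheme. The keep probability `p = 0.5` and the layer sizes are illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                      # keep probability (illustrative choice)
A2 = rng.normal(size=4)      # activations of the second conv layer (flattened)
W = rng.normal(size=(3, 4))  # weights of the following fully connected layer

# Dropout: mask whole activations, then apply the full weight matrix.
m = rng.random(A2.shape) < p          # one Bernoulli mask entry per activation
out_dropout = W @ (m * A2) / p        # divide by p to keep the expectation

# DropConnect: mask individual weights, then apply to the full activations.
M = rng.random(W.shape) < p           # one Bernoulli mask entry per weight
out_dropconnect = (M * W) @ A2 / p
```

Note that masking an activation is equivalent to zeroing an entire column of `W`, i.e. `W @ (m * A2) == (W * m) @ A2`, which is exactly why dropout is a special, column-structured case of DropConnect.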

In the DropConnect figure, if all the weights into node u3 were zeroed (three of the four already are), the effect would be the same as applying dropout to node r3. Another difference lies in the mask matrix over the weights. [Figure: mask matrices for DropConnect and dropout.]

The left matrix is a DropConnect mask, while the right one is the effective weight mask when dropout is applied to two consecutive layers. Notice the structured pattern in the dropout mask: zeros appear as whole rows and columns rather than at independent positions. The authors show that DropConnect beats dropout on benchmark datasets and produces state-of-the-art results.
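The structural difference between the two masks can be reproduced in a few lines of NumPy (sizes and keep probability are again illustrative): a DropConnect mask has independent entries, while the effective dropout mask is the outer product of the two layers' activation masks, so it has rank at most 1.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.5
n_out, n_in = 3, 4

# DropConnect: every weight gets its own independent Bernoulli mask entry.
M_dropconnect = (rng.random((n_out, n_in)) < p).astype(int)

# Dropout on two consecutive layers: mask the input activations and the
# output activations separately. The effective weight mask is the outer
# product of those two vectors, so zeros come in whole rows and columns.
m_in = (rng.random(n_in) < p).astype(int)
m_out = (rng.random(n_out) < p).astype(int)
M_dropout = np.outer(m_out, m_in)
```

Every row of `M_dropout` is either all zeros (its output node was dropped) or an exact copy of `m_in`; a DropConnect mask has no such constraint.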

Since DropConnect is the generalized version, I would go with it.