It is common to use a dropout rate of 0.5 as a default, which I also use in my fully-connected network. This advice follows the recommendations from the original Dropout paper (Hinton et al.).
My network consists of fully-connected layers of size [1000, 500, 100, 10, 100, 500, 1000, 20].
I do not apply dropout to the last layer, but I do apply it to the bottleneck layer of size 10. This does not seem reasonable given that dropout = 0.5. I guess too much information gets lost. Is there a rule of thumb for how to treat bottleneck layers when using dropout? Is it better to increase the size of the bottleneck or to decrease the dropout rate?
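
To make the concern concrete, here is a minimal sketch of how many units survive dropout in each layer. It assumes standard Bernoulli dropout, where every unit is dropped independently with the same probability, so the number of surviving units follows a binomial distribution. The helper name `kept_unit_stats` is my own and only serves this illustration; it is not part of any library.

```python
import math

layer_sizes = [1000, 500, 100, 10, 100, 500, 1000, 20]
drop_rate = 0.5  # probability that a single unit is zeroed (assumed identical for all layers)


def kept_unit_stats(n_units, drop_rate):
    """Mean and standard deviation of the number of units that survive dropout.

    Assumes each unit is kept independently with probability 1 - drop_rate,
    so the number of kept units is Binomial(n_units, 1 - drop_rate).
    """
    keep = 1.0 - drop_rate
    mean = n_units * keep
    std = math.sqrt(n_units * keep * drop_rate)
    return mean, std


# Dropout is applied to every layer except the last one.
for n in layer_sizes[:-1]:
    mean, std = kept_unit_stats(n, drop_rate)
    print(f"{n:5d} units: {mean:6.1f} kept on average, "
          f"std {std:6.2f} ({std / mean:.1%} of the mean)")
```

For the bottleneck this gives 5 surviving units on average with a standard deviation of about 1.6, which is roughly 32% of the mean. For the 1000-unit layers the same ratio is only about 3%. So the bottleneck not only loses half of its already small capacity, the amount it loses also fluctuates much more from one training step to the next.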