I am currently building a CLDNN (Convolutional, LSTM, Deep Neural Network) model for raw signal classification.
Since the number of trainable parameters is easily in the millions, I thought dropout would help prevent overfitting.
My question also applies to other architectures that stack multiple model types.
If I have the network structured as
input -> convolution -> LSTM -> DNN -> output
Do I have to put a dropout layer after each block, or only right before the output? That is, should it be
input -> convolution -> dropout -> LSTM -> dropout -> DNN -> dropout -> output
or
input -> convolution -> LSTM -> DNN -> dropout -> output
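To make the two options concrete, here is a rough Keras sketch of what I mean (the layer sizes, input length, dropout rate, and number of classes are just placeholders, not my actual model):

```python
from tensorflow.keras import layers, models

# Option 1: dropout after each block
model_a = models.Sequential([
    layers.Input(shape=(1024, 1)),               # raw signal: 1024 samples, 1 channel
    layers.Conv1D(64, kernel_size=8, activation="relu"),
    layers.Dropout(0.5),
    layers.LSTM(128),
    layers.Dropout(0.5),
    layers.Dense(256, activation="relu"),         # the "DNN" part
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),       # output
])

# Option 2: a single dropout right before the output
model_b = models.Sequential([
    layers.Input(shape=(1024, 1)),
    layers.Conv1D(64, kernel_size=8, activation="relu"),
    layers.LSTM(128),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
```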
So far, I've only seen dropout applied to convNets, but I don't see why it should be restricted to them. Do other layer types, such as LSTMs and fully connected DNNs, also use dropout to prevent overfitting?