
I have implemented a convolutional neural network with batch normalization on a 1D input signal. My model reaches a pretty good accuracy of ~80%. Here is the order of my layers: (Conv1D, Batch, ReLU, MaxPooling) repeated 6 times, then Conv1D, Batch, ReLU, Dense, Softmax.
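For concreteness, the layer order above would look roughly like the following Keras sketch (the filter counts, kernel size, input length, number of classes, and the Flatten before the Dense layer are placeholders, not the actual values of my model):

    from tensorflow import keras
    from tensorflow.keras import layers

    def build_batchnorm_model(input_length=1024, n_classes=5):
        inputs = keras.Input(shape=(input_length, 1))
        x = inputs
        # (Conv1D, BatchNorm, ReLU, MaxPooling) repeated 6 times
        for _ in range(6):
            x = layers.Conv1D(32, kernel_size=3, padding="same")(x)
            x = layers.BatchNormalization()(x)
            x = layers.Activation("relu")(x)
            x = layers.MaxPooling1D(pool_size=2)(x)
        # Final Conv1D, BatchNorm, ReLU before the classifier
        x = layers.Conv1D(32, kernel_size=3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        # Flatten so the Dense softmax classifier can follow
        x = layers.Flatten()(x)
        outputs = layers.Dense(n_classes, activation="softmax")(x)
        return keras.Model(inputs, outputs)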

I have seen several articles saying that I should NOT use dropout on convolutional layers and should use batch normalization instead, so I want to experiment by replacing all batch normalization layers with dropout layers to see whether dropout really makes my performance worse.

My new model has the following structure: (Conv1D, Dropout, ReLU, MaxPooling) repeated 6 times, then Conv1D, Dropout, ReLU, Dense, Softmax. I have tried dropout rates of 0.1, 0.2, 0.3, 0.4, and 0.5. The accuracy of the new model is only ~25%, much worse than my original model and even worse than always predicting the dominant class (~40%).

I wonder whether the huge difference in performance really comes from replacing batch normalization with dropout, or whether it comes from my misunderstanding of how dropout should be used.


1 Answer


To get an intuition for how to use batch norm and dropout, you should first understand what these layers do:

  • Batch normalization scales and shifts the layer's output using the mean and variance computed over the batch, so that the input to the next layer is more robust against internal covariate shift.
  • Dropout randomly drops elements of its input, teaching the following layers not to rely on specific features or elements but to use all of the information available. This forces the network to generalize better and is a means of reducing overfitting (see the sketch after this list).
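To see the difference concretely, here is a tiny sketch (assuming a TensorFlow/Keras setup; the toy input values are made up):

    import numpy as np
    import tensorflow as tf

    # A toy 1D "signal": batch size 1, 8 time steps, 1 channel
    x = tf.constant(np.arange(1, 9, dtype="float32").reshape(1, 8, 1))

    bn = tf.keras.layers.BatchNormalization()
    drop = tf.keras.layers.Dropout(rate=0.5)

    # training=True so both layers show their training-time behavior
    print(bn(x, training=True)[0, :, 0].numpy())    # roughly zero mean, unit variance
    print(drop(x, training=True)[0, :, 0].numpy())  # ~half the values zeroed, the rest scaled by 1/(1-0.5)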

What you did is replace your normalization layers with layers that add extra noise to the information flow, which of course leads to a drastic decrease in accuracy.

My recommendation is to use batch norm just like in your first setup and, if you want to experiment with dropout, to add it after the activation function of the previous layer. Dropout is usually used to regularize dense layers, which are very prone to overfitting. Try this (a Keras sketch of the layout follows the list):

  1. 6 x (Conv1D, Batch, ReLU, MaxPooling)
  2. 1 x (Conv1D, Batch, ReLU)
  3. Dropout, Dense, Softmax
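A minimal Keras sketch of that layout (the filter counts, kernel size, input length, dropout rate, and number of classes are illustrative assumptions, not tuned values):

    from tensorflow import keras
    from tensorflow.keras import layers

    def build_recommended_model(input_length=1024, n_classes=5):
        inputs = keras.Input(shape=(input_length, 1))
        x = inputs
        # 1. 6 x (Conv1D, Batch, ReLU, MaxPooling)
        for _ in range(6):
            x = layers.Conv1D(32, kernel_size=3, padding="same")(x)
            x = layers.BatchNormalization()(x)
            x = layers.Activation("relu")(x)
            x = layers.MaxPooling1D(pool_size=2)(x)
        # 2. 1 x (Conv1D, Batch, ReLU)
        x = layers.Conv1D(32, kernel_size=3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        # 3. Dropout, Dense, Softmax -- dropout only in front of the dense classifier
        x = layers.Flatten()(x)
        x = layers.Dropout(0.5)(x)
        outputs = layers.Dense(n_classes, activation="softmax")(x)
        return keras.Model(inputs, outputs)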