I have a question about the Keras Dropout layer, specifically its noise_shape argument.

Question 1:

What is the meaning of "if your inputs have shape (batch_size, timesteps, features) and you want the dropout mask to be the same for all timesteps, you can use noise_shape=(batch_size, 1, features)", and what is the benefit of adding this argument?

Does it mean the neurons that are dropped out stay the same along the time axis? In other words, at every timestep t, would the same n neurons be dropped?
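For reference, here is a minimal sketch of the behaviour I am asking about (an assumption on my part: this uses the tf.keras API, and training=True just forces the mask to be applied outside of fit()):

import tensorflow as tf

x = tf.ones((2, 3, 4))  # (batch_size, timesteps, features)

# one mask drawn per sample and per feature, reused at every timestep
layer = tf.keras.layers.Dropout(0.5, noise_shape=(2, 1, 4))
y = layer(x, training=True)

print(y.numpy())  # within each sample, all timestep rows share the same zeros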

Question 2: Do I have to include batch_size in noise_shape when creating models? --> see the following example.

Suppose I have multivariate time series data with shape (10000, 1, 100, 2) --> (number of samples, channels, timesteps, number of features).

Then I create batches with a batch size of 64 --> (64, 1, 100, 2)

If I want to create a CNN model with dropout, I use the Keras functional API:

from keras.layers import Input, Conv2D, MaxPooling2D, Dropout

inp = Input([1, 100, 2])
conv1 = Conv2D(64, kernel_size=(11, 2), strides=(1, 1), data_format='channels_first')(inp)
max1 = MaxPooling2D((2, 1))(conv1)
max1_shape = max1._keras_shape
drop1 = Dropout(0.1, noise_shape=[?, max1._keras_shape[1], 1, 1])(max1)  # ? stands for batch_size

The output shape of layer max1 should be (None, 64, 50, 1), and I cannot assign None to the question mark (which corresponds to batch_size).

How should I cope with this? Should I just use (64, 1, 1) as noise_shape? Or should I define a variable called batch_size and pass it to the argument, like (batch_size, 64, 1, 1)?

1 Answer


Question 1:

It works kind of like a NumPy broadcast, I think.

Imagine you have a batch of 2 samples with 3 timesteps and 4 features (a small example to make it easier to show): (2, 3, 4).

If you use a noise shape of (2, 1, 4), each sample gets its own dropout mask, and that mask is applied to all of its timesteps.

So let's say these are the input values, of shape (2, 3, 4):

array([[[  1,   2,   3,   4],
        [  5,   6,   7,   8],
        [ 10,  11,  12,  13]],

       [[ 14,  15,  16,  17],
        [ 18,  19,  20,  21],
        [ 22,  23,  24,  25]]])

And this would be a random mask with noise_shape (2, 1, 4) (1 means keep, 0 means drop):

array([[[ 1,  1,  1,  0]],

       [[ 1,  0,  0,  1]]])

So you have these two masks, one per sample. They are then broadcast along the timestep axis:

array([[[ 1,  1,  1,  0],
        [ 1,  1,  1,  0],
        [ 1,  1,  1,  0]],

       [[ 1,  0,  0,  1],
        [ 1,  0,  0,  1],
        [ 1,  0,  0,  1]]])

and applied to the values:

array([[[  1,   2,   3,   0],
        [  5,   6,   7,   0],
        [ 10,  11,  12,   0]],

       [[ 14,   0,   0,  17],
        [ 18,   0,   0,  21],
        [ 22,   0,   0,  25]]])
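You can reproduce this broadcast with plain NumPy (a sketch of the masking step only; the variable names are mine):

import numpy as np

values = np.array([[[ 1,  2,  3,  4],
                    [ 5,  6,  7,  8],
                    [10, 11, 12, 13]],
                   [[14, 15, 16, 17],
                    [18, 19, 20, 21],
                    [22, 23, 24, 25]]])

mask = np.array([[[1, 1, 1, 0]],   # shape (2, 1, 4), broadcasts over the timestep axis
                 [[1, 0, 0, 1]]])

print(values * mask)  # matches the hand-applied result above

One caveat: a real Dropout layer also rescales the kept values by 1 / (1 - rate) during training so the expected sum stays the same; I left that out here to keep the broadcast visible.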

Question 2:

I'm not sure about your second question to be honest.

Edit: What you can do is take the first dimension of the shape of the input, which should be the batch_size, as proposed in this GitHub issue:

import tensorflow as tf

...

batch_size = tf.shape(inp)[0]
drop1 = Dropout(0.1, noise_shape=[batch_size, max1._keras_shape[1], 1, 1])(max1)

As you can see, I'm on the TensorFlow backend. I don't know whether Theano also has this problem; if it does, you might be able to solve it with the Theano shape equivalent.
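Side note (an assumption on my part, based on TF 2.x tf.keras, where None entries in noise_shape are filled in from the input's runtime shape): on newer versions you should be able to leave the batch dimension unspecified and skip the symbolic batch_size entirely:

# sketch for TF 2.x tf.keras: a None entry means "use the matching input dimension"
drop1 = tf.keras.layers.Dropout(0.1, noise_shape=(None, 64, 1, 1))(max1)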