Correct usage of keras SpatialDropout2D inside TimeDistributed layer - CNN LSTM network

Question

I have a burning issue on applying same dropout mask for all of the timesteps within a time series sample so that LSTM layer sees same inputs in one forward pass. I read multiple articles but did not find a solution to this. Does the following implementation support this? Or this randomly drops different feature maps in each timestep?

dim = (420,48,48,1) # grayscale images of size 48x48
inputShape = (dim)
Input_words = Input(shape=inputShape, name='input_vid')
x = TimeDistributed(Conv2D(filters=50, kernel_size=(8,8), padding='same', activation='relu'))(Input_words)
x = TimeDistributed(MaxPooling2D(pool_size=(2,2)))(x)
x = TimeDistributed(SpatialDropout2D(0.2))(x)
x = TimeDistributed(BatchNormalization())(x)
x = TimeDistributed(Flatten())(x)
x = LSTM(200, dropout=0.2, recurrent_dropout=0.2)(x)
out = Dense(5,activation='softmax')(x)
model = Model(inputs=Input_words, outputs=[out])
opt = Adam(lr=1e-3, decay=1e-3 / 200)
model.compile(loss = 'categorical_crossentropy', optimizer=opt,metrics = ['accuracy'])

If not what would a good solution for this on keras? Can I use Dropout with noise_shape to solve my problem?

Marco Cerliani Marco Cerliani · Accepted Answer · 2021-03-10T10:14:47

you can simply test all the possibilities on your own...

we generate one sample of shape (1, n_frame, H, W, n_channel) and visualize the impact of different dropout strategies:

inputShape = (100,8,8,1) # frames of 100 grayscale images of size 8x8 
X = np.random.uniform(1,2, (1,)+inputShape).astype('float32') # generate 1 sample

layer = Dropout(0.4, seed=0)
d = layer(X, training=True).numpy()

layer = Dropout(0.4, seed=0, noise_shape=(X.shape[0],1,X.shape[2],X.shape[3],X.shape[4]))
d1d = layer(X, training=True).numpy()

layer = TimeDistributed(SpatialDropout2D(0.4, seed=0))
tsd2d = layer(X, training=True).numpy()

layer = SpatialDropout3D(0.4, seed=0)
# the same as:
# layer = Dropout(0.4, seed=0, noise_shape=(X.shape[0],1,1,1,X.shape[4]))
sd3d = layer(X, training=True).numpy()

results from Dropout:

plt.figure(figsize=(15,12))

for i,f_map in enumerate(d[0]):
    
    if i == 12:
        break
    
    plt.subplot(3,4, i+1)
    plt.imshow(np.squeeze(f_map>0, -1), vmin=0, vmax=1)
    plt.title(f"frame {i}")

results from Dropout with noise_shape:

plt.figure(figsize=(15,12))

for i,f_map in enumerate(d1d[0]):
    
    if i == 12:
        break
    
    plt.subplot(3,4, i+1)
    plt.imshow(np.squeeze(f_map>0, -1), vmin=0, vmax=1)
    plt.title(f"frame {i}")

results from TimeDistributed plus SpatialDropout2D

plt.figure(figsize=(15,12))

for i,f_map in enumerate(tsd2d[0]):
    
    if i == 12:
        break
    
    plt.subplot(3,4, i+1)
    plt.imshow(np.squeeze(f_map>0, -1), vmin=0, vmax=1)
    plt.title(f"frame {i}")

results from SpatialDropout3D:

plt.figure(figsize=(15,12))

for i,f_map in enumerate(sd3d[0]):
    
    if i == 12:
        break
    
    plt.subplot(3,4, i+1)
    plt.imshow(np.squeeze(f_map>0, -1), vmin=0, vmax=1)
    plt.title(f"frame {i}")

CONCLUSIONS

the simple Dropout random drops pixels in each frame without a rule
Dropout with noise_shape random drops pixels in each frame always at the same position
TimeDistributed plus SpatialDropout2D random drops entire frames randomly
SpatialDropout3D drops all the frames in random channels

Correct usage of keras SpatialDropout2D inside TimeDistributed layer - CNN LSTM network

1 Answers