2 votes

I am trying to build an encoder-decoder model for time-series data with 1D convolutions in Keras. Consider this simple model:

from tensorflow.keras.layers import Input, Conv1D, UpSampling1D
from tensorflow.keras.models import Model

inputs = Input(shape=(timesteps, input_dim))
t = Conv1D(16, kernel_size=3, padding='same')(inputs)
encoded = Conv1D(16, kernel_size=2, strides=2)(t)

t = UpSampling1D(2)(encoded)
t = Conv1D(16, kernel_size=3, padding='same')(t)
decoded = Conv1D(1, kernel_size=3, padding='same')(t)

model = Model(inputs, decoded)

My questions are:

  1. Where should I use dilation (dilation_rate=2)? In the encoder only, or in both encoder and decoder, in order to maximize the receptive field?

  2. What should I use as the latent representation: a fully connected layer, a lower-dimensional feature map (as above), pooling, or fewer filters?

Why do you want to use dilation? – Marcin Możejko
In order to maximize the receptive field and to be able to compress all the information from the series into a small data structure. – Márton György

1 Answer

2 votes

This answer is for other people who arrive here via Google:

Dilation vs. stride: a stride makes the output (response) smaller, so you can only use it a limited number of times before the signal gets too small. Dilation effectively enlarges the kernel by inserting gaps between its taps; it grows the receptive field the same way a stride would, but without shrinking the output. Keras/tf.keras example:

from tensorflow.keras.layers import Input, Conv2D

x = input_img  # e.g. input_img = Input(shape=(height, width, channels))

x = Conv2D(16, (3, 3), padding='valid')(x)
x = Conv2D(16, (3, 3), strides=2, padding='valid')(x)
x = Conv2D(16, (3, 3), padding='valid')(x)
x = Conv2D(16, (3, 3), strides=2, padding='valid')(x)
x = Conv2D(16, (3, 3), padding='valid')(x)

encoded = Conv2D(num_features, (2, 2), padding='valid')(x)

covers the same receptive field as:

x = Conv2D(16, (3, 3), padding='valid')(x)
x = Conv2D(16, (3, 3), padding='valid')(x)
x = Conv2D(16, (3, 3), dilation_rate=2, padding='valid')(x)
x = Conv2D(16, (3, 3), dilation_rate=2, padding='valid')(x)
x = Conv2D(16, (3, 3), dilation_rate=4, padding='valid')(x)

encoded = Conv2D(num_features, (2, 2), dilation_rate=4, padding='valid')(x)
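You can verify that equivalence by computing the receptive field of each stack. The helper below is my own, using the standard effective-kernel-size formula (k_eff = dilation * (k - 1) + 1):

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride, dilation) tuples.
    Returns the receptive field of one output unit."""
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1      # dilated kernel spans k_eff input positions
        rf += (k_eff - 1) * jump     # jump = spacing of this layer's inputs
        jump *= s
    return rf

# The two stacks above, as (kernel, stride, dilation):
strided_stack = [(3, 1, 1), (3, 2, 1), (3, 1, 1), (3, 2, 1), (3, 1, 1), (2, 1, 1)]
dilated_stack = [(3, 1, 1), (3, 1, 1), (3, 1, 2), (3, 1, 2), (3, 1, 4), (2, 1, 4)]

print(receptive_field(strided_stack), receptive_field(dilated_stack))  # prints: 25 25
```

Both stacks see a 25-pixel-wide window per output unit; the strided one just produces a downsampled output while the dilated one keeps full resolution.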

If you replace the strides in an autoencoder with dilation_rate like this, it will work. (Conv2DTranspose also has a dilation_rate argument, but it does not work correctly: https://github.com/keras-team/keras/issues/8159. A workaround is to train your network with strides in the encoder and UpSampling2D in the decoder, then load those weights into a strideless, dilated encoder when you are going to use it.)
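A sketch of that workaround (layer sizes here are illustrative, not taken from the question): build two layer-for-layer matching encoders, train the strided one as part of the autoencoder, then copy its weights into the dilated one.

```python
from tensorflow.keras.layers import Input, Conv2D
from tensorflow.keras.models import Model

def build_encoder(use_dilation):
    inp = Input(shape=(32, 32, 1))
    x = Conv2D(16, (3, 3), padding='valid')(inp)
    if use_dilation:
        x = Conv2D(16, (3, 3), dilation_rate=2, padding='valid')(x)
    else:
        x = Conv2D(16, (3, 3), strides=2, padding='valid')(x)
    return Model(inp, x)

strided = build_encoder(use_dilation=False)  # train this one (with a decoder)
dilated = build_encoder(use_dilation=True)   # use this one afterwards

# The kernel shapes match layer for layer, so the weights transfer directly:
dilated.set_weights(strided.get_weights())
```

This works because strides and dilation only change how the kernel is applied, not the kernel's shape, so `get_weights`/`set_weights` line up one-to-one.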

About pooling: pooling is not needed in this case, but it can help remove location bias. Another way to get the same effect is translation augmentation. Depending on your problem, you may or may not want this.
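Translation augmentation for a time series can be as simple as a random circular shift; a minimal NumPy sketch (the function name and shift range are my own):

```python
import numpy as np

def random_shift(batch, max_shift=4, rng=None):
    """Circularly shift each series in the batch by a random offset
    in [-max_shift, max_shift]. batch: (samples, timesteps, channels)."""
    rng = rng or np.random.default_rng()
    out = np.empty_like(batch)
    for i, series in enumerate(batch):
        offset = int(rng.integers(-max_shift, max_shift + 1))
        out[i] = np.roll(series, offset, axis=0)
    return out

batch = np.arange(20, dtype=float).reshape(2, 10, 1)
augmented = random_shift(batch, max_shift=3)
```

For non-periodic signals you would pad and crop instead of rolling, so that values shifted off one end do not wrap around to the other.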

Fully connected layers: these are out of style for this purpose. Just use a convolution layer whose kernel size spans the whole feature map to connect everything. It is mathematically the same, but makes it possible to handle bigger inputs.
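To illustrate (the sizes below are my own, not from the question): a Dense layer on a flattened feature map has exactly as many parameters as a convolution whose kernel covers the entire map, but only the convolutional version can later be rebuilt with a variable-size input.

```python
from tensorflow.keras.layers import Input, Conv2D, Dense, Flatten
from tensorflow.keras.models import Model

inp = Input(shape=(8, 8, 16))
dense_latent = Dense(32)(Flatten()(inp))                # shape (None, 32)
conv_latent = Conv2D(32, (8, 8), padding='valid')(inp)  # shape (None, 1, 1, 32)
model = Model(inp, [dense_latent, conv_latent])

# Both layers hold 8*8*16*32 weights plus 32 biases; the conv version
# simply keeps the two spatial axes of size 1 in its output.
```

With `Input(shape=(None, None, 16))` the convolutional latent still builds, whereas `Flatten` + `Dense` requires a fixed input size.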

Fewer or more filters: I never know in advance. Visualize your filters and/or the filter responses. If you see filters that are very similar to each other, you used too many filters, or did not stimulate enough diversity among them (dropout and data augmentation can help with that).
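One way to check for near-duplicate filters numerically is to compare the flattened kernels by cosine similarity; a minimal sketch (function name and threshold are my own):

```python
import numpy as np

def redundant_filter_pairs(kernel, threshold=0.95):
    """kernel: conv weight array of shape (kh, kw, in_ch, n_filters),
    as returned by layer.get_weights()[0].
    Returns index pairs of filters whose flattened weights are nearly
    parallel (cosine similarity above `threshold`)."""
    n = kernel.shape[-1]
    flat = kernel.reshape(-1, n).T                 # (n_filters, kh*kw*in_ch)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)         # unit-normalize each filter
    sim = unit @ unit.T                            # pairwise cosine similarity
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sim[i, j] > threshold]
```

If this reports many pairs on a trained layer, that suggests the layer has more filters than the data is forcing it to use.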