Python: How to solve the low accuracy of a Variational Autoencoder Convolutional Model developed to predict a sequence of future frames?

Question

I am currently developing a precipitation cell displacement prediction model. I have taken as a model to implement a variational convolutional autoencoder (I attach the model code). In summary, the model receives a sequence of 5 images, and must predict the following 5. The architecture consists of five convolutive layers in the encoder and decoder (Conv Transpose), which were made to greatly reduce the image size and learn spatial details. Between the encoder and decoder carries ConvLSTM layers to learn the temporal sequences. I am working it in Python, with tensorflow and keras.

The data consists of "images" of the rain radar of 400x400 pixels, with the dark background and the rain cells in the center of the frame. The time between frame and frame is 5 minutes, radar configuration. After further processing, the training data is scaled between 0 and 1, and in numpy format (for working with matrices). My training data finally has the form [number of sequences, number of images per sequence, height, width, channel = 1].

Sequence of precipitation Images

The sequences are made up of: 5 inputs and 5 targets, of which there are 2111 radar image sequences (I know I don't have much data :( for training) and 80% have been taken for training and 20% for the validation. To detail:

train_input = [1688, 5, 400, 400, 1]
train_target = [1688, 5, 400, 400, 1]
valid_input = [423, 5, 400, 400, 1]
valid_target = [423, 5, 400, 400, 1]

The problem is that I have trained my model, and I have obtained the value of accuracy very poor. around 8e-05. I've been training 400 epochs, and the value remains or surrounds the mentioned value. Also when I take a sequence of 5 images to predict the next 5, I get very bad results (not even a small formation of "spots" in the center, which represents the rain). I have already tried to reduce the number of layers in the encoder and decoder, in addition to the optimizer [adam, nadam, adadelta], I have also tried using the activation function [relu, elu]. I have not obtained any profitable results, in the prediction images and the accuracy value.

Loss and Accuracy during Training

I am a beginner in Deep Learning topics, I like it a lot, but I can't find a solution to this problem. I suspect that my model architecture is not right. In addition to that I should look for a better optimizer or activation function to improve the accuracy value and predicted images. As a last solution, perhaps cut the image of 400x400 pixels to a central area, where the precipitation is. Although I would lose training data.

I appreciate you can help me solve this problem, maybe giving me some ideas to organize my architecture model, or ideas to organize de train data. Best regards.

    # Encoder

    seq = Sequential()

    seq.add(Input(shape=(5, 400, 400,1)))

    seq.add(Conv3D(filters=32, kernel_size=(11, 11, 5), strides=3,
                   padding='same', activation ='relu'))
    seq.add(BatchNormalization())

    seq.add(Conv3D(filters=32, kernel_size=(9, 9, 32), strides=2,
                   padding='same', activation ='relu'))
    seq.add(BatchNormalization())

    seq.add(Conv3D(filters=64, kernel_size=(7, 7, 32), strides=2,
                   padding='same', activation ='relu'))
    seq.add(BatchNormalization())

    seq.add(Conv3D(filters=64, kernel_size=(5, 5, 64), strides=2,
                   padding='same', activation ='relu'))
    seq.add(BatchNormalization())

    seq.add(Conv3D(filters=32, kernel_size=(3, 3, 64), strides=3,
                   padding='same', activation ='relu'))
    seq.add(BatchNormalization())

    # ConvLSTM Layers

    seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
    input_shape=(None, 6, 6, 32),
    padding='same', return_sequences=True))
    seq.add(BatchNormalization())

    seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
    padding='same', return_sequences=True))
    seq.add(BatchNormalization())

    seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
    padding='same', return_sequences=True))
    seq.add(BatchNormalization())

    seq.add(ConvLSTM2D(filters=40, kernel_size=(3, 3),
    padding='same', return_sequences=True))
    seq.add(BatchNormalization())

    seq.add(Conv3D(filters=32, kernel_size=(3, 3, 3),
    activation='relu',
    padding='same', data_format='channels_last'))

    # Decoder

    seq.add(Conv3DTranspose(filters=32, kernel_size=(3, 3, 64), strides=(2,3,3),
                                      input_shape=(1, 6, 6, 32),
                                      padding='same', activation ='relu'))
    seq.add(BatchNormalization())

    seq.add(Conv3DTranspose(filters=64, kernel_size=(5, 5, 64), strides=(3,2,2),
                                      padding='same', activation ='relu'))
    seq.add(BatchNormalization())

    seq.add(Conv3DTranspose(filters=64, kernel_size=(7, 7, 32), strides=(1,2,2),
                                      padding='same', activation ='relu'))
    seq.add(BatchNormalization())

    seq.add(Conv3DTranspose(filters=32, kernel_size=(9, 9, 32), strides=(1,2,2),
                                      padding='same', activation ='relu'))
    seq.add(BatchNormalization())

    seq.add(Conv3DTranspose(filters=1, kernel_size=(11, 11, 5), strides=(1,3,3),
                                      padding='same', activation ='relu'))
    seq.add(BatchNormalization())

    seq.add(Cropping3D(cropping = (0,16,16)))

    seq.add(Cropping3D(cropping = ((0,-5),(0,0),(0,0))))

    seq.compile(loss='mean_squared_error', optimizer='adadelta', metrics=['accuracy'])

Autoencoders perform regression, so it makes no sense to look at accuracy, which is a classification metric. — Dr. Snoopy
Thank you for your response Matias, so... Do you recommend me to use another metric, in this case related to the regression ?. Best regards — Pool Nolasco Ramírez

Timbus Calin Timbus Calin · Accepted Answer · 2020-02-27T14:15:26

The metric you would like to use in case of a regression problem is mse(mean_squared_error) or mae (mean_absolute_error).

You may want to use mse in the beginning as it penalises greater errors more than the mae.

You just need to change a little bit the code where you compile your model.

seq.compile(loss='mean_squared_error', optimizer='adadelta', metrics=['mse','mae'])

In this way you can monitor both mse and mae metric during the training.

Python: How to solve the low accuracy of a Variational Autoencoder Convolutional Model developed to predict a sequence of future frames?

1 Answers