
I would like to understand the ConvLSTM2D Keras layer a bit better.

Does it execute a 2D convolution on a 2D input (image), then average or flatten its output and feed that into an LSTM module? Or is it basically an LSTM cell where the matrix multiplications are replaced with convolution operations? I am guessing the latter. Is that correct?

Thank you

1 Answer


Yes, your second description of ConvLSTM2D is the right one.
The ConvLSTM2D architecture combines the gating of an LSTM with 2D convolutions.

As you mentioned, a ConvLSTM layer does the same job as an LSTM, but its matrix multiplications (both input-to-state and state-to-state) are replaced by convolutions, so the hidden and cell states keep a 2D spatial layout instead of being flattened into vectors.
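To make the "convolutions instead of matrix multiplications" point concrete, here is a sketch of one time step, written without peephole connections for brevity. The symbols are my own notation, not taken from the question: $*$ is a 2D convolution, $\circ$ is the element-wise product, $X_t$ is the input frame, and $H_t$, $C_t$ are the hidden and cell states, which are feature maps rather than vectors.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
g_t &= \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ g_t \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Replacing every $*$ with a matrix product (and the feature maps with vectors) gives back the ordinary LSTM equations.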

A different approach is to pass each image through convolution layers, flatten the result into a 1D feature vector, and feed the resulting sequence of feature vectors over time into LSTM layers.
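That alternative (convolution first, then a regular LSTM) could be sketched roughly as below. This is only an illustration, assuming TensorFlow's bundled Keras and channels-last data; the frame size, filter count, and LSTM width are made-up values, not anything from the question.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical batch: 2 samples, 5 time steps, 16x16 RGB frames (channels last).
frames = np.zeros((2, 5, 16, 16, 3), dtype="float32")

cnn_lstm = keras.Sequential([
    keras.Input(shape=(5, 16, 16, 3)),
    # Apply the same Conv2D to every frame independently.
    layers.TimeDistributed(layers.Conv2D(8, (3, 3), padding="same", activation="relu")),
    # Flatten each frame's feature maps into one 1D vector per time step.
    layers.TimeDistributed(layers.Flatten()),
    # A regular LSTM consumes the sequence of vectors; spatial layout is gone here.
    layers.LSTM(32),
])

cnn_lstm_out = cnn_lstm(frames)
print(tuple(cnn_lstm_out.shape))  # (2, 32)
```

Here the recurrence only ever sees flat vectors, whereas in ConvLSTM2D the recurrent state itself is a stack of feature maps.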

Input of a Keras ConvLSTM2D layer: a 5D tensor with shape

(samples, time, channels, rows, cols) if it is channels first.
(samples, time, rows, cols, channels) if it is channels last.

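Output of a ConvLSTM2D layer:

If return_sequences=True, it is a 5D tensor with shape

(samples, time, filters, rows, cols) if it is channels first.
(samples, time, rows, cols, filters) if it is channels last.

If return_sequences=False, it is a 4D tensor with shape

(samples, filters, rows, cols) if it is channels first.
(samples, rows, cols, filters) if it is channels last.

The output rows and cols match the input only with padding="same" and a stride of 1; with the default padding="valid" they are smaller.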
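A quick way to check these shapes yourself is to run a layer on dummy data. The sketch below assumes TensorFlow's bundled Keras and uses channels last (the Keras default); the sizes are arbitrary example values.

```python
import numpy as np
from tensorflow.keras import layers

# Dummy input: (samples, time, rows, cols, channels) = (2, 5, 16, 16, 3).
frames = np.zeros((2, 5, 16, 16, 3), dtype="float32")

# padding="same" with the default stride of 1 keeps rows and cols unchanged.
seq_layer = layers.ConvLSTM2D(filters=8, kernel_size=(3, 3), padding="same",
                              return_sequences=True)
last_layer = layers.ConvLSTM2D(filters=8, kernel_size=(3, 3), padding="same",
                               return_sequences=False)

seq_out = seq_layer(frames)    # one output feature map stack per time step
last_out = last_layer(frames)  # only the output of the final time step

print(tuple(seq_out.shape))   # (2, 5, 16, 16, 8)
print(tuple(last_out.shape))  # (2, 16, 16, 8)
```

Note that the channel axis of the output has size `filters`, not the number of input channels.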

For more detail, you can refer to this paper, on which the ConvLSTM implementation is based.