
How can you add an LSTM layer after a (flattened) Conv2D layer in TensorFlow 2.0 / Keras? My training input data has the shape (size, sequence_length, height, width, channels). A convolutional layer can only process one image at a time, while the LSTM layer needs a sequence of features. Is there a way to reshape the data before the LSTM layer so the two can be combined?

I think the TimeDistributed layer is meant for this, but when I use it I get a memory leak and an OOM exception is thrown after a few iterations. Did you figure this out? – user3496060

1 Answer


From the shape you have provided, (size, sequence_length, height, width, channels), it appears that you have a sequence of images for each label. For this purpose we usually use Conv3D, which convolves over the temporal dimension as well as the spatial ones. A sample is enclosed below:

import tensorflow as tf

SIZE = 64             # number of samples
SEQUENCE_LENGTH = 50  # frames per sequence
HEIGHT = 128
WIDTH = 128
CHANNELS = 3

# Dummy data with the shape described in the question
data = tf.random.normal((SIZE, SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))

inputs = tf.keras.layers.Input((SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))
# Conv3D convolves over time, height and width at once
hidden = tf.keras.layers.Conv3D(32, (3, 3, 3))(inputs)
# Flatten everything except the channel axis into one long sequence of 32-d features
hidden = tf.keras.layers.Reshape((-1, 32))(hidden)
hidden = tf.keras.layers.LSTM(200)(hidden)

model = tf.keras.models.Model(inputs=inputs, outputs=hidden)
model.summary()

Output:

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 50, 128, 128, 3)] 0         
_________________________________________________________________
conv3d (Conv3D)              (None, 48, 126, 126, 32)  2624      
_________________________________________________________________
reshape (Reshape)            (None, None, 32)          0         
_________________________________________________________________
lstm (LSTM)                  (None, 200)               186400    
=================================================================
Total params: 189,024
Trainable params: 189,024
Non-trainable params: 0
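
Regarding the TimeDistributed layer mentioned in the comments: it is another option for this input shape, since it applies the wrapped layer to every frame independently and preserves the temporal axis for the LSTM. A minimal sketch follows; the GlobalAveragePooling2D step is my own assumption, added to shrink each frame to a single feature vector so the LSTM sequence stays at 50 steps (which may also help with the memory issue raised in the comment):

```python
import tensorflow as tf

SEQUENCE_LENGTH = 50
HEIGHT = 128
WIDTH = 128
CHANNELS = 3

inputs = tf.keras.layers.Input((SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))
# Apply the same Conv2D independently to each of the 50 frames
hidden = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(32, (3, 3)))(inputs)
# Reduce each frame to a single 32-d feature vector, so the LSTM sees a
# sequence of length 50 instead of tens of thousands of timesteps
hidden = tf.keras.layers.TimeDistributed(
    tf.keras.layers.GlobalAveragePooling2D())(hidden)
hidden = tf.keras.layers.LSTM(200)(hidden)

model = tf.keras.models.Model(inputs=inputs, outputs=hidden)
model.summary()
```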

If you still want to use Conv2D, which is not recommended in your case, you will have to do something like the following. Basically, you stack the sequence of images along the height dimension, which loses the temporal structure:

import tensorflow as tf

SIZE = 64
SEQUENCE_LENGTH = 50
HEIGHT = 128
WIDTH = 128
CHANNELS = 3

data = tf.random.normal((SIZE, SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))

inputs = tf.keras.layers.Input((SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))
# Stack the frames along the height axis: (50, 128, 128, 3) -> (6400, 128, 3)
hidden = tf.keras.layers.Reshape((SEQUENCE_LENGTH * HEIGHT, WIDTH, CHANNELS))(inputs)
hidden = tf.keras.layers.Conv2D(32, (3, 3))(hidden)
# Flatten the spatial output into a sequence of 32-d feature vectors
hidden = tf.keras.layers.Reshape((-1, 32))(hidden)
hidden = tf.keras.layers.LSTM(200)(hidden)

model = tf.keras.models.Model(inputs=inputs, outputs=hidden)
model.summary()

Output:

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 50, 128, 128, 3)] 0         
_________________________________________________________________
reshape (Reshape)            (None, 6400, 128, 3)      0         
_________________________________________________________________
conv2d (Conv2D)              (None, 6398, 126, 32)     896       
_________________________________________________________________
reshape_1 (Reshape)          (None, None, 32)          0         
_________________________________________________________________
lstm (LSTM)                  (None, 200)               186400    
=================================================================
Total params: 187,296
Trainable params: 187,296
Non-trainable params: 0
_________________________________________________________________
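
For completeness, here is a scaled-down sketch showing that the Conv3D model actually trains end to end on dummy data. The smaller dimensions, unit counts, and the Dense regression head are all my own assumptions, chosen only so the demo runs quickly:

```python
import tensorflow as tf

# Scaled-down dimensions so the demo runs quickly on CPU
SIZE, SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS = 4, 10, 16, 16, 3

data = tf.random.normal((SIZE, SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))
labels = tf.random.uniform((SIZE, 1))  # dummy regression targets

inputs = tf.keras.layers.Input((SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))
hidden = tf.keras.layers.Conv3D(16, (3, 3, 3))(inputs)
hidden = tf.keras.layers.Reshape((-1, 16))(hidden)
hidden = tf.keras.layers.LSTM(32)(hidden)
outputs = tf.keras.layers.Dense(1)(hidden)  # hypothetical regression head

model = tf.keras.models.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="mse")
history = model.fit(data, labels, epochs=1, batch_size=2, verbose=0)
```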