How can you add an LSTM layer after a (flattened) Conv2D layer in TensorFlow 2.0 / Keras? My training input data has the shape (size, sequence_length, height, width, channels). A convolutional layer can only process one image at a time, while the LSTM layer needs a sequence of features. Is there a way to reshape the data before the LSTM layer so you can combine both?
From the shape you have provided, (size, sequence_length, height, width, channels), it appears that you have a sequence of images for each label. For this kind of data we usually use Conv3D, which convolves over the temporal and spatial dimensions together. I am enclosing a sample code below:
import tensorflow as tf

SIZE = 64
SEQUENCE_LENGTH = 50
HEIGHT = 128
WIDTH = 128
CHANNELS = 3

# Dummy batch with shape (size, sequence_length, height, width, channels)
data = tf.random.normal((SIZE, SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))

inputs = tf.keras.layers.Input((SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))
# Conv3D slides a 3x3x3 kernel over (time, height, width) at once
hidden = tf.keras.layers.Conv3D(32, (3, 3, 3))(inputs)
# Collapse every (time, height, width) position into one long sequence of 32-d features
hidden = tf.keras.layers.Reshape((-1, 32))(hidden)
hidden = tf.keras.layers.LSTM(200)(hidden)
model = tf.keras.models.Model(inputs=inputs, outputs=hidden)
model.summary()
Output:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 50, 128, 128, 3)] 0
_________________________________________________________________
conv3d (Conv3D) (None, 48, 126, 126, 32) 2624
_________________________________________________________________
reshape (Reshape) (None, None, 32) 0
_________________________________________________________________
lstm (LSTM) (None, 200) 186400
=================================================================
Total params: 189,024
Trainable params: 189,024
Non-trainable params: 0
If you still want to use Conv2D, which is not recommended in your case, you will have to do something like the following. Essentially, you stack the sequence of images along the height dimension, which loses the temporal structure of the data.
import tensorflow as tf

SIZE = 64
SEQUENCE_LENGTH = 50
HEIGHT = 128
WIDTH = 128
CHANNELS = 3

# Dummy batch with shape (size, sequence_length, height, width, channels)
data = tf.random.normal((SIZE, SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))

inputs = tf.keras.layers.Input((SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))
# Stack all frames along the height axis: (50, 128, 128, 3) -> (6400, 128, 3)
hidden = tf.keras.layers.Reshape((SEQUENCE_LENGTH * HEIGHT, WIDTH, CHANNELS))(inputs)
hidden = tf.keras.layers.Conv2D(32, (3, 3))(hidden)
# Flatten all spatial positions into one sequence of 32-d features
hidden = tf.keras.layers.Reshape((-1, 32))(hidden)
hidden = tf.keras.layers.LSTM(200)(hidden)
model = tf.keras.models.Model(inputs=inputs, outputs=hidden)
model.summary()
Output:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 50, 128, 128, 3)] 0
_________________________________________________________________
reshape (Reshape) (None, 6400, 128, 3) 0
_________________________________________________________________
conv2d (Conv2D) (None, 6398, 126, 32) 896
_________________________________________________________________
reshape_1 (Reshape) (None, None, 32) 0
_________________________________________________________________
lstm (LSTM) (None, 200) 186400
=================================================================
Total params: 187,296
Trainable params: 187,296
Non-trainable params: 0
_________________________________________________________________
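As an aside, if you specifically want a per-frame Conv2D followed by an LSTM (as the question asks), Keras also offers TimeDistributed, which applies the same layer to every timestep independently, so the sequence axis survives for the LSTM. A minimal sketch is below; the layer sizes are illustrative, and GlobalAveragePooling2D is used instead of Flatten only to keep the per-frame feature vector small:

```python
import tensorflow as tf

SEQUENCE_LENGTH = 50
HEIGHT = 128
WIDTH = 128
CHANNELS = 3

inputs = tf.keras.layers.Input((SEQUENCE_LENGTH, HEIGHT, WIDTH, CHANNELS))
# Apply the same Conv2D to each of the 50 frames: -> (None, 50, 126, 126, 32)
x = tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(32, (3, 3)))(inputs)
# Pool each frame to a 32-d feature vector: -> (None, 50, 32)
x = tf.keras.layers.TimeDistributed(tf.keras.layers.GlobalAveragePooling2D())(x)
# The LSTM now sees one feature vector per original frame
outputs = tf.keras.layers.LSTM(200)(x)
model = tf.keras.models.Model(inputs=inputs, outputs=outputs)
model.summary()
```

Unlike the height-stacking trick, this keeps one timestep per image, so the LSTM still sees the actual temporal order of the frames.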