I was wondering how LSTMs work in Keras.
Let's take an example. I have a maximum sentence length of 3 words, e.g. 'how are you'. I vectorize each word into a vector of length 4, so one sentence has shape (3, 4). Now I want to use an LSTM to do translation (just as an example).
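For concreteness, this is roughly what that input looks like (random numbers are just placeholders standing in for the real word vectors):

import numpy as np

# Toy stand-in for the vectorized sentence 'how are you':
# 3 timesteps (words), each encoded as a length-4 vector.
sentence = np.random.random((3, 4))
print(sentence.shape)   # (3, 4)

# Keras models consume batches, so a single sample becomes (1, 3, 4).
batch = sentence[np.newaxis, ...]
print(batch.shape)      # (1, 3, 4)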
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(1, input_shape=(3, 4), return_sequences=True))
model.summary()
According to Keras, I get an output shape of (3, 1):
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_16 (LSTM)               (None, 3, 1)              24
=================================================================
Total params: 24
Trainable params: 24
Non-trainable params: 0
_________________________________________________________________
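Feeding one dummy batch of random data through the model above confirms that shape:

# One dummy sample: 3 timesteps, 4 features each.
x = np.random.random((1, 3, 4))
out = model.predict(x)
print(out.shape)        # (1, 3, 1) -- one scalar per timestep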
And this is what I don't understand.
Each unit of an LSTM (with return_sequences=True, so that I get the output at every timestep) should give me a vector of shape (timesteps, x), where timesteps is 3 in this case and x is the size of my word vectors (4 here).

So why do I get an output shape of (3, 1)? I searched everywhere but can't figure it out.
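For comparison, what I expected would look more like this variant (model2 is just an illustrative name), where the last dimension matches my word-vector size:

model2 = Sequential()
# Same input shape, but 4 units instead of 1.
model2.add(LSTM(4, input_shape=(3, 4), return_sequences=True))
model2.summary()        # Output Shape: (None, 3, 4)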