
I want to create an LSTM model. The LSTM should predict a one-hot encoded vector of length 4 given a sentence. This was easy in the first step.

The next thing I wanted to do is add additional information to my dataset. The information is a one-hot encoded vector of length 5.

My idea was to concatenate the Embedding layer with another Input before passing the data to an LSTM. That looks like this:

from keras.layers import Input, Embedding, LSTM, Dense, concatenate
from keras.models import Model

main_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32', name='main_input')
embedding = Embedding(MAX_NB_WORDS, EMBEDDING_SIZE,
                    input_length=MAX_SEQUENCE_LENGTH)(main_input)

# second input model
auxiliary_input = Input(shape=(5,), name='aux_input')
x = concatenate([embedding, auxiliary_input])

lstm = LSTM(HIDDEN_LAYER_SIZE)(x)

main_output = Dense(4, activation='sigmoid', name='main_output')(lstm)

model = Model(inputs=[main_input, auxiliary_input], outputs=main_output)

But if I try to do a set up like this, I get the following error: ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 50, 128), (None, 5)]

It does work if I run an LSTM over the embedding layer first and concatenate its output with the auxiliary input, but then I cannot run another LSTM afterwards (I get the error: ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2).

So my question is: what is the right way to build an LSTM with an embedding layer input and additional data in Keras?

The first questions are: "How do you want to add this data?", "Where?", "What does it mean?". Then you decide which approach to use. It's not possible to concatenate 5 values, shape (None, 5), to a sequence of values, shape (None, 50, 128). – Daniel Möller

1 Answer


It seems that you are trying to pass additional information about the full sequence (not about each token); that's why you get the shape mismatch.

There are several ways to tackle this problem, each with pros and cons:

(1) You can concatenate aux_input with the last output of your LSTM, i.e. concat_with_aux = concatenate([auxiliary_input, lstm]), and pass this concatenated vector to the final Dense layer. This means that if you have two identical sequences with different categories, the output of the LSTM will be the same; after concatenation, it is the job of the dense classifier to use the concatenated result to produce the right output.
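A minimal sketch of approach (1), reusing the layer names from the question (the constants are placeholder values, since the question does not give them):

```python
from keras.layers import Input, Embedding, LSTM, Dense, concatenate
from keras.models import Model

# Placeholder values for the constants from the question
MAX_SEQUENCE_LENGTH = 50
MAX_NB_WORDS = 10000
EMBEDDING_SIZE = 128
HIDDEN_LAYER_SIZE = 64

main_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32', name='main_input')
embedding = Embedding(MAX_NB_WORDS, EMBEDDING_SIZE)(main_input)

# Run the LSTM over the full sequence first; its last output is (None, HIDDEN_LAYER_SIZE)
lstm = LSTM(HIDDEN_LAYER_SIZE)(embedding)

# Concatenate the auxiliary one-hot vector with the LSTM output: (None, 5 + HIDDEN_LAYER_SIZE)
auxiliary_input = Input(shape=(5,), name='aux_input')
concat_with_aux = concatenate([auxiliary_input, lstm])

main_output = Dense(4, activation='sigmoid', name='main_output')(concat_with_aux)
model = Model(inputs=[main_input, auxiliary_input], outputs=main_output)
```

Note that both 2D tensors here have matching shapes except on the concat axis, so the Concatenate error from the question no longer occurs.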

(2) If you want to pass the information directly at the input of the LSTM, you can, for example, create a new trainable Embedding layer for your categories:

auxiliary_input = Input(shape=(1,), name='aux_input') # now you pass the index (0..4), not the one-hot encoded form
embed_categories = Embedding(5, EMBEDDING_SIZE,
                    input_length=1)(auxiliary_input)  # shape (None, 1, EMBEDDING_SIZE)

# Concatenate along the time axis: (None, 1, 128) + (None, 50, 128) -> (None, 51, 128)
x = concatenate([embed_categories, embedding], axis=1)

By doing that, your LSTM will be conditioned on your auxiliary information, and two identical sentences with different categories will have a different last LSTM output.
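Putting approach (2) together as a sketch (again with placeholder values for the question's constants): the category embedding is prepended to the token embeddings along the time axis, so the LSTM reads the category as an extra first timestep:

```python
from keras.layers import Input, Embedding, LSTM, Dense, concatenate
from keras.models import Model

# Placeholder values for the constants from the question
MAX_SEQUENCE_LENGTH = 50
MAX_NB_WORDS = 10000
EMBEDDING_SIZE = 128
HIDDEN_LAYER_SIZE = 64

main_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32', name='main_input')
embedding = Embedding(MAX_NB_WORDS, EMBEDDING_SIZE)(main_input)  # (None, 50, 128)

auxiliary_input = Input(shape=(1,), dtype='int32', name='aux_input')  # category index 0..4
embed_categories = Embedding(5, EMBEDDING_SIZE)(auxiliary_input)      # (None, 1, 128)

# Prepend the category embedding on the time axis: (None, 51, 128)
x = concatenate([embed_categories, embedding], axis=1)

lstm = LSTM(HIDDEN_LAYER_SIZE)(x)
main_output = Dense(4, activation='sigmoid', name='main_output')(lstm)
model = Model(inputs=[main_input, auxiliary_input], outputs=main_output)
```

The key detail is axis=1: both embeddings have the same feature size (EMBEDDING_SIZE), so they can only be concatenated along the sequence dimension.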