How to structure an LSTM neural network for classification

Question

I have data that has various conversations between two people. Each sentence has some type of classification. I am attempting to use an NLP net to classify each sentence of the conversation. I tried a convolution net and get decent results (not ground breaking tho). I figured that since this a back and forth conversation, and LSTM net may produce better results, because what was previously said may have a large impact on what follows.

If I follow the structure above, I would assume that I am doing a many-to-many. My data looks like.

X_train = [[sentence 1],  
           [sentence 2],
           [sentence 3]]
Y_train = [[0],
           [1],
           [0]]

Data has been processed using word2vec. I then design my network as follows..

model = Sequential()      
model.add(Embedding(len(vocabulary),embedding_dim,
          input_length=X_train.shape[1]))
model.add(LSTM(88))
model.add(Dense(1,activation='sigmoid'))
model.compile(optimizer='rmsprop',loss='binary_crossentropy',
              metrics['accuracy'])
model.fit(X_train,Y_train,verbose=2,nb_epoch=3,batch_size=15)

I assume that this setup will feed one batch of sentences in at a time. However, if in model.fit, shuffle is not equal to false its receiving shuffled batches, so why is an LSTM net even useful in this case? From research on the subject, to achieve a many-to-many structure one would need to change the LSTM layer too

model.add(LSTM(88,return_sequence=True))

and the output layer would need to be...

model.add(TimeDistributed(Dense(1,activation='sigmoid')))

When switching to this structure I get an error on the input size. I'm unsure of how to reformat the data to meet this requirement, and also how to edit the embedding layer to receive the new data format.

Any input would be greatly appreciated. Or if you have any suggestions on a better method, I am more than happy to hear them!

Nassim Ben Nassim Ben · Accepted Answer · 2017-02-20T06:55:22

Your first attempt was good. The shuffling takes place between sentences, the only shuffle the training samples between them so that they don't always come in in the same order. The words inside sentences are not shuffled.

Or maybe I didn't understand the question correctly?

EDIT :

After a better understanding of the question, here is my proposition.

Data preparation : You slice your corpus in blocks of n sentences (they can overlap). You should then have a shape like (number_blocks_of_sentences, n, number_of_words_per_sentence) so basically a list of 2D arrays which contain blocks of n sentences. n shouldn't be too big because LSTM can't handle huge number of elements in the sequence when training (vanishing gradient). Your targets should be an array of shape (number_blocks_of_sentences, n, 1) so also a list of 2D arrays containing the class of each sentence in your block of sentences.

Model :

n_sentences = X_train.shape[1]  # number of sentences in a sample (n)
n_words = X_train.shape[2]      # number of words in a sentence

model = Sequential()
# Reshape the input because Embedding only accepts shape (batch_size, input_length) so we just transform list of sentences in huge list of words
model.add(Reshape((n_sentences * n_words,),input_shape = (n_sentences, n_words)))
# Embedding layer - output shape will be (batch_size, n_sentences * n_words, embedding_dim) so each sample in the batch is a big 2D array of words embedded 
model.add(Embedding(len(vocabaulary), embedding_dim, input_length = n_sentences * n_words ))
# Recreate the sentence shaped array
model.add(Reshape((n_sentences, n_words, embedding_dim))) 
# Encode each sentence - output shape is (batch_size, n_sentences, 88)
model.add(TimeDistributed(LSTM(88)))
# Go over lines and output hidden layer which contains info about previous sentences - output shape is (batch_size, n_sentences, hidden_dim)
model.add(LSTM(hidden_dim, return_sequence=True))
# Predict output binary class - output shape is (batch_size, n_sentences, 1)
model.add(TimeDistributed(Dense(1,activation='sigmoid')))
...

This should be a good start.

I hope this helps

How to structure an LSTM neural network for classification

1 Answers