
In my binary multilabel sequence classification problem, I have 22 timesteps in each input sentence. I have added a 200-dimensional word embedding to each timestep, so my current input shape is (*number of input sentences*, 22, 200). My output shape is (*number of input sentences*, 4), e.g. [1,0,0,1].

My first question is: how do I build a Keras LSTM model that accepts 3D input and produces 2D output? The following code raises this error:

ValueError: Error when checking target: expected dense_41 to have 3 dimensions, but got array with shape (7339, 4)

My second question is: when I add a TimeDistributed layer, should I set the number of units in its Dense layer to the number of features in the input, in my case 200?


import datetime
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, TimeDistributed

X_train, X_test, y_train, y_test = train_test_split(padded_docs2, new_y, test_size=0.33, random_state=42)

start = datetime.datetime.now()
print(start)

# define the model
model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
model.add(LSTM(128, input_shape=(X_train.shape[1], 200), dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
model.add(TimeDistributed(Dense(200)))

model.add(Dense(y_train.shape[1], activation='sigmoid'))

# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# summarize the model
print(model.summary())

# fit the model
model.fit(X_train, y_train, epochs=300, verbose=0)

end = datetime.datetime.now()
print(end)
print('Time taken to build the model: ', end - start)

Please let me know if I have missed out any information, thanks.


1 Answer


Your model's LSTM layer takes a 3D sequence and produces a 3D output, and the same goes for the TimeDistributed layer. If you want the LSTM to return a 2D tensor, set the return_sequences argument to False; then you no longer need the TimeDistributed wrapper. With this setup your model would be:

model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
# return_sequences=False: the LSTM returns only the last timestep's output (2D)
model.add(LSTM(128, input_shape=(X_train.shape[1], 200), dropout=0.2, recurrent_dropout=0.1, return_sequences=False))
model.add(Dense(200))

model.add(Dense(y_train.shape[1], activation='sigmoid'))
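With return_sequences=False the LSTM emits only its final hidden state, of shape (batch_size, 128), so the following Dense layers yield a 2D output of shape (batch_size, 4), matching your target array.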

Edit: TimeDistributed applies a given layer to each temporal slice of the input. In your case, the temporal dimension is X_train.shape[1]. Let's assume X_train.shape[1] == 10 and consider the following line:

model.add(TimeDistributed(Dense(200)))

Here the TimeDistributed wrapper applies the same Dense(200) layer (one set of shared weights, not 10 separate layers) to each of the 10 temporal slices. Each slice yields an output of shape (batch_size, 200), so the final output tensor has shape (batch_size, 10, 200). But you said you want 2D output, so TimeDistributed won't get you from a 3D input to a 2D output. The other case is removing the TimeDistributed wrapper and using a plain Dense layer, model.add(Dense(200)). The Dense layer then effectively flattens the input to shape (batch_size * 10, 200), computes the fully connected dot product, and reshapes the output back to (batch_size, 10, 200), which is still a 3D tensor.
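Here is a minimal sketch to verify both shapes (a standalone toy model, assuming Keras 2.x; the 10 and 200 are the illustrative sizes from above, not your real data):

from keras.models import Sequential
from keras.layers import Dense, TimeDistributed, InputLayer

# TimeDistributed(Dense) on a 3D input: output stays 3D
m1 = Sequential()
m1.add(InputLayer(input_shape=(10, 200)))
m1.add(TimeDistributed(Dense(200)))
print(m1.output_shape)  # (None, 10, 200)

# plain Dense on the same 3D input: output is also 3D
m2 = Sequential()
m2.add(InputLayer(input_shape=(10, 200)))
m2.add(Dense(200))
print(m2.output_shape)  # (None, 10, 200)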
But if you don't want to change the first LSTM layer, you can replace the TimeDistributed layer with another LSTM layer whose return_sequences is set to False. Now your model would look like this:

model = Sequential()
e = Embedding(input_dim=vocab_size2, input_length=22, output_dim=200, weights=[embedding_matrix2], trainable=False)
model.add(e)
# first LSTM keeps return_sequences=True, so it still outputs a 3D sequence
model.add(LSTM(128, input_shape=(X_train.shape[1], 200), dropout=0.2, recurrent_dropout=0.1, return_sequences=True))
# second LSTM collapses the time dimension: return_sequences=False gives a 2D tensor
model.add(LSTM(200, dropout=0.2, recurrent_dropout=0.1, return_sequences=False))

model.add(Dense(y_train.shape[1], activation='sigmoid'))
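As a quick sanity check (a sketch, assuming the variables above are already defined), you can compile the model and confirm the output is 2D:

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
print(model.output_shape)  # expected: (None, 4), matching y_train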