I am trying to build a sequence-to-sequence encoder-decoder model, and I need to apply a softmax to the last layer so I can use categorical cross-entropy.
I've tried setting the activation of the last LSTM layer to 'softmax', but that doesn't seem to do the trick. Adding another Dense layer and setting its activation to softmax doesn't help either. What is the correct way to apply a softmax when the last LSTM outputs a sequence?
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

# placeholder dimensions; the actual values aren't important to the question
batch_size, timesteps, input_dim, latent_dim = 32, 10, 50, 64

inputs = Input(batch_shape=(batch_size, timesteps, input_dim), name='hella')
# encoder: stacked LSTMs; the last one collapses the sequence into a single vector
encoded = LSTM(latent_dim, return_sequences=True, stateful=False)(inputs)
encoded = LSTM(latent_dim, return_sequences=True, stateful=False)(encoded)
encoded = LSTM(latent_dim, return_sequences=True, stateful=False)(encoded)
encoded = LSTM(latent_dim, return_sequences=False)(encoded)
# decoder: repeat the encoding across every timestep, then expand back into a sequence
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
# do softmax here
sequence_autoencoder = Model(inputs, decoded)
sequence_autoencoder.compile(loss='categorical_crossentropy', optimizer='adam')
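
For reference, these are the two variants I tried where the "do softmax here" comment is (sketched from memory, so the exact placement may be slightly off; Dense would also need to be imported):

# variant 1: softmax activation directly on the last LSTM -- didn't work
decoded = LSTM(input_dim, activation='softmax', return_sequences=True)(decoded)

# variant 2: an extra Dense layer with a softmax activation -- didn't help either
decoded = Dense(input_dim, activation='softmax')(decoded)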