I am trying to implement a simple many-to-many LSTM for sequence prediction. The problem is simple: the input is a sequence of 0s and 1s, and the output at each time step is the count of ones seen in the sequence up to that time step. For example, if the input is [0 1 0 1], the outputs would be time0=0, time1=1, time2=1, time3=2. I should note that I use one-hot encoding to represent the output.
Assumptions: the length of the input sequence is 20, so at most there can be 20 ones in the sequence. Therefore, I consider 21 output classes (one-hot encoded): class 0 means there are no ones in the sequence, and class 20 means all 20 elements are ones.
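For reference, here is a minimal sketch of how such data could be generated (the function name make_dataset and the sample count are just for illustration):

import numpy as np
import tensorflow as tf

def make_dataset(n_samples=1000, seq_len=20, n_classes=21):
    # random binary input sequences, shape (n_samples, seq_len, 1)
    x = np.random.randint(0, 2, size=(n_samples, seq_len, 1)).astype(np.float32)
    # running count of ones at each time step, shape (n_samples, seq_len)
    counts = np.cumsum(x[:, :, 0], axis=1).astype(np.int32)
    # one-hot encode the counts, shape (n_samples, seq_len, n_classes)
    y = tf.keras.utils.to_categorical(counts, num_classes=n_classes)
    return x, y

x_train, y_train = make_dataset()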
So far, I use the following model:
import tensorflow as tf

# create the LSTM model
model = tf.keras.models.Sequential()
# return_sequences=True gives the hidden state at every time step: output shape (None, 20, 30)
model.add(tf.keras.layers.LSTM(30, input_shape=(20, 1), return_sequences=True))
#model.add(tf.keras.layers.LSTM(30, input_shape=(20, 1)))
print(model.input_shape)
print(model.output_shape)
model.add(tf.keras.layers.Dropout(0.2))
# variant with TimeDistributed, commented out for comparison:
#model.add(tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(21, activation='softmax')))
model.add(tf.keras.layers.Dense(21, activation='softmax'))
print(model.summary())
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
I evaluated the model both with and without tf.keras.layers.TimeDistributed wrapped around the final Dense layer, and both variants reach the same accuracy of 99%! I am wondering why that is. When do we need to use TimeDistributed? What is it for, then?
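For completeness, this is a sketch of the comparison I ran; only the final layer differs, and the printed output shapes appear to be identical:

import tensorflow as tf

# variant A: plain Dense applied to the 3D LSTM output
model_a = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(30, input_shape=(20, 1), return_sequences=True),
    tf.keras.layers.Dense(21, activation='softmax'),
])

# variant B: the same Dense wrapped in TimeDistributed
model_b = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(30, input_shape=(20, 1), return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(21, activation='softmax')),
])

print(model_a.output_shape)  # (None, 20, 21)
print(model_b.output_shape)  # (None, 20, 21)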