1 vote

I am new to RNNs and keras.

I am trying to compare the performance of an LSTM against traditional machine learning algorithms (like RF or GBM) on sequential data (not necessarily a time series, but ordered). My data contains 276 predictors and one output (e.g. a stock price along with 276 pieces of information about the stock's firm), with 8564 past observations. Since LSTMs are good at capturing sequential trends, I decided to use a time_step of 300. From the figure below, I believe my task is to create a many-to-many network (the last configuration from the left). (Pic: http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

[Figure: RNN input/output configurations from Karpathy's "The Unreasonable Effectiveness of Recurrent Neural Networks"]

Each pink box is of size 276 (the number of predictors), and there are 300 (time_steps) such pink boxes in one batch. However, I am struggling to see how to design the blue boxes here, as each blue box should be the output (stock price) of each instance. From other posts on the Keras GitHub forum, #2403 and #2654, I think I have to implement TimeDistributed(Dense()), but I don't know how. This is my code to check if it works (train_idv is the data to predict from and train_dv is the stock price):

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, TimeDistributed

train_idv.shape
#(8263, 300, 276)
train_dv.shape
#(8263, 300, 1)
batch_size = 1
time_Steps = 300
model = Sequential()

model.add(LSTM(300,
        batch_input_shape=(batch_size, time_Steps, train_idv.shape[2]),
        stateful=True, 
        return_sequences=True))
model.add(Dropout(0.3))

model.add(TimeDistributed(Dense(300))) 

# Model Compilation
model.compile(loss='mean_squared_error',optimizer='adam',metrics=['accuracy'])

model.fit(train_idv, train_dv, nb_epoch=1, batch_size=batch_size, verbose=2, shuffle=False)

Running model.fit gives this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python2.7/site-packages/keras/models.py", line 627, in fit
    sample_weight=sample_weight)
  File "/home/user/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1052, in fit
    batch_size=batch_size)
  File "/home/user/.local/lib/python2.7/site-packages/keras/engine/training.py", line 983, in _standardize_user_data
    exception_prefix='model target')
  File "/home/user/.local/lib/python2.7/site-packages/keras/engine/training.py", line 111, in standardize_input_data
    str(array.shape))
Exception: Error when checking model target: expected timedistributed_4 to have shape (1, 300, 300) but got array with shape (8263, 300, 1)

Now, I have successfully run it with time_step=1 and just Dense(1) as the last layer. But I am not sure how I should shape my train_dv (the output in training), or how to use TimeDistributed(Dense()) correctly. Finally, I want to use

trainPredict = model.predict(train_idv,batch_size=1)

to predict scores on any data.

I have posted this question on the Keras GitHub forum as well.

1
I would be careful with TimeDistributed(Dense). Although it is essential in certain parts of the model (between LSTMs, for example), I have found that it seems to break loss calculations when used as the final layer. See this related Keras issue on GitHub: github.com/fchollet/keras/issues/8055 – Phil

1 Answer

3 votes

From your post I understand that you want each LSTM time step to predict a single scalar, correct? Then your TimeDistributed(Dense) layer should have an output size of 1, not 300 (i.e. TimeDistributed(Dense(1))).
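
A minimal sketch of the corrected model, using the shapes from your question (276 predictors, 300 time steps, batch size 1). Note this is written against the current tf.keras API; your post uses the older Keras 1.x argument names such as nb_epoch:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense, TimeDistributed

model = Sequential()
model.add(LSTM(300,
               batch_input_shape=(1, 300, 276),  # (batch_size, time_steps, predictors)
               stateful=True,
               return_sequences=True))
model.add(Dropout(0.3))
# One scalar prediction (the stock price) per time step:
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')

print(model.output_shape)  # (1, 300, 1)
```

With this last layer, the model target has shape (samples, 300, 1), so your train_dv of shape (8263, 300, 1) can be passed to model.fit unchanged.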

Also, for your reference, there is an example in the Keras repo of using TimeDistributed(Dense).

In that example, one basically wants to train a multi-class classifier (with shared weights) for each time step, where the possible classes are the different digit characters:

# For each of step of the output sequence, decide which character should be chosen
model.add(TimeDistributed(Dense(len(chars))))

The number of time steps is defined by the preceding recurrent layers.
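
To illustrate that point, here is a small sketch with made-up sizes: TimeDistributed(Dense(n)) only sets the size of the last dimension, while the time dimension is carried through unchanged from the recurrent layer (provided it has return_sequences=True):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed, Activation

n_classes = 12  # hypothetical vocabulary size, like len(chars) above

model = Sequential()
# 10 time steps, 50 features per step -- both chosen arbitrarily for this sketch
model.add(LSTM(128, input_shape=(10, 50), return_sequences=True))
model.add(TimeDistributed(Dense(n_classes)))  # classifier applied at every step
model.add(Activation('softmax'))

print(model.output_shape)  # (None, 10, 12): 10 steps from the LSTM, 12 from Dense
```

Changing Dense(12) to Dense(1) would give per-step scalars, which is exactly the regression setup in the question.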