
I have a dataset with 5K rows (1K of which are held out for validation) and 17 columns, the last one being the target (an integer binary label).

My model is simply this 2-layer LSTM:

from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.layers.core import Dense, Dropout

model = Sequential()
model.add(Embedding(output_dim=64, input_dim=17))
model.add(LSTM(32, return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(32, return_sequences=False))
model.add(Dense(1))

model.compile(loss='binary_crossentropy', optimizer='rmsprop',
              class_mode='binary')

After loading my dataset with pandas

import pandas as pd

df_train = pd.read_csv(train_file)
train_X, train_y = df_train.values[:, :-1], df_train['target'].values

and trying to run my model, I get this error:

Exception: When using TensorFlow, you should define explicitly the number of timesteps of your sequences. - If your first layer is an Embedding, make sure to pass it an "input_length" argument. Otherwise, make sure the first layer has an "input_shape" or "batch_input_shape" argument, including the time axis.

What should I put in input_length? The total row count?

Since my data has shapes train_X=(4000, 17) and train_y=(4000,), how can I prepare it to feed this kind of model? Do I have to change the shape of my input data?

Thanks for any help!! (=


1 Answer


It looks like Keras uses the static unrolling approach to build recurrent networks (such as LSTMs) on TensorFlow. The input_length should be the length of the longest sequence you want to train on: so if each row of your CSV file train_file is a comma-delimited sequence of symbols, it should be the number of symbols in the longest row.
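
With your data that would be the 16 feature columns, i.e. one timestep per column. Below is a minimal sketch of how that could look, assuming the 16 values in each row are non-negative integer symbol IDs (the names timesteps and vocab_size are my own; note that an Embedding's input_dim is the vocabulary size, i.e. the number of distinct symbols, not the number of columns):

from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.layers.core import Dense, Dropout

timesteps = train_X.shape[1]          # 16 feature columns -> 16 timesteps per sample
vocab_size = int(train_X.max()) + 1   # assumes integer symbol IDs; input_dim must cover the largest one

model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=64, input_length=timesteps))
model.add(LSTM(32, return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(32, return_sequences=False))
model.add(Dense(1, activation='sigmoid'))  # sigmoid so the output is a probability for binary_crossentropy

model.compile(loss='binary_crossentropy', optimizer='rmsprop',
              class_mode='binary')

# train_X keeps its 2D shape (samples, timesteps); the Embedding layer
# expands it to (samples, timesteps, 64) before it reaches the LSTMs.
model.fit(train_X, train_y, batch_size=32, nb_epoch=10)

If your 16 columns are actually real-valued features rather than symbol IDs, an Embedding is not the right first layer: in that case you would reshape train_X to 3D, e.g. (4000, timesteps, 1), and give the first LSTM an input_shape=(timesteps, 1) instead.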