LSTM forecasted a straight line

Question

I built an LSTM in Keras. It reads observations of 9 time-lags, and predicts the next label. For some reason, the model I trained is predicting something that is nearly a straight line. What issue might there be in the model architecture that is creating such a bad regression result?

Input Data: Hourly financial time series with a clear upward trend; 1200+ records.

Input Data Dimensions:
- originally:

X_train.shape (1212, 9)

- reshaped for LSTM:

Z_train.shape (1212, 1, 9)


array([[[0.45073171, 0.46783444, 0.46226164, ..., 0.47164819,
         0.47649667, 0.46017738]],

       [[0.46783444, 0.46226164, 0.4553289 , ..., 0.47649667,
         0.46017738, 0.47167775]],

Target data: y_train

69200    0.471678
69140    0.476364
69080    0.467761
       ...   
7055     0.924937
7017     0.923651
7003     0.906253
Name: Close, Length: 1212, dtype: float64

type(y_train)
<class 'pandas.core.series.Series'>

LSTM design:

my = Sequential()
my.add(LSTM((20),batch_input_shape=(None,1,9), return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(1))

Input layer of 9 nodes, 3 hidden layers of 20 units each, and 1 output layer of 1 unit.

The Keras default is return_sequences=False

Model is compiled with mse loss, and adam or sgd optimizer.

curr_model.compile(optimizer=optmfunc, loss="mse")

Model is fit in this manner. Batch is 32, shuffle can be True/False

curr_model.fit(Z_train, y_train,
                           validation_data=(Z_validation,y_validation),
                           epochs=noepoch, verbose=0,
                           batch_size=btchsize,
                           shuffle=shufBOOL)

Config and Weights are saved to disk. Since I'm training several models, I load them afterward to test certain performance metrics.

spec_model.model.save_weights(mname_trn)
mkerascfg = spec_model.model.to_json()
with open(mname_cfg, "w") as json_file:
    json_file.write(mkerascfg)

When I trained an MLP, I got this result against the validation set:

I've trained several of the LSTMs, but the result against the validation set looks like this:

The 2nd plot (LSTM plot) is of the validation data: y_validation versus predictions on Z_validation. They are the last 135 records in the respective arrays. These were split out of the full data (i.e., as a validation set) and have the same type/properties as Z_train and y_train. The x-axis is just the index, numbered 0 to 134, and the y-axis is the value of y_validation or the prediction. Both arrays are normalized, so the units are the same. The "straight" line is the prediction.

What could explain why this is happening?
- I've changed batch sizes. Similar result.
- I've tried changing return_sequences, but it leads to various shape errors in subsequent layers.

Information about LSTM progression of MSE loss

There are 4 models trained, all with the same issue. We'll focus on the LSTM with 3 hidden layers of 20 units each, as defined above. (Mini-batch size was 32 and shuffling was disabled, but enabling it changed nothing.)

This is a slightly zoomed-in image of the loss progression for the first model (adam optimizer)

From what I can tell by inspecting the index, the bounce in the loss values (which creates the thick area) starts somewhere in the 500s of epochs.

How do you define y_train, and what are the horizontal axis units for the plots? It'd help to include code covering all variables used in the snippets you provided. — OverLordGoldDragon
Added datatype/shape info for validation sets, and for the training targets. — VISQL
Try plotting the loss graph to see how much the model learns in each iteration. — Yonlif
Do I understand correctly: Input = 9 timesteps, Output = 10-th timestep. Z_train = 1212 samples of 9 timesteps, y_train = 1212 samples of the "10-th timestep" — OverLordGoldDragon
Thanks. I think this would be past where any problem occurs. I have graphed this. It's downward sloping and trails off like one would expect. — VISQL

OverLordGoldDragon · Accepted Answer · 2019-10-15T21:25:43

Your code has a single critical problem: the input dimensions are in the wrong order. LSTM expects inputs to be shaped as (batch_size, timesteps, channels), equivalently (num_samples, timesteps, features), whereas you're feeding one timestep with nine channels. With only one timestep, backpropagation through time never even takes place.
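To make the "one timestep" point concrete, here is a minimal sketch in plain NumPy. It uses a simple tanh RNN as a stand-in for an LSTM (an assumption for brevity; the helper `simple_rnn_last_state` and all weights are hypothetical and not from the question). With a (1, 9) layout the recurrent weights only ever multiply the zero initial state, so they cannot affect the output; with a (9, 1) layout they do.

```python
import numpy as np

def simple_rnn_last_state(x, w_in, w_rec):
    """Run a tanh RNN over x of shape (timesteps, features); return the final hidden state."""
    h = np.zeros(w_rec.shape[0])  # zero initial state (hypothetical stand-in for an LSTM's initial state)
    for x_t in x:
        h = np.tanh(x_t @ w_in + h @ w_rec)
    return h

rng = np.random.default_rng(0)
window = rng.random(9)  # one sample: 9 lagged values
units = 4

# Layout from the question: 1 timestep with 9 features.
x_wrong = window.reshape(1, 9)
w_in_wrong = rng.normal(size=(9, units))

# Layout from the fix: 9 timesteps with 1 feature each.
x_right = window.reshape(9, 1)
w_in_right = rng.normal(size=(1, units))

# Two different sets of recurrent weights.
w_rec_a = rng.normal(size=(units, units))
w_rec_b = rng.normal(size=(units, units))

wrong_a = simple_rnn_last_state(x_wrong, w_in_wrong, w_rec_a)
wrong_b = simple_rnn_last_state(x_wrong, w_in_wrong, w_rec_b)
right_a = simple_rnn_last_state(x_right, w_in_right, w_rec_a)
right_b = simple_rnn_last_state(x_right, w_in_right, w_rec_b)

# One timestep: the recurrent weights only multiply the zero state, so they have no effect.
recurrence_ignored = np.allclose(wrong_a, wrong_b)
# Nine timesteps: the recurrent weights change the result.
recurrence_used = not np.allclose(right_a, right_b)
print(recurrence_ignored, recurrence_used)
```

In the single-timestep case the layer behaves like a feed-forward layer over the nine lags, with nothing for the recurrence to learn.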

Fix: reshape inputs as (1212, 9, 1).
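A minimal sketch of that reshape, assuming X_train is (or has been converted to) a NumPy array of shape (1212, 9); the array contents below are placeholder values, not the question's data:

```python
import numpy as np

# Placeholder stand-in for the question's X_train: 1212 windows of 9 lagged values.
X_train = np.arange(1212 * 9, dtype=float).reshape(1212, 9)

# Target layout is (samples, timesteps, features): 9 timesteps, 1 feature per timestep.
Z_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)

print(Z_train.shape)  # (1212, 9, 1)
```

The validation inputs need the same treatment, and the first LSTM layer's input shape has to change to match, i.e. (None, 9, 1) instead of (None, 1, 9). If X_train is a pandas DataFrame, converting it with .to_numpy() before reshaping should work.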

Suggestion: read this answer. It's long, but could save you hours of debugging; this information isn't available elsewhere in such a compact form, and I wish I'd had it when starting out with LSTMs.

An answer to a related question may also prove useful, but the previous link is more important.
