3
votes

I built an LSTM in Keras. It reads observations of 9 time-lags, and predicts the next label. For some reason, the model I trained is predicting something that is nearly a straight line. What issue might there be in the model architecture that is creating such a bad regression result?

Input Data: Hourly financial time series with a clear upward trend, 1200+ records

Input Data Dimensions:
- originally:

X_train.shape (1212, 9)

- reshaped for LSTM:

Z_train.shape (1212, 1, 9)


array([[[0.45073171, 0.46783444, 0.46226164, ..., 0.47164819,
         0.47649667, 0.46017738]],

       [[0.46783444, 0.46226164, 0.4553289 , ..., 0.47649667,
         0.46017738, 0.47167775]],

Target data: y_train

69200    0.471678
69140    0.476364
69080    0.467761
       ...   
7055     0.924937
7017     0.923651
7003     0.906253
Name: Close, Length: 1212, dtype: float64

type(y_train)
<class 'pandas.core.series.Series'>

LSTM design:

my = Sequential()
my.add(LSTM((20),batch_input_shape=(None,1,9), return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(1))

Input layer of 9 nodes, 3 hidden layers of 20 units each, and 1 output layer of 1 unit.

The Keras default is return_sequences=False

Model is compiled with mse loss, and adam or sgd optimizer.

curr_model.compile(optimizer=optmfunc, loss="mse")

Model is fit in this manner. Batch is 32, shuffle can be True/False

curr_model.fit(Z_train, y_train,
                           validation_data=(Z_validation,y_validation),
                           epochs=noepoch, verbose=0,
                           batch_size=btchsize,
                           shuffle=shufBOOL)

Config and Weights are saved to disk. Since I'm training several models, I load them afterward to test certain performance metrics.

# Save the trained weights.
spec_model.model.save_weights(mname_trn)
# Save the architecture as JSON.
mkerascfg = spec_model.model.to_json()
with open(mname_cfg, "w") as json_file:
    json_file.write(mkerascfg)


When I trained an MLP, I got this result against the validation set:

[Plot: MLP predictions against the validation set]

I've trained several of the LSTMs, but the result against the validation set looks like this:

[Plot: LSTM predictions against the validation set]

The 2nd plot (LSTM plot) is of the validation data. This is y_validation versus predictions on Z_validation. They are the last 135 records in the respective arrays. These were split out of the full data (i.e. the validation set) and have the same type/properties as Z_train and y_train. The x-axis is just the index numbered 0 to 134, and the y-axis is the value of y_validation or the prediction. Both arrays are normalized, so the units are the same. The "straight" line is the prediction.

What idea could you suggest on why this is happening?
- I've changed batch sizes. Similar result.
- I've tried changing return_sequences, but it leads to various errors around shape for subsequent layers, etc.

Information about LSTM progression of MSE loss

Four models were trained, all with the same issue. We'll focus on the LSTM with 3 hidden layers of 20 units each, as defined above. (Mini-batch size was 32 and shuffling was disabled, but enabling it changed nothing.)

This is a slightly zoomed-in image of the loss progression for the first model (adam optimizer):

[Plot: MSE loss progression for the first model, adam optimizer]

From what I can tell by inspecting the index, the bounce in the loss values (which creates the thick area) starts somewhere in the 500s of epochs.

[Plot: loss values by epoch]

2
How do you define y_train, and what are the horizontal axis units for the plots? It'd help to include code covering all variables used in the snippets you provided. - OverLordGoldDragon
Added datatype/shape info for validation sets, and for the training targets. - VISQL
Try plotting the loss graph to see how much the model learns in each iteration. - Yonlif
Do I understand correctly: Input = 9 timesteps, Output = 10th timestep. Z_train = 1212 samples of 9 timesteps, y_train = 1212 samples of the "10th timestep". - OverLordGoldDragon
Thanks. I think this would be past where any problem occurs. I have graphed this. It's downward sloping and trails off like one would expect. - VISQL

2 Answers

3
votes

Your code has a single critical problem: dimensionality shuffling. LSTM expects inputs to be shaped as (batch_size, timesteps, channels) (or (num_samples, timesteps, features)) - whereas you're feeding one timestep with nine channels. Backpropagation through time never even takes place.

Fix: reshape inputs as (1212, 9, 1).
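A minimal NumPy sketch of that reshape. The array contents are placeholder values (an assumption); only the shapes come from the question.

```python
import numpy as np

# Placeholder standing in for the question's X_train: 1212 samples, 9 lags each.
X_train = np.arange(1212 * 9, dtype=np.float64).reshape(1212, 9)

# What the question fed to the LSTM: 1 timestep with 9 features.
Z_wrong = X_train.reshape(1212, 1, 9)

# What the LSTM needs here: 9 timesteps with 1 feature each.
Z_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)

print(Z_wrong.shape)  # (1212, 1, 9)
print(Z_train.shape)  # (1212, 9, 1)
```

With this layout the first layer's batch_input_shape would become (None, 9, 1) instead of (None, 1, 9).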


Suggestion: read this answer. It's long, but could save you hours of debugging; this information isn't available elsewhere in such a compact form, and I wish I'd had it when starting out with LSTMs.

Answer to a related question may also prove useful - but previous link's more important.

1
votes

OverLordGoldDragon is right: the problem is with the dimensionality of the input.

As you can see in the Keras documentation all recurrent layers expect the input to be a 3D tensor with shape: (batch_size, timesteps, input_dim).

In your case:

  • the input has 9 time lags that need to be fed to the LSTM in sequence, so they are timesteps
  • the time series contains only one financial instrument, so the input_dim is 1

Hence, the correct way to reshape it is: (1212, 9, 1)

Also, make sure to respect the order in which data is fed to the LSTM. For forecasting problems it is better to feed the lags from the oldest to the most recent, since we are going to predict the next value after the most recent.

Since the LSTM reads the input from left to right, the 9 values should be ordered as: x_t-9, x_t-8, ...., x_t-1 from left to right, i.e. the input and output tensors should look like this:

Z = [[[0], [1], [2], [3], [4], [5], [6], [7], [8]],
     [[1], [2], [3], [4], [5], [6], [7], [8], [9]],
     ...
    ]
y = [9, 10, ...]
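One way to build tensors in that layout from a 1-D series is sketched below. The helper name make_windows and its n_lags parameter are hypothetical, not part of Keras or of the question's code.

```python
import numpy as np

def make_windows(series, n_lags=9):
    """Build (samples, n_lags, 1) inputs and next-step targets from a 1-D series."""
    series = np.asarray(series, dtype=np.float64)
    n_samples = len(series) - n_lags
    if n_samples <= 0:
        raise ValueError("series must be longer than n_lags")
    # Each row holds n_lags consecutive values, oldest first.
    Z = np.stack([series[i:i + n_lags] for i in range(n_samples)])
    # The target is the value immediately after each window.
    y = series[n_lags:]
    # Add a trailing feature axis of size 1.
    return Z[..., np.newaxis], y

Z, y = make_windows(np.arange(12))
print(Z.shape)              # (3, 9, 1)
print(Z[0, :, 0].tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(y.tolist())           # [9.0, 10.0, 11.0]
```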

If they are not oriented as such you can always set the LSTM flag go_backwards=True to have the LSTM read from right to left.
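An alternative to the flag is to flip the time axis of the array before training. This is a sketch on a made-up sample whose lags are assumed to be stored newest first.

```python
import numpy as np

# Hypothetical sample stored newest first: x_t-1, x_t-2, ..., x_t-9.
Z_newest_first = np.arange(9, 0, -1, dtype=np.float64).reshape(1, 9, 1)

# Reverse axis 1 (timesteps) so each sample runs oldest to newest.
# ascontiguousarray makes a copy instead of a negative-stride view.
Z_oldest_first = np.ascontiguousarray(Z_newest_first[:, ::-1, :])

print(Z_newest_first[0, :, 0].tolist())  # [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
print(Z_oldest_first[0, :, 0].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```

Use one approach or the other, not both, or the order gets reversed twice.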

Also, make sure to pass numpy arrays and not pandas series as X and y as Keras sometimes gets confused by Pandas.
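A short sketch of that conversion, using a three-value placeholder Series that mimics the non-default index shown in the question. The float32 dtype is an assumption, not something the question specifies.

```python
import numpy as np
import pandas as pd

# Placeholder Series mimicking the question's y_train, including its index.
y_train = pd.Series(
    [0.471678, 0.476364, 0.467761],
    index=[69200, 69140, 69080],
    name="Close",
)

# to_numpy() drops the index and returns a plain ndarray.
y_train_np = y_train.to_numpy(dtype=np.float32)

print(type(y_train_np).__name__)  # ndarray
print(y_train_np.shape)           # (3,)
```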

For a full example of doing time series forecasting with Keras take a look at this notebook