I built an LSTM in Keras. It reads observations of 9 time lags and predicts the next value. For some reason, the trained model predicts something that is nearly a straight line. What issue in the model architecture might be causing such a poor regression result?
Input data: hourly financial time series with a clear upward trend, 1200+ records.
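The windowing code isn't shown in the question. A minimal sketch of how 9-lag inputs with a next-step target are commonly built, assuming the normalized prices are a 1-D NumPy array in time order (the function name `make_lag_windows` is hypothetical):

```python
import numpy as np


def make_lag_windows(series, n_lags=9):
    """Turn a 1-D series into (samples, n_lags) inputs and next-step targets.

    Row i of X holds series[i : i + n_lags]; y[i] is series[i + n_lags].
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_lags
    if n_samples <= 0:
        raise ValueError("series must be longer than n_lags")
    X = np.stack([series[i:i + n_lags] for i in range(n_samples)])
    y = series[n_lags:]
    return X, y


# Toy stand-in for the normalized Close prices (not the real data).
toy = np.linspace(0.45, 0.92, 30)
X, y = make_lag_windows(toy, n_lags=9)
print(X.shape, y.shape)  # (21, 9) (21,)
```

Consecutive rows overlap by 8 values, which matches the shifted rows in the printed Z_train array.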
Input data dimensions:
- originally: X_train.shape (1212, 9)
- reshaped for LSTM: Z_train.shape (1212, 1, 9)
array([[[0.45073171, 0.46783444, 0.46226164, ..., 0.47164819,
0.47649667, 0.46017738]],
[[0.46783444, 0.46226164, 0.4553289 , ..., 0.47649667,
0.46017738, 0.47167775]],
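Keras recurrent layers read their input as (samples, timesteps, features). The reshape call isn't shown, so this sketch assumes NumPy's `reshape` and contrasts the layout above, where the 9 lags arrive as 9 features of a single timestep, with a layout where they are 9 timesteps of 1 feature each (random stand-in data, not the real X_train):

```python
import numpy as np

# Stand-in for X_train: 1212 samples x 9 lags (random values, not the real data).
X_train = np.random.rand(1212, 9)

# Layout used in the question: 1 timestep carrying 9 features.
Z_one_step = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))

# Alternative layout: 9 timesteps carrying 1 feature each.
Z_nine_steps = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))

print(Z_one_step.shape)    # (1212, 1, 9)
print(Z_nine_steps.shape)  # (1212, 9, 1)

# Both hold the same numbers; only how the LSTM splits them into timesteps differs.
assert np.array_equal(Z_one_step.ravel(), Z_nine_steps.ravel())
```

With the first layout the LSTM takes a single step per sample; with the second it steps through the lags one at a time.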
Target data: y_train
69200 0.471678
69140 0.476364
69080 0.467761
...
7055 0.924937
7017 0.923651
7003 0.906253
Name: Close, Length: 1212, dtype: float64
type(y_train)
<class 'pandas.core.series.Series'>
LSTM design:
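from keras.models import Sequential
from keras.layers import LSTM

my = Sequential()
# Input: batches of shape (1 timestep, 9 features)
my.add(LSTM(20, batch_input_shape=(None, 1, 9), return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(1))  # default return_sequences=False: outputs only the last step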
Input layer of 9 nodes, 3 hidden layers of 20 units each, and 1 output layer of 1 unit.
The Keras default is return_sequences=False.
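An LSTM layer expects 3-D input (batch, timesteps, features), so every LSTM that feeds another LSTM has to return the full sequence; only the last one in the stack may keep the default return_sequences=False, which outputs 2-D (batch, units). Setting it to False on a middle layer is what produces shape errors in the following layer. A pure-Python sketch of that shape rule (the helper `trace_lstm_shapes` is hypothetical and does not call Keras; the batch dimension is omitted):

```python
def trace_lstm_shapes(input_shape, layers):
    """Return the output shape of each layer in a stack of LSTMs.

    input_shape: (timesteps, features), batch dimension omitted.
    layers: list of (units, return_sequences) pairs.
    return_sequences=True  -> output (timesteps, units)
    return_sequences=False -> output (units,), i.e. the last step only
    """
    shapes = []
    current = tuple(input_shape)
    for i, (units, return_sequences) in enumerate(layers):
        if len(current) != 2:
            raise ValueError(
                f"layer {i} needs 3-D input (batch, timesteps, features), "
                f"but the previous layer outputs (batch, {current[0]})"
            )
        timesteps = current[0]
        current = (timesteps, units) if return_sequences else (units,)
        shapes.append(current)
    return shapes


# The stack from the question: 3 x LSTM(20, return_sequences=True) + LSTM(1).
stack = [(20, True), (20, True), (20, True), (1, False)]
for shape in trace_lstm_shapes((1, 9), stack):
    print(shape)
```

Note that with input (1, 9) every intermediate sequence has length 1, since the timestep count is carried through unchanged.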
The model is compiled with mse loss and the adam or sgd optimizer:
curr_model.compile(optimizer=optmfunc, loss="mse")
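With loss="mse", Keras minimizes the mean of the squared differences between targets and predictions (averaged over the output dimension, then over the batch). For a single-output model this reduces to a short NumPy expression, which can be handy for checking the loss by hand on saved predictions. A sketch, with `y_true` and `y_pred` as hypothetical example arrays:

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error for a single-output regression model.

    Both inputs are flattened first: predictions from a 1-unit output layer
    have shape (n, 1), and in NumPy (n, 1) - (n,) would broadcast to (n, n).
    """
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(-1)
    return float(np.mean((y_true - y_pred) ** 2))


y_true = np.array([0.47, 0.48, 0.46, 0.50])
y_pred = np.array([[0.45], [0.49], [0.46], [0.47]])  # shaped like model output
print(mse(y_true, y_pred))
```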
The model is fit as follows. The batch size is 32; shuffle can be True or False:
curr_model.fit(Z_train, y_train,
               validation_data=(Z_validation, y_validation),
               epochs=noepoch, verbose=0,
               batch_size=btchsize,
               shuffle=shufBOOL)
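With 1212 training samples and batch_size=32, each epoch performs 38 weight updates: 37 full batches plus a final batch of 28. A sketch of that split, assuming plain sequential slicing of the sample indices (this illustrates the arithmetic and the effect of shuffling; it is not Keras's internal code, and `epoch_batches` is a hypothetical helper):

```python
import numpy as np


def epoch_batches(n_samples, batch_size=32, shuffle=False, rng=None):
    """Split sample indices into mini-batches for one epoch.

    With shuffle=True the order is permuted (a fresh permutation per call,
    i.e. per epoch); the last batch is smaller when n_samples is not a
    multiple of batch_size.
    """
    indices = np.arange(n_samples)
    if shuffle:
        rng = rng if rng is not None else np.random.default_rng()
        indices = rng.permutation(indices)
    return [indices[start:start + batch_size]
            for start in range(0, n_samples, batch_size)]


batches = epoch_batches(1212, batch_size=32)
print(len(batches), len(batches[-1]))  # 38 28
```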
The config and weights are saved to disk. Since I'm training several models, I load them afterward to compute certain performance metrics.
spec_model.model.save_weights(mname_trn)
mkerascfg = spec_model.model.to_json()
with open(mname_cfg, "w") as json_file:
    json_file.write(mkerascfg)
When I trained an MLP, I got this result against the validation set:
I've trained several of the LSTMs, but the result against the validation set looks like this:
The second plot (the LSTM plot) is of the validation data: y_validation versus the predictions on Z_validation. These are the last 135 records of the full data set, split off as the validation set, and they have the same type and properties as Z_train and y_train. The x-axis is simply the index, numbered 0 to 134, and the y-axis is the value of y_validation or of the prediction. Both arrays are normalized, so their units are the same. The "straight" line is the prediction.
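The split code isn't shown. Assuming the full arrays are stored in chronological order (worth checking, since the y_train index shown earlier runs from 69200 down to 7003), holding out the last 135 records can be sketched as follows. `split_last` is a hypothetical helper, and the 1347-sample total is only 1212 + 135 used for illustration:

```python
import numpy as np


def split_last(Z, y, n_val=135):
    """Hold out the last n_val samples, in stored order, for validation.

    No shuffling happens before the split, so the validation set is the
    most recent stretch of the series only if the arrays are time-sorted.
    """
    Z = np.asarray(Z)
    y = np.asarray(y)
    if len(Z) != len(y):
        raise ValueError("Z and y must have the same number of samples")
    return Z[:-n_val], y[:-n_val], Z[-n_val:], y[-n_val:]


Z_full = np.random.rand(1347, 1, 9)  # placeholder data, not the real series
y_full = np.random.rand(1347)
Z_train, y_train, Z_validation, y_validation = split_last(Z_full, y_full)
print(Z_train.shape, Z_validation.shape)  # (1212, 1, 9) (135, 1, 9)
```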
Any idea why this is happening? Things I've tried:
- Changing batch sizes: similar result.
- Changing return_sequences: this leads to various shape errors in subsequent layers.
MSE loss progression of the LSTM
Four models were trained, all with the same issue. We'll focus on the LSTM with 3 hidden layers of 20 units each, as defined above. (The mini-batch size was 32 and shuffling was disabled; enabling it changed nothing.)
This is a slightly zoomed-in image of the loss progression for the first model (adam optimizer):
From what I can tell by inspecting the index, the bounce in the loss values (which creates the thick band) starts somewhere in the 500s of epochs.
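Rather than scanning the index by hand, the per-epoch losses can be searched for upward jumps. `fit` returns a `History` object whose `history["loss"]` (and, with validation data, `history["val_loss"]`) lists one value per epoch; the question's code discards that return value, so `history` is an assumed variable here, and the synthetic losses in the sketch only illustrate the idea:

```python
import numpy as np


def loss_jumps(losses, tol=0.0):
    """Epoch indices where the loss rose by more than `tol` vs. the previous epoch."""
    losses = np.asarray(losses, dtype=float)
    rises = np.diff(losses)                  # rises[i] = losses[i + 1] - losses[i]
    return np.flatnonzero(rises > tol) + 1   # +1 maps back to the later epoch


# Assumed usage with Keras:
#   history = curr_model.fit(...)
#   jumps = loss_jumps(history.history["loss"], tol=1e-4)

# Synthetic losses: smooth decay, then an alternating bounce from epoch 500.
epochs = np.arange(1000)
toy_loss = 0.01 * np.exp(-epochs / 100) + 1e-4
toy_loss[500:] += 5e-4 * (epochs[500:] % 2)
print(loss_jumps(toy_loss, tol=1e-4)[:3])
```

The `tol` threshold is a judgment call: it filters out tiny increases from ordinary optimizer noise so that only the larger oscillations are reported.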
...y_train, and what are the horizontal axis units for the plots? It'd help to include code covering all variables used in the snippets you provided. – OverLordGoldDragon
Z_train = 1212 samples of 9 timesteps, y_train = 1212 samples of the "10th timestep" – OverLordGoldDragon