4
votes

I'm currently undertaking my first 'real' DL project of (surprise) predicting stock movements. I know the odds of making anything useful are about 1000:1, but I'm enjoying it and want to see it through; I've learnt more in my few weeks of attempting this than I have in the prior six months of completing MOOCs.

I'm building an LSTM in Keras to predict one step ahead, and I've attempted the task both as classification (up/down/steady) and now as a regression problem. Both hit the same roadblock: my validation loss never improves from epoch #1.
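(For context, the 'close+1' column used as the target in the code below is presumably the next period's close. Its construction isn't shown anywhere in the post, so the snippet here is only an illustrative sketch of how such a one-step-ahead target could be built; the 'close' column name and the helper function are assumptions.)

# Hypothetical sketch - one way a one-step-ahead target could be built.
# The actual construction of 'close+1' is not shown in the post.
import pandas as pd

def add_one_step_target(frame: pd.DataFrame) -> pd.DataFrame:
    out = frame.copy()
    out['close+1'] = out['close'].shift(-1)   # next period's close price
    return out.dropna(subset=['close+1'])     # last row has no look-ahead value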

I can get the model to overfit such that training loss approaches zero with MSE (or 100% accuracy if classification), but at no stage does the validation loss decrease. This screams overfitting to my untrained eye, so I added varying amounts of dropout, but all that does is stifle the model's learning (and training accuracy) while showing no improvement in validation accuracy.
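For reference, adding dropout to each LSTM layer looks roughly like the sketch below; the rates and shapes here are placeholders rather than the exact values tried, and the full model code follows further down.

# Illustrative sketch only - dropout/recurrent_dropout on each LSTM layer.
# Rates and shapes are placeholders, not the exact configuration tried.
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, timesteps, features = 256, 15, 200   # placeholder shapes; see the full code below

model_with_dropout = Sequential()
model_with_dropout.add(LSTM(512, return_sequences=True, stateful=True,
                            dropout=0.3, recurrent_dropout=0.3,
                            batch_input_shape=(batch_size, timesteps, features)))
model_with_dropout.add(LSTM(256, return_sequences=True, stateful=True,
                            dropout=0.3, recurrent_dropout=0.3))
model_with_dropout.add(LSTM(128, stateful=True,
                            dropout=0.3, recurrent_dropout=0.3))
model_with_dropout.add(Dense(1, activation='linear'))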

I have attempted to change a significant number of hyperparameters - learning rate, optimiser, batch size, lookback window, number of layers, number of units, dropout, number of samples, etc. I've also tried subsets of the data and subsets of the features, but I just can't get it to work, so I'm very thankful for any help.

Example graph with no dropout

Code below (it's not pretty, I know):

import feather
import numpy as np
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.callbacks import ReduceLROnPlateau

# Import saved full dataframe ~ 200 features
df = feather.read_dataframe('df_feathered')
df.set_index('time', inplace=True)

# Difference the dataset to make stationary
df = df.diff(periods=1, axis=0)

# MAKE LARGE SAMPLE FOR TESTING
df_train = df.loc['2017-3-1':'2017-6-30']
df_val = df.loc['2017-7-1':'2017-8-31']
df_test = df.loc['2017-9-1':'2017-9-30']

# Make x_train, x_val sets by dropping target variable
x_train = df_train.drop('close+1', axis=1)
x_val = df_val.drop('close+1', axis=1)

# Fit the scaler on the training data only, then apply the same transform
# to the validation set (named x_test below, although it is the validation split)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_val)

# scaler = MinMaxScaler(feature_range=(0,1))
# x_train = scaler.fit_transform(df_train1)
# x_test = scaler.transform(df_val1)

# Create y_train, y_test: just the target variable for regression
# (y_test here comes from the validation split)
y_train = df_train['close+1']
y_test = df_val['close+1']

# Define Lookback window for LSTM input
sliding_window = 15

# Convert x_train, x_test, y_train, y_test into 3d arrays
# (samples, timesteps, features) for LSTM input
dataXtrain = []
for i in range(len(x_train) - sliding_window - 1):
    dataXtrain.append(x_train[i:(i + sliding_window), :])

dataXtest = []
for i in range(len(x_test) - sliding_window - 1):
    dataXtest.append(x_test[i:(i + sliding_window), :])

dataYtrain = []
for i in range(len(y_train) - sliding_window - 1):
    dataYtrain.append(y_train.iloc[i + sliding_window])

dataYtest = []
for i in range(len(y_test) - sliding_window - 1):
    dataYtest.append(y_test.iloc[i + sliding_window])

# Make the data length divisible by a variety of batch sizes for training
# (start at 1000 to exclude the replaced NaN values)
dataXtrain = np.array(dataXtrain[1000:172008])
dataYtrain = np.array(dataYtrain[1000:172008])
dataXtest = np.array(dataXtest[1000:83944])
dataYtest = np.array(dataYtest[1000:83944])

# Checking input shapes
print('dataXtrain size is: {}'.format((dataXtrain).shape))
print('dataXtest size is: {}'.format((dataXtest).shape))
print('dataYtrain size is: {}'.format((dataYtrain).shape))
print('dataYtest size is: {}'.format((dataYtest).shape))

### ACTUAL LSTM MODEL

batch_size = 256
timesteps = dataXtrain.shape[1]
features = dataXtrain.shape[2]

# Model set-up: stacked 4-layer stateful LSTM
model = Sequential()
model.add(LSTM(512, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, features)))
model.add(LSTM(256, stateful=True, return_sequences=True))
model.add(LSTM(256, stateful=True, return_sequences=True))
model.add(LSTM(128, stateful=True))
model.add(Dense(1, activation='linear'))

model.summary()

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=5, min_lr=0.000001, verbose=1)

def coeff_determination(y_true, y_pred):
    # R^2 implemented with the Keras backend, guarded against division by zero
    from keras import backend as K
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - SS_res / (SS_tot + K.epsilon())

model.compile(loss='mse',
              optimizer='nadam',
              metrics=[coeff_determination,'mse','mae','mape'])

history = model.fit(dataXtrain, dataYtrain,
                    validation_data=(dataXtest, dataYtest),
                    epochs=100, batch_size=batch_size, shuffle=False,
                    verbose=1, callbacks=[reduce_lr])

score = model.evaluate(dataXtest, dataYtest,batch_size=batch_size, verbose=1)
print(score)

predictions = model.predict(dataXtest, batch_size=batch_size)
print(predictions)

import matplotlib.pyplot as plt
%matplotlib inline
#plt.plot(history.history['mean_squared_error'])
#plt.plot(history.history['val_mean_squared_error'])
plt.plot(history.history['coeff_determination'])
plt.plot(history.history['val_coeff_determination'])
#plt.plot(history.history['mean_absolute_error'])
#plt.plot(history.history['mean_absolute_percentage_error'])
#plt.plot(history.history['val_mean_absolute_percentage_error'])
#plt.title("MSE")
plt.ylabel("R2")
plt.xlabel("epoch")
plt.legend(["train", "val"], loc="best")
plt.show()

plt.plot(history.history["loss"][5:])
plt.plot(history.history["val_loss"][5:])
plt.title("model loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "val"], loc="best")
plt.show()

plt.figure(figsize=(20,8))
plt.plot(dataYtest)
plt.plot(predictions)
plt.title("Prediction")
plt.ylabel("Price")
plt.xlabel("Time")
plt.legend(["Truth", "Prediction"], loc="best")
plt.show()
2
Such a symptom normally means that you are overfitting. Your model works better and better on your training timeframe and worse and worse on everything else. This is a good start. Now you need to regularize. Try adding dropout to each of your LSTM layers and check the result. – Manngo
Thanks for the reply Manngo - that was my initial thought too. However, after trying a ton of different dropout parameters, most of the graphs look like this: imgur.com/yQyjs5Y. Validation loss just will not go down :( – swifty
Yeah, this pattern is much better. I would stop training when the validation loss doesn't decrease any more after n epochs. Try EarlyStopping as a callback. The only other options are to redesign your model and/or to engineer more features. – Manngo
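A minimal sketch of that EarlyStopping suggestion, reusing the variables from the question's code; the patience value is just an example.

# Sketch: stop when val_loss stops improving (patience is an example value)
from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss', patience=10,
                               restore_best_weights=True,  # needs Keras >= 2.2.3
                               verbose=1)

history = model.fit(dataXtrain, dataYtrain,
                    validation_data=(dataXtest, dataYtest),
                    epochs=100, batch_size=batch_size, shuffle=False,
                    verbose=1, callbacks=[reduce_lr, early_stopping])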
Ah OK, but val loss doesn't ever decrease (as in the graph). I did have an early-stopping callback, but it just gets triggered at whatever the patience level is. Is it possible that there is just no discernible relationship in the data, so that it will never generalize? Sounds like I might need to work on more features? – swifty
Well, MSE goes down to 1.8 in the first epoch and no longer decreases. It is possible that the network learned everything it could already in epoch 1. At the beginning your validation loss is much better than the training loss, so there's something to learn for sure. What is the min-max range of y_train and y_test? What is the MSE with random weights? I'm not sure that you normalize y, while I see that you normalize x. If y is something like 2800 (S&P 500) and your inputs are on a much smaller scale, then your weights will be extreme. – Manngo
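A sketch of what scaling the target as well might look like, reusing the question's variable names: fit a separate scaler on the windowed training targets and invert the transform on the predictions.

# Sketch: scale the target the same way as the features, then invert
# the transform on the model's predictions before evaluating/plotting
from sklearn.preprocessing import StandardScaler

y_scaler = StandardScaler()
dataYtrain_scaled = y_scaler.fit_transform(dataYtrain.reshape(-1, 1))
dataYtest_scaled = y_scaler.transform(dataYtest.reshape(-1, 1))

# ... train on (dataXtrain, dataYtrain_scaled) instead, then:
predictions = y_scaler.inverse_transform(
    model.predict(dataXtest, batch_size=batch_size))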

2 Answers

2
votes

Maybe you should remember that you are predicting stock returns, which are very likely close to unpredictable. So the val_loss increasing is not overfitting at all. Instead of adding more dropout, maybe you should think about adding more layers to increase its power.

1
votes

Try reducing the learning rate a lot (and remove the dropout for now).
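A sketch of what a much lower learning rate could look like here: pass an explicit Nadam instance instead of the 'nadam' string (whose Keras default learning rate is 0.002); 1e-4 is only an example value.

# Sketch: compile with an explicitly lowered learning rate
# (newer Keras versions spell the argument learning_rate= instead of lr=)
from keras.optimizers import Nadam

model.compile(loss='mse',
              optimizer=Nadam(lr=1e-4),
              metrics=['mse', 'mae'])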

Why do you use

shuffle=False

in the fit() function?