7 votes

I am new to neural networks and have two, probably pretty basic, questions. I am setting up a generic LSTM network to predict the future of a sequence, based on multiple features. My training data therefore has the shape (number of training sequences, length of each sequence, number of features per timestep), or, to make it more specific, something like (2000, 10, 3). I am trying to predict the value of one feature, not of all three.
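For concreteness, placeholder arrays with exactly those shapes could look like this (random dummy data, purely to pin down the dimensions; the real dataset is of course not random):

import numpy as np

# 2000 sequences, 10 timesteps each, 3 features per timestep
trainX = np.random.rand(2000, 10, 3)
# one target value per sequence: the single feature to be predicted
trainY = np.random.rand(2000)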

  1. Problem:

If I make my Network deeper and/or wider, the only output I get is the constant mean of the values to be predicted. Take this setup for example:

from keras.models import Model
from keras.layers import Input, Dense, LSTM
from keras.callbacks import ReduceLROnPlateau, EarlyStopping

z0 = Input(shape=[None, len(dataset[0])])

z = LSTM(32, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z0)
z = LSTM(32, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(64, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(64, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(128, activation='softsign', recurrent_activation='softsign')(z)

z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
print(model.summary())

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])

history = model.fit(trainX, trainY, validation_split=0.1, epochs=200, batch_size=32,
                    callbacks=[ReduceLROnPlateau(factor=0.67, patience=3, verbose=1, min_lr=1E-5),
                               EarlyStopping(patience=50, verbose=1)])

This is what results from a network like that (plot omitted). Note: these are predictions on the very inputs used for training.

If I just use one layer, like:

z0 = Input(shape=[None, len(dataset[0])])

z = LSTM(4, activation='softsign', recurrent_activation='softsign')(z0)

z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
print(model.summary())

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(trainX, trainY, validation_split=0.1, epochs=200, batch_size=32,
                    callbacks=[ReduceLROnPlateau(factor=0.67, patience=3, verbose=1, min_lr=1E-5),
                               EarlyStopping(patience=200, verbose=1)])

The predictions are somewhat reasonable; at least they are no longer constant.

Why does that happen? Around 2000 samples is not that many, but in the case of overfitting I would expect the predictions to match the training data perfectly...

  2. Problem (EDIT: solved, as stated in the comments: Keras always expects batches):

When I use:

`test=model.predict(trainX[0])`

to get the prediction for the first sequence, I get a dimension error:

"Error when checking : expected input_1 to have 3 dimensions, but got array with shape (3, 3)"

I need to feed in an array of sequences like:

`test=model.predict(trainX[0:1])`

This is a workaround, but I am not really sure whether this has any deeper meaning or is just a syntax thing...
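For reference, the slicing workaround is equivalent to explicitly re-adding the batch dimension with numpy (a sketch, assuming `trainX` is the training array from above):

import numpy as np

# trainX[0] drops the leading batch axis; Keras expects 3-D input
# (batch, timesteps, features), so add a batch axis of size 1 back.
# This has the same effect as trainX[0:1].
test = model.predict(np.expand_dims(trainX[0], axis=0))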

Have you compared your code to these examples? keras.io/getting-started/sequential-model-guide - Jonathon Reinhart
Maybe try to reinitialize the model a few times (create it again) and see if it sometimes works... --- About question 2: Keras is always expecting "batches". That's why you need to pass an array of sequences, never a single sequence. - Daniel Möller
Jonathon: Do you have any specific example in mind? My code does seem to work; since only large networks give constant outputs, it seems to be a design issue, not a syntax-based one. @Daniel: Yeah, I ran the script multiple times, creating the model over and over again. I think there were sometimes models of an "intermediate" size that sometimes worked and sometimes didn't... - Carolus Shoen

1 Answer

1 vote

This is because you have not normalised your input data.

Any neural network model will initially have its weights initialised around zero. Since your training dataset has all positive values, the model will try to adjust its weights to predict only positive values. However, the activation function (softsign in your case) saturates towards 1 for large inputs, so all the model can do is add a bias. That is why you are getting an almost constant line around the average value of the dataset.
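To see the saturation concretely, here is softsign's definition evaluated on a few positive pre-activations (a small numpy sketch; softsign(x) = x / (1 + |x|)):

import numpy as np

def softsign(x):
    # softsign(x) = x / (1 + |x|), which approaches 1 for large positive x
    return x / (1.0 + np.abs(x))

print(softsign(np.array([0.5, 2.0, 10.0, 100.0])))
# [0.33333333 0.66666667 0.90909091 0.99009901]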

To fix this, you can use a general tool like sklearn to pre-process your data. If you are using a pandas DataFrame, something like this will help:

data_df = (data_df - data_df.mean()) / data_df.std()
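The same standardisation can be done with sklearn's StandardScaler; a sketch assuming `trainX` is the 3-D array from the question (StandardScaler only accepts 2-D input, hence the reshaping):

from sklearn.preprocessing import StandardScaler

n_samples, n_steps, n_feats = trainX.shape
scaler = StandardScaler()
# flatten (samples, timesteps) into rows, scale per feature, restore shape
trainX_scaled = scaler.fit_transform(
    trainX.reshape(-1, n_feats)).reshape(n_samples, n_steps, n_feats)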

Alternatively, to keep the normalisation parameters inside the model, you can consider adding a batch normalization layer to your model.
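One possible placement, normalising the raw inputs before the first LSTM (a sketch based on the single-layer model from the question; BatchNormalization learns its scaling parameters during training):

from keras.models import Model
from keras.layers import Input, Dense, LSTM, BatchNormalization

z0 = Input(shape=[None, 3])   # 3 features per timestep, as in the question
z = BatchNormalization()(z0)  # normalises each feature over the batch
z = LSTM(4, activation='softsign', recurrent_activation='softsign')(z)
z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
model.compile(loss='mean_squared_error', optimizer='adam')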