Please consider this simple example:
import numpy as np

nb_samples = 100000
X = np.random.randn(nb_samples)      # standard normal noise
Y = X[1:]
X = X[:-1]
X = X.reshape((len(Y), 1, 1))        # (samples, timesteps, features), as Keras expects
Y = Y.reshape((len(Y), 1))
X_train, Y_train = X, Y              # the training code below uses these names
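A quick sanity check of the resulting shapes (my own addition, not part of the original snippet):

print X_train.shape, Y_train.shape   # (99999, 1, 1) (99999, 1): one timestep and one feature per sample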
So basically we have
Y[i] = X[i-1]
and the model is simply a lag operator.
I can learn this model with a stateless LSTM, but here I want to understand and apply stateful LSTMs in Keras. So I try to learn this model with a stateful LSTM, feeding the value pairs (x, y) one by one (batch_size = 1):
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(output_dim=10,
               batch_input_shape=(1, 1, 1),
               activation='tanh',
               stateful=True))
model.add(Dense(output_dim=1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
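For reference, the snippet above uses the Keras 1 API; a rough Keras 2 equivalent of the same model (a sketch, assuming Keras 2.x, where output_dim became units and, in fit, nb_epoch became epochs) would be:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(10,                        # 'units' replaces 'output_dim'
               batch_input_shape=(1, 1, 1),
               activation='tanh',
               stateful=True))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')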
for epoch in range(50):
    model.fit(X_train, Y_train,
              nb_epoch=1,
              verbose=2,
              batch_size=1,
              shuffle=False)
    model.reset_states()
But the model does not learn anything.
As per Marcin's suggestion, I modified the training code as follows:
for epoch in range(10000):
    model.reset_states()
    train_loss = 0
    for i in range(Y_train.shape[0]):
        train_loss += model.train_on_batch(X_train[i:i+1],
                                           Y_train[i:i+1])
    print '# epoch', epoch, ' loss ', train_loss / float(Y_train.shape[0])
but I am still seeing a mean loss around 1, which is the variance of my randomly generated data, so the model does not seem to learn.
Am I doing something wrong?
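For what it's worth, here is a minimal sketch (my own, not part of the original code) of how one can inspect what the trained stateful model actually predicts, using the same sample-by-sample regime as in training:

model.reset_states()                      # clean state, as at the start of an epoch
preds = np.zeros(Y_train.shape[0])
for i in range(Y_train.shape[0]):
    # batch_size stays 1 to match batch_input_shape=(1, 1, 1)
    preds[i] = model.predict(X_train[i:i+1], batch_size=1)[0, 0]
print 'mean prediction', preds.mean(), ' mse', ((preds - Y_train[:, 0]) ** 2).mean()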
10 units might simply be not enough for this. You could also decrease the sequence length or try to check some continuous function (like sin or a polynomial). At this moment your architecture seems to be too simple for your task. – Marcin Możejko

The best prediction for randn will be 0. If this is your output then the learning actually succeeded. Try learning something meaningful. – nemo
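Following the suggestion of trying a continuous signal instead of noise, a minimal sketch of a sine-based variant of the data construction above (the sampling step is my own choice, not from the comments):

import numpy as np

nb_samples = 100000
t = np.linspace(0, 100 * np.pi, nb_samples)   # a few dozen sine periods
X = np.sin(t)
Y = X[1:]
X = X[:-1]
X = X.reshape((len(Y), 1, 1))
Y = Y.reshape((len(Y), 1))
X_train, Y_train = X, Y
# the model definition and training loops above can be reused unchanged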