Keras LSTM Autoencoder time-series reconstruction

Question

I am trying to reconstruct time series data with an LSTM autoencoder (Keras). For now I want to train the autoencoder on a small number of samples (5 samples, each 500 time steps long with 1 dimension). I want to make sure the model can reconstruct those 5 samples, and after that I will use all the data (6000 samples).

import numpy as np
from keras import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

window_size = 500
features = 1
# data holds the 5 training series (loaded elsewhere), 500 values each
data = data.reshape(5, window_size, features)

model = Sequential()

# Encoder: compress each 500-step sequence into one 128-dimensional vector
model.add(LSTM(256, input_shape=(window_size, features), return_sequences=True))
model.add(LSTM(128, return_sequences=False))
model.add(RepeatVector(window_size))

# Decoder: expand the repeated vector back into a 500-step sequence
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(256, return_sequences=True))
model.add(TimeDistributed(Dense(1)))

model.compile(optimizer='adam', loss='mse')
model.fit(data, data, epochs=100, verbose=1)

[Figure: model]

Training:

Epoch 1/100
5/5 [==============================] - 2s 384ms/step - loss: 0.1603
...
Epoch 100/100
5/5 [==============================] - 2s 388ms/step - loss: 0.0018

After training, I tried to reconstruct one of the 5 samples:

yhat = model.predict(np.expand_dims(data[1,:,:], axis=0), verbose=0)

[Figure: reconstruction (blue) vs. input (orange)]

Why is the reconstruction so bad when the loss is small? How can I make the model better? Thanks.

Would you show all graphs from data[0,:,:] to data[4,:,:]? — Daniel Möller

moh moh · Accepted Answer · 2019-07-25T10:47:31

It seems to me that a time series should be given to the LSTMs in this format:

(samples, features, window_size)
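As a rough illustration of what this layout means in practice, here is a NumPy-only sketch. The sizes are taken from the question and the random-walk data is only a stand-in. Keras reads the three axes of an LSTM input as (samples, timesteps, features), so with the layout above each series is presented to the network as a single time step carrying all 500 values as features.

```python
import numpy as np

N, series_len = 5, 500  # illustrative sizes, taken from the question
rng = np.random.default_rng(0)
series = rng.uniform(-0.1, 0.1, size=(N, series_len)).cumsum(axis=1)

# Layout used in the question: 500 time steps, 1 feature per step.
as_sequence = series.reshape(N, series_len, 1)

# Layout proposed in this answer: 1 time step, 500 features per step.
as_single_step = series.reshape(N, 1, series_len)

print(as_sequence.shape)     # (5, 500, 1)
print(as_single_step.shape)  # (5, 1, 500)

# Both layouts hold the same values; only the axis treated as time differs.
assert np.array_equal(as_sequence[:, :, 0], as_single_step[:, 0, :])
```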

So I changed the format by swapping the two variables (window_size becomes 1 and features becomes 500). The code below reproduces the result; I didn't rename the variables, so please don't be confused :)

import numpy as np
import matplotlib.pyplot as plt
from keras import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

# Synthetic data: N random walks of 500 steps each
N = 10000
data = np.random.uniform(-0.1, 0.1, size=(N, 500))
data = data.cumsum(axis=1)
print(data.shape)

# Names kept from the question, but the values are swapped:
# each sample is one time step carrying all 500 values as features
window_size = 1
features = 500
data = data.reshape(N, window_size, features)

model = Sequential()

model.add(LSTM(32, input_shape=(window_size, features), return_sequences=True))
model.add(LSTM(16, return_sequences=False))
model.add(RepeatVector(window_size))

model.add(LSTM(16, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(TimeDistributed(Dense(500)))

model.compile(optimizer='adam', loss='mse')
model.fit(data, data, epochs=100, verbose=1)

# Reconstruct sample 1 and plot it against the original
yhat = model.predict(np.expand_dims(data[1, :, :], axis=0), verbose=0)
plt.plot(np.arange(500), yhat[0, 0, :])
plt.plot(np.arange(500), data[1, 0, :])
plt.show()

Credit to sobe86: I used the data he/she proposed.
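To judge whether a loss value is actually small for a given data set, one option (not part of the original posts) is to compare the reconstruction error of each sample with the error of a trivial baseline that outputs the sample's own mean. The sketch below uses only NumPy. The helper names `per_sample_mse` and `mean_baseline_mse` are hypothetical, and the noisy copies merely stand in for real model output.

```python
import numpy as np

def per_sample_mse(x, x_hat):
    """Mean squared error of each sample, averaged over all remaining axes."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return ((x - x_hat) ** 2).reshape(len(x), -1).mean(axis=1)

def mean_baseline_mse(x):
    """Error of a flat-line reconstruction equal to each sample's own mean."""
    x = np.asarray(x, dtype=float)
    return x.reshape(len(x), -1).var(axis=1)

# Illustrative stand-ins: random walks as inputs, noisy copies as reconstructions.
rng = np.random.default_rng(0)
x = rng.uniform(-0.1, 0.1, size=(5, 500, 1)).cumsum(axis=1)
x_hat = x + rng.normal(0.0, 0.05, size=x.shape)

mse = per_sample_mse(x, x_hat)
baseline = mean_baseline_mse(x)

# A ratio near 1 means the reconstruction is no better than a flat line,
# even if the absolute MSE looks small.
print(mse / baseline)
```

With a trained model, `x_hat` would instead come from `model.predict(data)`. If the series have a small amplitude, an MSE that looks tiny in absolute terms can still be close to the baseline.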
