
I'm trying to find correct examples of using an LSTM autoencoder to detect anomalies in time-series data. Online I see a lot of examples where the LSTM autoencoder model is fitted with labels that are future time steps of the feature sequences (as in ordinary time-series forecasting with an LSTM). I would expect, however, that this kind of model should be trained with labels that are the same sequence as the features (the previous time steps).

For example, take the first Google result for this search: https://towardsdatascience.com/time-series-of-price-anomaly-detection-with-lstm-11a12ba4f6d9

1. This function defines how the labels (the y feature) are built:

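import numpy as np

def create_sequences(X, y, time_steps=TIME_STEPS):
    # TIME_STEPS is the window length, defined elsewhere in the article
    Xs, ys = [], []
    for i in range(len(X)-time_steps):
        Xs.append(X.iloc[i:(i + time_steps)].values)  # window x(i) ... x(i + time_steps - 1)
        ys.append(y.iloc[i + time_steps])              # label: the value right after the window
    return np.array(Xs), np.array(ys)

X_train, y_train = create_sequences(train[['Close']], train['Close'])
X_test, y_test = create_sequences(test[['Close']], test['Close'])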

2. The model is fitted as follows:

history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.1,
                    callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, mode='min')], shuffle=False)

Could you comment on how the autoencoder is implemented in the linked towardsdatascience.com/ article? Is this the correct method, or should the model be fitted the following way?

model.fit(X_train, X_train)

Thanks in advance!

I don't see the question here. You are following exactly the code provided here, towardsdatascience.com/…. Training is done as the author details in the post. So are you asking for help understanding how it works? - Akshay Sehgal
I think the way the autoencoder model is fitted in the article is wrong. I see many examples like this in different sources (Kaggle, towardsdatascience and others). It seems to me that the model should be fitted as I noted: model.fit(X_train, X_train). My question: is the model fitted correctly in the article or not? - Mikhail
The job of an autoencoder (as the name suggests) is to regenerate the input. Your input is X_train, and you are trying to generate X_train. I don't see why the fit statement is incorrect. Anomaly detection with autoencoders means attempting to regenerate the input and then comparing the residual loss between the input and the generated output: the higher the loss, the higher the anomaly score. - Akshay Sehgal
The model in the article is fitted with y_train, which holds the labels for the future t+1 timestamp. I don't understand why you describe this as an attempt to generate X_train or to regenerate the input. It seems to me that it is an attempt to forecast future values, isn't it? - Mikhail
Because he is creating X_train and y_train with create_sequences(train[['Close']], train['Close']) from the same train['Close'] data. I would recommend actually printing X_train and y_train to compare what's happening. - Akshay Sehgal
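Following the suggestion to print X_train and y_train, here is a small NumPy-only sketch of the same windowing logic as create_sequences (rewritten without pandas; the name make_windows and the toy series are hypothetical). It shows that each label is the value immediately after its window, i.e. a one-step-ahead forecast target, whereas a reconstruction autoencoder trained with model.fit(X, X) would use the window itself as the target.

```python
import numpy as np

def make_windows(series, time_steps):
    """Hypothetical NumPy version of the article's create_sequences windowing.

    Returns windows of shape (n, time_steps, 1) and, for each window,
    the value that immediately follows it (shape (n,)).
    """
    series = np.asarray(series, dtype=float)
    xs, ys = [], []
    for i in range(len(series) - time_steps):
        xs.append(series[i:i + time_steps])  # x(i) ... x(i + time_steps - 1)
        ys.append(series[i + time_steps])    # x(i + time_steps): the next step
    return np.array(xs)[..., np.newaxis], np.array(ys)

series = np.arange(10.0)       # toy series 0, 1, ..., 9
X, y = make_windows(series, time_steps=3)

print(X.shape, y.shape)        # (7, 3, 1) (7,)
print(X[0].ravel(), y[0])      # [0. 1. 2.] 3.0 -> y is the *next* value

# A reconstruction autoencoder would use the window itself as the target:
y_autoencoder = X
print(y_autoencoder.shape)     # (7, 3, 1): same shape as the input
```

The two framings differ only in the target: a (n,) vector of next values for forecasting, or a (n, time_steps, 1) copy of the input for reconstruction.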

2 Answers


This is a time-series autoencoder. If you want to predict the future, this is how it is done. How an autoencoder or any other machine learning model is fitted depends on the problem and the chosen solution; you cannot train and fit one model or workflow for all problems. A time-series problem may mean analyzing data already collected over a time period, or using collected data to predict the future, and the two are constructed differently. For example, time-series data for the Earth's subsurface is modeled differently from data for weather forecasting; one model cannot work for both.


By definition, an autoencoder is any model that attempts to reproduce its input, independent of the type of architecture (LSTM, CNN, ...).

Framed this way, it is an unsupervised task, so the training would be: model.fit(X_train, X_train)
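Once a model has been fitted with model.fit(X_train, X_train), anomalies are usually flagged by comparing each window with its reconstruction: the larger the reconstruction error, the higher the anomaly score. A minimal NumPy sketch, assuming X_pred holds the model's reconstructions (for example the output of model.predict(X_test)) and that you choose the threshold yourself, e.g. from the errors on the training set; the function name, toy data and threshold value are hypothetical.

```python
import numpy as np

def reconstruction_scores(X, X_pred):
    """Mean absolute reconstruction error per window.

    X, X_pred: arrays of shape (n_windows, time_steps, n_features).
    Returns an array of shape (n_windows,).
    """
    return np.mean(np.abs(X - X_pred), axis=(1, 2))

# Toy data standing in for real windows and model reconstructions.
X = np.zeros((4, 3, 1))
X_pred = X.copy()
X_pred[2] += 5.0               # pretend window 2 is reconstructed badly

scores = reconstruction_scores(X, X_pred)
threshold = 1.0                # hypothetical; e.g. derived from training-set errors
is_anomaly = scores > threshold
print(scores)                  # [0. 0. 5. 0.]
print(is_anomaly)              # [False False  True False]
```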

Now, what she does in the article you linked is use a common LSTM autoencoder architecture, but applied to time-series forecasting:

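model.add(LSTM(128, input_shape=(X_train.shape[1], X_train.shape[2])))  # encoder: returns only the last hidden state
# ... layers between encoder and decoder (e.g. RepeatVector) not shown in this excerpt
model.add(LSTM(128, return_sequences=True))  # decoder: returns one output per time step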

She pre-processes the data so that X_train = [x(t-seq+1), ..., x(t)] and y_train = x(t+1):

for i in range(len(X)-time_steps):  # window X[i : i+time_steps], label y[i+time_steps]

So the model does not, per se, reproduce the input it is fed, but that does not make it an invalid implementation, since it still produces valuable predictions.
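The forecasting variant can still be used for anomaly detection: instead of a reconstruction error, each point is scored by how far the one-step-ahead prediction is from the value actually observed. A minimal NumPy sketch; the "model" here is a hypothetical persistence forecast (predict x(t+1) = x(t)) standing in for the trained LSTM, and the toy series and threshold are assumptions.

```python
import numpy as np

def forecast_residual_scores(y_true, y_pred):
    """Absolute one-step-ahead forecast error for each window."""
    return np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))

series = np.array([1.0, 1.1, 1.0, 1.2, 9.0, 1.1, 1.0])  # spike at index 4
time_steps = 2

# Windows [x(t-1), x(t)] and targets x(t+1), same windowing as create_sequences.
X = np.array([series[i:i + time_steps] for i in range(len(series) - time_steps)])
y_true = series[time_steps:]

# Hypothetical stand-in for model.predict(X): repeat the last value of each window.
y_pred = X[:, -1]

scores = forecast_residual_scores(y_true, y_pred)
threshold = 2.0                # hypothetical
print(np.round(scores, 2))
print(np.flatnonzero(scores > threshold) + time_steps)  # indices in the original series
```

With a persistence stand-in, both the spike and the point right after it get high scores, because the spike itself becomes the next prediction; a trained model would behave differently.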