
I am using Jason Brownlee's tutorial (mirror) to apply an LSTM network to some syslog/network log data. He's a master!

I have syslog data (for a specific event) for each day of the last year, so I am using an LSTM network for time series analysis. I am using the LSTM from the Keras deep learning library.

As I understand it:

  1. About Batch_size

A batch is a fixed-size number of rows from the training dataset that defines how many patterns to process before updating the weights of the network. Based on the batch_size, the model takes random samples from the data for the analysis. For time series this is not desirable, hence the batch_size should always be 1.

  2. About the shuffle setting

By default, the samples within an epoch are shuffled prior to being exposed to the network. This is undesirable for the LSTM because we want the network to build up state as it learns across the sequence of observations. We can disable the shuffling of samples by setting "shuffle" to "False" (see the sketch after this list for how both settings appear in a fit call).
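For concreteness, here is a minimal sketch of how these two settings show up in a Keras fit call. The toy data, layer size and epoch count are placeholders rather than the tutorial's exact values, and the imports use tensorflow.keras (standalone Keras differs only in the import path):

```python
# Minimal sketch: batch_size=1 updates the weights after every sample,
# and shuffle=False preserves the temporal order of the samples.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# toy supervised data, shaped (samples, timesteps, features)
X = np.random.rand(100, 1, 1)
y = np.random.rand(100, 1)

model = Sequential([
    LSTM(4, input_shape=(1, 1)),  # 4 units is an arbitrary placeholder
    Dense(1),
])
model.compile(loss="mean_squared_error", optimizer="adam")

model.fit(X, y, epochs=10, batch_size=1, shuffle=False, verbose=0)
```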

Scenario 1: Using the above two rules/guidelines, I ran several trials with different numbers of neurons, epoch sizes and numbers of layers, and got better results than the baseline model (persistence model).

Scenario 2: Without using the above guidelines/rules, I ran several trials with different numbers of neurons, epoch sizes and numbers of layers, and got even better results than in Scenario 1.
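For reference, the persistence baseline mentioned in Scenario 1 can be computed in a few lines; the daily series below is a random placeholder for the real log counts:

```python
# Persistence ("naive") baseline sketch: predict each day as the previous day.
import numpy as np
from sklearn.metrics import mean_squared_error

series = np.random.rand(365)      # placeholder for one year of daily counts
predictions = series[:-1]         # forecast for day t+1 is the value at day t
actuals = series[1:]
rmse = np.sqrt(mean_squared_error(actuals, predictions))
print(f"Persistence RMSE: {rmse:.3f}")
```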

Query: Is setting shuffle to False and batch_size to 1 for time series a rule, or just a guideline?

Reading the tutorial, it seems logical that time series data should not be shuffled, since we do not want to change the sequence of the data; yet for my data the results are better when I let the data be shuffled. In the end, I think what matters is getting better predictions from my runs, so I should put concrete evidence (metrics, elbows, RMSEs, etc.) ahead of "theory".

Kindly enlighten.


1 Answer


It depends a lot on the size of your data, and also on the number of variables. In my experience, decreasing the batch size gives better results since the weight updates are more frequent, but on huge datasets it is very expensive, so you have to play with this trade-off (training time vs. result).

About shuffling: it may be the case that your data is not that correlated with the past. If that is so, shuffling the data helps the network learn and generalize (for example, when the data is ordered by label); check reason 7 of "37 reasons your neural network is not working".

Regarding batch size: the larger it is, the more difficult it is to generalize (reason 11). When the data clearly depends on the past, you can declare your LSTM in Keras as stateful, which means, according to the Keras API, "the states computed for the samples in one batch will be reused as initial states for the samples in the next batch". Hope this helps.
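As a hedged sketch of what a stateful LSTM looks like in Keras: the batch size must be fixed via batch_input_shape, shuffling stays off so order is preserved, and the carried state is cleared manually between epochs (the data shapes and layer size below are placeholders):

```python
# Stateful LSTM sketch: state is carried from one batch to the next
# instead of being reset, so it must be cleared manually between epochs.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

batch_size = 1
X = np.random.rand(100, 1, 1)   # (samples, timesteps, features) placeholder
y = np.random.rand(100, 1)

model = Sequential([
    LSTM(4, batch_input_shape=(batch_size, 1, 1), stateful=True),
    Dense(1),
])
model.compile(loss="mean_squared_error", optimizer="adam")

for epoch in range(10):
    # keep the temporal order; state persists across the batches of this epoch
    model.fit(X, y, epochs=1, batch_size=batch_size, shuffle=False, verbose=0)
    model.reset_states()        # clear the carried state before the next epoch
```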