I am currently trying to do time series prediction with an LSTM implemented in Keras.
I trained an LSTM model with 10,000 samples in the training set and 2,500 samples in the test set, using a batch size of 30.
Now I am trying to train the exact same model with more data: a training set of 100,000 samples and a test set of 25,000 samples.
The time for one epoch is about 100 times longer with the big dataset.
Even though I have more data, the batch size is the same, so the training should not take more time. Could it be the computation of the loss on the train and test data (where all the data is used) that takes so long?
Concerning the batch size: should I increase it because I have more data?
EDIT 1
I tried increasing the batch size. When I do that, the training time decreases a lot. Shouldn't the computation of the gradient take longer with a big batch size than with a small one?
I have no clue here; I really do not understand why this is happening.
Does someone know why this is happening? Is it linked to the data I use? How can this happen in theory?
EDIT 2
My processor is an Intel Xeon W3520 (4 cores / 8 threads) with 32 GB of RAM. The data consists of sequences of length 6 with 4 features. I use one LSTM layer with 50 units and a dense output layer. Whether I train with 10,000 samples or 100,000, it is really the batch size that changes the computation time: I can go from 2 seconds per epoch with a batch size of 1000 to 200 seconds with a batch size of 30.
I do not use a generator; I use the basic call model.fit(Xtrain, Ytrain, nb_epoch=nb_epoch, batch_size=batch_size, verbose=2, callbacks=callbacks, validation_data=(Xtest, Ytest)) with callbacks = [EarlyStopping(monitor='val_loss', patience=10, verbose=2), history].
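To be concrete, here is a stripped-down sketch of my setup (the data is random just to reproduce the shapes, the loss and optimizer are placeholders, and epochs is called nb_epoch in older Keras versions):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense
    from keras.callbacks import EarlyStopping, History

    # Random data only to reproduce the shapes: sequences of length 6 with 4 features.
    Xtrain = np.random.rand(100000, 6, 4)
    Ytrain = np.random.rand(100000, 1)
    Xtest = np.random.rand(25000, 6, 4)
    Ytest = np.random.rand(25000, 1)

    # One LSTM layer with 50 units and a dense output layer.
    model = Sequential()
    model.add(LSTM(50, input_shape=(6, 4)))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')  # placeholder loss/optimizer

    history = History()
    callbacks = [EarlyStopping(monitor='val_loss', patience=10, verbose=2), history]

    # With batch_size=30 one epoch takes ~200 s for me; with batch_size=1000 it takes ~2 s.
    model.fit(Xtrain, Ytrain,
              epochs=100, batch_size=30, verbose=2,
              callbacks=callbacks,
              validation_data=(Xtest, Ytest))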