7 votes

I ran the example code for LSTM networks that uses the imdb dataset in Keras. The code can be found at the following link: imdb_lstm.py

My problem is that as the code progresses, the training loss decreases and training accuracy increases as expected, but the validation accuracy fluctuates within an interval and the validation loss increases to a high value. I attach part of the training log below. I also observe that when the training loss is very small (~0.01-0.03), it sometimes increases in the next epoch and then decreases again. This can be seen in epochs 75-77. But in general it decreases.

What I expect is that training accuracy always increases up to 0.99-1 and training loss always decreases. Moreover, the validation accuracy should start from maybe 0.4 and rise to, for example, 0.8 in the end. If the validation accuracy does not improve over the epochs, what is the point of waiting through them? The test accuracy is also close to 0.81 at the end.

I also tried this with my own data and ran into the same situation. I processed my data in a similar way; that is, my training, validation, and test points are processed with the same logic as the ones in this example code.

Besides, I did not understand how this code represents the whole sentence after obtaining the outputs from the LSTM for each word. Does it do mean or max pooling, or does it take only the last output of the LSTM layer before feeding it to a logistic regression classifier?

Any help would be appreciated.

Using Theano backend.
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
X_train shape: (25000, 80)
X_test shape: (25000, 80)
Build model...
Train...
Train on 22500 samples, validate on 2500 samples
Epoch 1/100
22500/22500 [==============================] - 236s - loss: 0.5438 - acc: 0.7209 - val_loss: 0.4305 - val_acc: 0.8076
Epoch 2/100
22500/22500 [==============================] - 237s - loss: 0.3843 - acc: 0.8346 - val_loss: 0.3791 - val_acc: 0.8332
Epoch 3/100
22500/22500 [==============================] - 245s - loss: 0.3099 - acc: 0.8716 - val_loss: 0.3736 - val_acc: 0.8440
Epoch 4/100
22500/22500 [==============================] - 243s - loss: 0.2458 - acc: 0.9023 - val_loss: 0.4206 - val_acc: 0.8372
Epoch 5/100
22500/22500 [==============================] - 239s - loss: 0.2120 - acc: 0.9138 - val_loss: 0.3844 - val_acc: 0.8384
....
....
Epoch 75/100
22500/22500 [==============================] - 238s - loss: 0.0134 - acc: 0.9868 - val_loss: 0.9045 - val_acc: 0.8132
Epoch 76/100
22500/22500 [==============================] - 241s - loss: 0.0156 - acc: 0.9845 - val_loss: 0.9078 - val_acc: 0.8211
Epoch 77/100
22500/22500 [==============================] - 235s - loss: 0.0129 - acc: 0.9883 - val_loss: 0.9105 - val_acc: 0.8234

1 Answer

7 votes
  1. When to stop training: the usual approach is to stop training when some metric computed on the validation data starts to grow, as this is a common indicator of overfitting. But note that you are using dropout, which results in training a slightly different model in every epoch. That is why you should apply some patience: stop training only when this phenomenon persists over several consecutive epochs.
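Such a patience rule can be sketched in plain Python as follows (the function name and the exact stopping criterion here are illustrative, not taken from the Keras example):

```python
def should_stop(val_losses, patience=5):
    """Return True if the validation loss has not improved for
    `patience` consecutive epochs.

    val_losses: list of per-epoch validation losses, oldest first.
    """
    if len(val_losses) <= patience:
        return False
    # Best loss seen before the patience window started.
    best = min(val_losses[:-patience])
    # Stop only if none of the last `patience` epochs beat that best,
    # so a single dropout-induced blip does not end training.
    return all(v >= best for v in val_losses[-patience:])
```

Keras also ships an EarlyStopping callback with a patience argument that implements essentially this idea, so in practice you can pass it to model.fit via callbacks instead of rolling your own.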

  2. The reason for the fluctuations: the same as in the first point. You are using dropout, which introduces some randomness into your network. In my opinion this is the main reason for the fluctuations you observe.
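To see why dropout alone makes per-epoch numbers jitter, here is a toy forward pass for a single linear unit (all names are illustrative; it uses the standard inverted-dropout rescaling by 1/(1-p)):

```python
import random

def forward(weights, x, drop_prob=0.5, rng=None):
    """Toy forward pass with (inverted) dropout: each input unit is kept
    with probability 1 - drop_prob and rescaled, so repeated training
    passes effectively see slightly different sub-models."""
    rng = rng or random.Random()
    total = 0.0
    for w, xi in zip(weights, x):
        if rng.random() >= drop_prob:  # this unit survives the pass
            total += w * xi / (1.0 - drop_prob)
    return total
```

Calling forward repeatedly with a fresh random state gives different outputs for the same weights and input, which is exactly the epoch-to-epoch variation showing up in your validation curve.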

  3. What Keras models take as input to a Dense layer: if you study the documentation of the LSTM/RNN layer carefully, you will notice return_sequences=False as the default argument. This means that only the last output of the processed sequence is passed to the following layer; there is no mean or max pooling by default. You could change that by setting return_sequences=True and aggregating the per-timestep outputs yourself, e.g. with pooling or 1-D convolutions.
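A small NumPy illustration of the difference (the numbers are made up; `outputs` stands in for the LSTM's per-timestep hidden states, 4 timesteps with hidden size 3):

```python
import numpy as np

# Stand-in for what an LSTM with return_sequences=True would emit:
# one hidden vector per timestep.
outputs = np.array([[1., 0., 2.],
                    [3., 1., 0.],
                    [0., 2., 2.],
                    [2., 1., 1.]])

last = outputs[-1]                  # return_sequences=False: only this
                                    # vector reaches the Dense layer
mean_pooled = outputs.mean(axis=0)  # mean pooling would use all timesteps
max_pooled = outputs.max(axis=0)    # max pooling likewise
```

So the example code classifies the review from `last` alone; pooling variants would need the full sequence of outputs first.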