I am running the example code for LSTM networks that uses the IMDB dataset in Keras. The code can be found at the following link: imdb_lstm.py
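For reference, the relevant part of that script, together with the changes I made (100 epochs and a 10% validation split, which produce the log below), looks roughly like this. This is paraphrased from memory, so the exact hyperparameters may differ from the repository, and argument names like nb_words, nb_epoch, dropout_W/dropout_U are from the Keras version I am using:

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.datasets import imdb

max_features = 20000  # vocabulary size used by the example
maxlen = 80           # reviews are cut/padded to 80 tokens (see the log below)

(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=max_features)
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)

model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2))  # dropout on input and recurrent connections
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# 100 epochs, holding out 10% of the training data for validation
model.fit(X_train, y_train, batch_size=32, nb_epoch=100, validation_split=0.1)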
My problem is that as the code progresses, the training loss decreases and the training accuracy increases as expected, but the validation accuracy fluctuates within a range and the validation loss increases to a high value. I attach part of the training log below. I also observe that even when the training loss is very small (~0.01-0.03), it sometimes increases in the next epoch and then decreases again; this can be seen in epochs 75-77. In general, though, it decreases.
What I expect is that the training accuracy keeps increasing up to 0.99-1 and the training loss keeps decreasing. Moreover, I expect the validation accuracy to start from, say, 0.4 and rise to, for example, 0.8 by the end. If the validation accuracy does not improve over the epochs, what is the point of waiting through them? Also, the test accuracy is close to 0.81 at the end.
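If waiting is indeed pointless once the validation metrics stop improving, I assume the intended remedy is something like Keras's EarlyStopping callback. A minimal sketch, continuing from the code above (I have not verified that this is the recommended practice here):

from keras.callbacks import EarlyStopping

# stop once val_loss has not improved for 3 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=3)
model.fit(X_train, y_train, batch_size=32, nb_epoch=100,
          validation_split=0.1, callbacks=[early_stop])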
I also tried this with my own data and ran into the same situation. I processed my data in a similar way; that is, my training, validation, and test points are processed with the same logic as the ones in this example code.
Besides, I did not understand how this code represents the whole sentence after obtaining the outputs from the LSTM for each word. Does it perform mean or max pooling, or does it take only the last output of the LSTM layer before passing it to the logistic regression classifier?
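To make the question concrete, here are the two variants I have in mind, as a sketch (assuming GlobalMaxPooling1D is available in my Keras version). My guess is that the example does variant A, since return_sequences defaults to False, but I am not sure:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, GlobalMaxPooling1D

# Variant A: the classifier sees only the LSTM output at the last timestep
model_a = Sequential()
model_a.add(Embedding(20000, 128))
model_a.add(LSTM(128))                         # return_sequences=False by default
model_a.add(Dense(1, activation='sigmoid'))

# Variant B: max-pool the LSTM outputs over all timesteps instead
model_b = Sequential()
model_b.add(Embedding(20000, 128))
model_b.add(LSTM(128, return_sequences=True))  # one 128-dim vector per word
model_b.add(GlobalMaxPooling1D())              # collapse the time axis
model_b.add(Dense(1, activation='sigmoid'))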
Any help would be appreciated.
Using Theano backend.
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
X_train shape: (25000, 80)
X_test shape: (25000, 80)
Build model...
Train...
Train on 22500 samples, validate on 2500 samples
Epoch 1/100
22500/22500 [==============================] - 236s - loss: 0.5438 - acc: 0.7209 - val_loss: 0.4305 - val_acc: 0.8076
Epoch 2/100
22500/22500 [==============================] - 237s - loss: 0.3843 - acc: 0.8346 - val_loss: 0.3791 - val_acc: 0.8332
Epoch 3/100
22500/22500 [==============================] - 245s - loss: 0.3099 - acc: 0.8716 - val_loss: 0.3736 - val_acc: 0.8440
Epoch 4/100
22500/22500 [==============================] - 243s - loss: 0.2458 - acc: 0.9023 - val_loss: 0.4206 - val_acc: 0.8372
Epoch 5/100
22500/22500 [==============================] - 239s - loss: 0.2120 - acc: 0.9138 - val_loss: 0.3844 - val_acc: 0.8384
....
....
Epoch 75/100
22500/22500 [==============================] - 238s - loss: 0.0134 - acc: 0.9868 - val_loss: 0.9045 - val_acc: 0.8132
Epoch 76/100
22500/22500 [==============================] - 241s - loss: 0.0156 - acc: 0.9845 - val_loss: 0.9078 - val_acc: 0.8211
Epoch 77/100
22500/22500 [==============================] - 235s - loss: 0.0129 - acc: 0.9883 - val_loss: 0.9105 - val_acc: 0.8234