3
votes

For my thesis, I'm running a 4-layer deep network for a sequence-to-sequence translation use case: 150 x Conv(64, 5) x GRU(100) x softmax activation on the last stage, trained with loss='categorical_crossentropy'.

Training loss and accuracy converge quite quickly, whereas validation loss and accuracy seem to be stuck: val_acc stays in the 97 to 98.2 range and will not go beyond that.

Is my model overfitting?

I have tried a dropout of 0.2 between layers.
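
For reference, the model is roughly along these lines (written in current Keras syntax; the input feature dimension, output vocabulary size, padding and optimizer below are illustrative placeholders, not my exact values):

    from keras.models import Sequential
    from keras.layers import Conv1D, GRU, Dense, Dropout, TimeDistributed

    num_classes = 20      # placeholder output vocabulary size
    feature_dim = 40      # placeholder per-timestep input dimension

    model = Sequential()
    # 1-D convolution over the 150-step input sequence
    model.add(Conv1D(64, 5, padding='same', activation='relu',
                     input_shape=(150, feature_dim)))
    model.add(Dropout(0.2))
    # recurrent layer returning one output per timestep
    model.add(GRU(100, return_sequences=True))
    model.add(Dropout(0.2))
    # per-timestep softmax over the output classes
    model.add(TimeDistributed(Dense(num_classes, activation='softmax')))

    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])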

Output after dropout:
    Epoch 85/250
    [==============================] - 3s - loss: 0.0057 - acc: 0.9996 - val_loss: 0.2249 - val_acc: 0.9774
    Epoch 86/250
    [==============================] - 3s - loss: 0.0043 - acc: 0.9987 - val_loss: 0.2063 - val_acc: 0.9774
    Epoch 87/250
    [==============================] - 3s - loss: 0.0039 - acc: 0.9987 - val_loss: 0.2180 - val_acc: 0.9809
    Epoch 88/250
    [==============================] - 3s - loss: 0.0075 - acc: 0.9978 - val_loss: 0.2272 - val_acc: 0.9774
    Epoch 89/250
    [==============================] - 3s - loss: 0.0078 - acc: 0.9974 - val_loss: 0.2265 - val_acc: 0.9774
    Epoch 90/250
    [==============================] - 3s - loss: 0.0027 - acc: 0.9996 - val_loss: 0.2212 - val_acc: 0.9809
    Epoch 91/250
    [==============================] - 3s - loss: 3.2185e-04 - acc: 1.0000 - val_loss: 0.2190 - val_acc: 0.9809
    Epoch 92/250
    [==============================] - 3s - loss: 0.0020 - acc: 0.9991 - val_loss: 0.2239 - val_acc: 0.9792
    Epoch 93/250
    [==============================] - 3s - loss: 0.0047 - acc: 0.9987 - val_loss: 0.2163 - val_acc: 0.9809
    Epoch 94/250
    [==============================] - 3s - loss: 2.1863e-04 - acc: 1.0000 - val_loss: 0.2190 - val_acc: 0.9809
    Epoch 95/250
    [==============================] - 3s - loss: 0.0011 - acc: 0.9996 - val_loss: 0.2190 - val_acc: 0.9809
    Epoch 96/250
    [==============================] - 3s - loss: 0.0040 - acc: 0.9987 - val_loss: 0.2289 - val_acc: 0.9792
    Epoch 97/250
    [==============================] - 3s - loss: 2.9621e-04 - acc: 1.0000 - val_loss: 0.2360 - val_acc: 0.9792
    Epoch 98/250
    [==============================] - 3s - loss: 4.3776e-04 - acc: 1.0000 - val_loss: 0.2437 - val_acc: 0.9774

2 Answers

3
votes

The case you presented is a really complex one. To answer whether overfitting is actually happening in your case, you need to answer two questions:

  1. Are the results obtained on the validation set satisfying? The main purpose of a validation set is to give you insight into what will happen when new data arrives. If you are satisfied with the accuracy on the validation set, then you should consider your model as not overfitting too much.
  2. Should you worry about the extremely high accuracy of your model on the training set? You can easily see that your model is almost perfect on the training set. This could mean that it has learned some patterns by heart. There is usually some noise in your data, and a model that is perfect on the training data is probably using part of its capacity to learn that noise. To test this, I usually prefer to inspect the positive examples with the lowest scores and the negative examples with the highest scores, as outliers tend to fall into these two groups (the model struggles to push them above / below the 0.5 threshold); see the sketch after this list.
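A minimal sketch of that check, assuming a flattened, single-label view of the predictions (the array names `x_val` and `y_val` are placeholders for your own validation data; for per-timestep outputs you would flatten them first):

    import numpy as np

    probs = model.predict(x_val)                   # shape (n_samples, n_classes)
    y_pred = probs.argmax(axis=1)
    true_class_prob = probs[np.arange(len(y_val)), y_val]

    idx = np.arange(len(y_val))
    correct = y_pred == y_val
    wrong = ~correct

    # correctly classified samples the model is least confident about
    hardest_correct = idx[correct][np.argsort(true_class_prob[correct])[:10]]
    # misclassified samples the model is most confident about in the wrong class
    worst_mistakes = idx[wrong][np.argsort(-probs[wrong].max(axis=1))[:10]]

Looking at the raw inputs behind those two index lists usually tells you whether the model is memorising noise or genuinely struggling with hard examples.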

So, after checking these two concerns, you can tell whether your model overfits. The behaviour you presented is actually quite good, and the likely reason behind it is that there are a few patterns in the validation set that are not properly covered in the training set. But this is something you should always take into account when designing a machine learning solution.

1
vote

No, this is not overfitting. Overfitting only happens when the training loss is low and the validation loss is high. It can also be seen as a large gap between training and validation accuracy (in the case of classification).
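
If you want to keep that gap from growing further, an early-stopping setup along these lines helps (a sketch only; the array names are placeholders for your own data, and restore_best_weights requires a reasonably recent Keras version):

    from keras.callbacks import EarlyStopping

    # stop once val_loss has not improved for 10 epochs and keep the best weights
    early_stop = EarlyStopping(monitor='val_loss', patience=10,
                               restore_best_weights=True)

    model.fit(x_train, y_train,
              validation_data=(x_val, y_val),
              epochs=250, batch_size=32,
              callbacks=[early_stop])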