5 votes

My model stops training after the 4th epoch even though I expect it to continue training beyond that. I've set monitor to validation loss and patience to 2, which I thought meant that training stops only after validation loss increases for 2 consecutive epochs. However, training seems to stop before that happens.

I've defined EarlyStopping as follows:

callbacks = [
    EarlyStopping(monitor='val_loss', patience=2, verbose=0),
]

And in the fit function I use it like this:

hist = model.fit_generator(
            generator(imgIds, batch_size=batch_size, is_train=True),
            validation_data=generator(imgIds, batch_size=batch_size, is_val=True),
            validation_steps=steps_per_val,
            steps_per_epoch=steps_per_epoch,
            epochs=epoch_count,
            verbose=verbose_level,
            callbacks=callbacks)

I don't understand why training ends after the 4th epoch.

Epoch 1/30
675/675 [==============================] - 1149s - loss: 0.1513 - val_loss: 0.0860
Epoch 2/30
675/675 [==============================] - 1138s - loss: 0.0991 - val_loss: 0.1096
Epoch 3/30
675/675 [==============================] - 1143s - loss: 0.1096 - val_loss: 0.1040
Epoch 4/30
675/675 [==============================] - 1139s - loss: 0.1072 - val_loss: 0.1019
Finished training intermediate1.
What is the val_loss from epoch 1? – Nicole White
first line - 0.0860 @NicoleWhite – megashigger
Oops, I see. If anything it should have stopped after epoch 3, as your loss does not improve from epoch 1 in either epoch 2 or 3. Can you set verbose=1 in the callback and show what it says? – Nicole White

1 Answer

4 votes

I think your interpretation of the EarlyStopping callback is a little off: it stops when the monitored loss fails to improve on the best value it has ever seen for patience epochs, not when the loss increases for patience consecutive epochs. The best loss your model had was 0.0860 at epoch 1, and the loss did not improve on it in either epoch 2 or epoch 3, so training should have stopped after epoch 3. However, it continued for one more epoch due to what I would call an off-by-one error, at least given what the docs say about patience:

patience: number of epochs with no improvement after which training will be stopped.

From the Keras source code (edited slightly for clarity):

class EarlyStopping(Callback):
    def on_epoch_end(self, epoch, logs=None):
        current = logs.get(self.monitor)

        if np.less(current - self.min_delta, self.best):
            # Improvement: record the new best value and reset the counter.
            self.best = current
            self.wait = 0
        else:
            # No improvement: the patience check happens BEFORE self.wait
            # is incremented, which lets one extra epoch through.
            if self.wait >= self.patience:
                self.stopped_epoch = epoch
                self.model.stop_training = True
            self.wait += 1

Notice how self.wait isn't incremented until after the check against self.patience, so while your model should have stopped training after epoch 3, it continued for one more epoch.
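To see the off-by-one concretely, here is a minimal sketch (plain Python, not Keras itself) that replays the val_loss values from your log through the same wait/patience logic as the snippet above:

val_losses = [0.0860, 0.1096, 0.1040, 0.1019]  # from your log, epochs 1-4

best, wait, patience = float('inf'), 0, 2
for epoch, current in enumerate(val_losses, start=1):
    if current < best:
        best, wait = current, 0          # improvement: reset the counter
        print('epoch {}: improved, wait=0'.format(epoch))
    else:
        if wait >= patience:             # checked BEFORE wait is incremented
            print('epoch {}: stop training'.format(epoch))
            break
        wait += 1
        print('epoch {}: no improvement, wait={}'.format(epoch, wait))

With patience=2 this prints "stop training" at epoch 4, matching your log; if wait += 1 were moved above the patience check, training would stop after epoch 3 instead.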

Unfortunately, it seems that if you want a callback that behaves the way you described, stopping only once the loss worsens for patience consecutive epochs, you'd have to write it yourself. But I think you could accomplish this with only a slight modification of the EarlyStopping callback, as sketched below.
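For instance, here is a rough, untested sketch (the ConsecutiveEarlyStopping name and its reset-on-any-improvement behavior are my own, not part of Keras) that stops only when the monitored value fails to improve on the previous epoch, rather than on the best epoch ever seen, for patience epochs in a row:

import numpy as np
from keras.callbacks import Callback

class ConsecutiveEarlyStopping(Callback):
    def __init__(self, monitor='val_loss', patience=2):
        super(ConsecutiveEarlyStopping, self).__init__()
        self.monitor = monitor
        self.patience = patience
        self.previous = np.inf   # previous epoch's value, not the best ever
        self.wait = 0

    def on_epoch_end(self, epoch, logs=None):
        current = (logs or {}).get(self.monitor)
        if current is None:
            return
        if np.less(current, self.previous):
            self.wait = 0                # better than last epoch: reset
        else:
            self.wait += 1               # incremented BEFORE the check
            if self.wait >= self.patience:
                self.model.stop_training = True
        self.previous = current

You would pass it exactly as before, e.g. callbacks = [ConsecutiveEarlyStopping(monitor='val_loss', patience=2)]. On your log this would never stop, since val_loss improves on the previous epoch in both epochs 3 and 4.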

Edit: The off-by-one error has since been fixed in the Keras source.