I am running code using Python 3.7.5 with TensorFlow 2.0 for MNIST classification. I am using EarlyStopping from TensorFlow 2.0, and the callback I have for it is:
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=3,
        min_delta=0.001
    )
]
According to the EarlyStopping page in the TensorFlow 2.0 documentation, the min_delta parameter is defined as follows:
min_delta: Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement.
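To make sure I am reading this correctly: a change in the monitored quantity should only count as an improvement if its absolute value is at least 'min_delta'. A quick illustration of my reading (my own code, the helper name is mine, not TF's):

min_delta = 0.001

def counts_as_improvement(prev_val, curr_val):
    # My interpretation of the docs: an absolute change of less than
    # 'min_delta' counts as NO improvement-
    return abs(curr_val - prev_val) >= min_delta

print(counts_as_improvement(0.070, 0.080))    # True  - absolute change of 0.010
print(counts_as_improvement(0.070, 0.0705))   # False - absolute change of 0.0005

When I train the model, the output is: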
Train on 60000 samples, validate on 10000 samples
Epoch 1/15
60000/60000 [==============================] - 10s 173us/sample - loss: 0.2040 - accuracy: 0.9391 - val_loss: 0.1117 - val_accuracy: 0.9648
Epoch 2/15
60000/60000 [==============================] - 9s 150us/sample - loss: 0.0845 - accuracy: 0.9736 - val_loss: 0.0801 - val_accuracy: 0.9748
Epoch 3/15
60000/60000 [==============================] - 9s 151us/sample - loss: 0.0574 - accuracy: 0.9817 - val_loss: 0.0709 - val_accuracy: 0.9795
Epoch 4/15
60000/60000 [==============================] - 9s 149us/sample - loss: 0.0434 - accuracy: 0.9858 - val_loss: 0.0787 - val_accuracy: 0.9761
Epoch 5/15
60000/60000 [==============================] - 9s 151us/sample - loss: 0.0331 - accuracy: 0.9893 - val_loss: 0.0644 - val_accuracy: 0.9808
Epoch 6/15
60000/60000 [==============================] - 9s 150us/sample - loss: 0.0275 - accuracy: 0.9910 - val_loss: 0.0873 - val_accuracy: 0.9779
Epoch 7/15
60000/60000 [==============================] - 9s 151us/sample - loss: 0.0232 - accuracy: 0.9921 - val_loss: 0.0746 - val_accuracy: 0.9805
Epoch 8/15
60000/60000 [==============================] - 9s 151us/sample - loss: 0.0188 - accuracy: 0.9936 - val_loss: 0.1088 - val_accuracy: 0.9748
Now, if I look at the last three epochs, viz. epochs 6, 7 and 8, the validation loss ('val_loss') values are:
0.0688, 0.0843 and 0.0847.
The differences between these consecutive values are 0.0155 and 0.0004. But isn't the first difference greater than the 'min_delta' defined in the callback?
The code I came up with to mimic EarlyStopping is as follows:

import numpy as np

# List holding the last 'patience = 3' values of 'val_loss'-
pv = [0.0688, 0.0843, 0.0847]

# Compute the differences between consecutive elements of 'pv'-
differences = np.diff(pv, n=1)
differences
# array([0.0155, 0.0004])

# Minimum change required for the monitored metric's improvement-
min_delta = 0.001

# Check whether each consecutive difference is greater than 'min_delta'-
check = differences > min_delta
check
# array([ True, False])

# Condition for training to stop: both 'val_loss' differences are less
# than 'min_delta', i.e. EarlyStopping should be called-
if not np.any(check):
    print("Stop Training - EarlyStopping is called")
    # stop training
But according to 'val_loss', NOT all of the differences over the last 3 epochs are smaller than the 'min_delta' of 0.001. For example, the first difference is greater than 0.001 (0.0843 - 0.0688 = 0.0155), while the second difference is less than 0.001 (0.0847 - 0.0843 = 0.0004).
Also, according to the definition of the patience parameter of EarlyStopping:
patience: Number of epochs with no improvement after which training will be stopped.
So, EarlyStopping should only be called when there is no improvement in 'val_loss' for 3 consecutive epochs, where an absolute change of less than 'min_delta' does not count as an improvement.
Then why is EarlyStopping called?
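One possibility I considered: maybe the callback does not compare consecutive epochs at all, but instead compares each epoch against the best value seen so far, counting how many epochs have passed since the last genuine improvement. A sketch of that alternative logic (my own reconstruction, not the TF source; the function name is mine) does reproduce the stop after epoch 8 on the 'val_loss' values from the log above:

import numpy as np

def simulate_early_stopping(values, min_delta=0.001, patience=3):
    # Compare each epoch against the BEST value seen so far, not against
    # the immediately preceding epoch; 'wait' counts the epochs since the
    # last improvement of more than 'min_delta'-
    best = np.inf
    wait = 0
    for epoch, current in enumerate(values, start=1):
        if current < best - min_delta:
            best = current
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                print("Stop training after epoch", epoch)
                return epoch
    return None

# 'val_loss' values from the training log above (epochs 1 to 8)-
simulate_early_stopping(
    [0.1117, 0.0801, 0.0709, 0.0787, 0.0644, 0.0873, 0.0746, 0.1088])
# Stop training after epoch 8

Is this the right interpretation of what TF 2.0 actually does?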
The code for the model definition and 'fit()' is:
import tensorflow as tf
import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.sparsity import keras as sparsity
import matplotlib.pyplot as plt
from tensorflow.keras.layers import AveragePooling2D, Conv2D
from tensorflow.keras import models, layers, datasets
from tensorflow.keras.layers import Dense, Flatten, Reshape, Input, InputLayer
from tensorflow.keras.models import Sequential, Model
# Specify the parameters to be used for layer-wise pruning; NO PRUNING is done here:
pruning_params_unpruned = {
    'pruning_schedule': sparsity.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.0,
        begin_step=0, end_step=0, frequency=100)
}
def pruned_nn(pruning_params):
    """
    Function to define the architecture of a neural network model
    following a 300-100 architecture for the MNIST dataset, using the
    provided parameters to prune the model.

    Input: 'pruning_params' Python 3 dictionary containing parameters used for pruning
    Output: Returns the designed and compiled neural network model
    """
    pruned_model = Sequential()
    pruned_model.add(InputLayer(input_shape=(784, )))
    pruned_model.add(Flatten())
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=300, activation='relu', kernel_initializer=tf.initializers.GlorotUniform()),
        **pruning_params))
    # pruned_model.add(Dropout(0.2))
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=100, activation='relu', kernel_initializer=tf.initializers.GlorotUniform()),
        **pruning_params))
    # pruned_model.add(Dropout(0.1))
    pruned_model.add(sparsity.prune_low_magnitude(
        Dense(units=num_classes, activation='softmax'),
        **pruning_params))

    # Compile the pruned NN-
    pruned_model.compile(
        loss=tf.keras.losses.categorical_crossentropy,
        # optimizer='adam',
        optimizer=tf.keras.optimizers.Adam(lr=0.001),
        metrics=['accuracy'])

    return pruned_model
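For completeness, 'X_train', 'y_train', 'X_test' and 'y_test' are the standard MNIST arrays; the preprocessing I use is roughly the following sketch (flattened 784-dimensional inputs and one-hot labels, to match the input shape and 'categorical_crossentropy' loss above):

# Standard MNIST preparation: flatten the 28x28 images to 784 features
# and one-hot encode the labels-
num_classes = 10

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

X_train = X_train.reshape(-1, 784).astype('float32') / 255.0
X_test = X_test.reshape(-1, 784).astype('float32') / 255.0

y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)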
batch_size = 32
epochs = 50
# Instantiate NN-
orig_model = pruned_nn(pruning_params_unpruned)
# Train unpruned Neural Network-
history_orig = orig_model.fit(
    x=X_train, y=y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    callbacks=callbacks,
    validation_data=(X_test, y_test),
    shuffle=True)
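After training stops, I also inspect the recorded 'val_loss' history to see where the best epoch was (a quick sanity check of my own, not part of the original training code):

import numpy as np

# Find the best epoch and how many epochs actually ran before
# EarlyStopping fired-
val_loss = history_orig.history['val_loss']
best_epoch = int(np.argmin(val_loss)) + 1

print("Best val_loss = {0:.4f} at epoch {1}; training ran for {2} epochs.".format(
    min(val_loss), best_epoch, len(val_loss)))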
Comments:
"… accuracy rather than val_loss. Take a closer look." – Mehraban
"… epochs variable?" – Mehraban