Optimal batch size and epochs for large models

Question

I know there are a number of related questions but I was hoping someone could provide some advice specific to the model I am trying to build.

It is an image classification model. At the moment I am trying to classify 40 different classes (40 different types of animals). Within each class there are between 120 and 220 images. My training set is 4708 images and my validation set is 2512 images.

I ran a sequential model (code below) where I used a batch size of 64 and 30 epochs. The code took a long time to run. The accuracy after 30 epochs was about 67 on the validation set and about 70 on the training set. The loss on the validation set was about 1.2 and about 1 on the training set (I have included the last 12 epoch results below). It appears to be tapering off after about 25 epochs.

My questions are around batch size and epochs. Is there value to using larger or smaller batch sizes (than 64) and should I be using more epochs. I read that generally between 50 and 100 epochs are common practice, but if my results are tapering off after 25 is there value to adding more.

Model

history = model.fit_generator(
    train_data_gen,
    steps_per_epoch= 4708 // batch_size,
    epochs=30,
    validation_data=val_data_gen,
    validation_steps= 2512 // batch_size
)

Results

Epoch 18/30
73/73 [==============================] - 416s 6s/step - loss: 1.0982 - accuracy: 0.6843 - val_loss: 1.3010 - val_accuracy: 0.6418
Epoch 19/30
73/73 [==============================] - 414s 6s/step - loss: 1.1215 - accuracy: 0.6712 - val_loss: 1.2761 - val_accuracy: 0.6454
Epoch 20/30
73/73 [==============================] - 414s 6s/step - loss: 1.0848 - accuracy: 0.6809 - val_loss: 1.2918 - val_accuracy: 0.6442
Epoch 21/30
73/73 [==============================] - 413s 6s/step - loss: 1.0276 - accuracy: 0.7013 - val_loss: 1.2581 - val_accuracy: 0.6430
Epoch 22/30
73/73 [==============================] - 415s 6s/step - loss: 1.0985 - accuracy: 0.6854 - val_loss: 1.2626 - val_accuracy: 0.6575
Epoch 23/30
73/73 [==============================] - 413s 6s/step - loss: 1.0621 - accuracy: 0.6949 - val_loss: 1.3168 - val_accuracy: 0.6346
Epoch 24/30
73/73 [==============================] - 415s 6s/step - loss: 1.0718 - accuracy: 0.6869 - val_loss: 1.1658 - val_accuracy: 0.6755
Epoch 25/30
73/73 [==============================] - 419s 6s/step - loss: 1.0368 - accuracy: 0.6957 - val_loss: 1.1962 - val_accuracy: 0.6739
Epoch 26/30
73/73 [==============================] - 419s 6s/step - loss: 1.0231 - accuracy: 0.7067 - val_loss: 1.3491 - val_accuracy: 0.6426
Epoch 27/30
73/73 [==============================] - 434s 6s/step - loss: 1.0520 - accuracy: 0.6919 - val_loss: 1.2039 - val_accuracy: 0.6683
Epoch 28/30
73/73 [==============================] - 417s 6s/step - loss: 0.9810 - accuracy: 0.7151 - val_loss: 1.2047 - val_accuracy: 0.6711
Epoch 29/30
73/73 [==============================] - 436s 6s/step - loss: 0.9915 - accuracy: 0.7140 - val_loss: 1.1737 - val_accuracy: 0.6711
Epoch 30/30
73/73 [==============================] - 424s 6s/step - loss: 1.0006 - accuracy: 0.7087 - val_loss: 1.2213 - val_accuracy: 0.6619

There is nothing specific to say about your model (or your data, which we don't have); regarding the batch size, the general considerations mentioned in the the SO thread How to calculate optimal batch size apply. — desertnaut

Nicolas Gervais Nicolas Gervais · Accepted Answer · 2020-04-19T13:17:32

You should only interrupt the training process when the model doesn't "learn" anymore, meaning that loss and accuracy on the validation data doesn't improve. To do this, you can put an arbitrarily high number of epochs, and use tf.keras.callbacks.EarlyStopping (documentation). This will interrupt the training process when a certain condition is met, for instance when the val_loss hasn't decreased in 10 epochs.

es = EarlyStopping(monitor='val_loss', patience=10)

fit_generator(... callbacks=[es])

This will ensure that the learning process isn't interrupted while the model is still learning, and also that the model won't overfit.

Batch size of 32 is standard, but that's a question more relevant for another site because it's about statistics (and it's very hotly debated).

Optimal batch size and epochs for large models

3 Answers