0 votes

I know there are a number of related questions but I was hoping someone could provide some advice specific to the model I am trying to build.

It is an image classification model. At the moment I am trying to classify 40 different classes (40 different types of animals). Within each class there are between 120 and 220 images. My training set is 4708 images and my validation set is 2512 images.

I ran a sequential model (code below) with a batch size of 64 and 30 epochs. The code took a long time to run. After 30 epochs, accuracy was about 67% on the validation set and about 70% on the training set. The loss was about 1.2 on the validation set and about 1 on the training set (I have included the last 12 epochs' results below). It appears to be tapering off after about 25 epochs.

My questions are about batch size and epochs. Is there value in using larger or smaller batch sizes (than 64), and should I be using more epochs? I read that between 50 and 100 epochs are generally common practice, but if my results are tapering off after 25, is there value in adding more?

Model

batch_size = 64

history = model.fit_generator(
    train_data_gen,
    steps_per_epoch=4708 // batch_size,
    epochs=30,
    validation_data=val_data_gen,
    validation_steps=2512 // batch_size,
)

Results

Epoch 18/30
73/73 [==============================] - 416s 6s/step - loss: 1.0982 - accuracy: 0.6843 - val_loss: 1.3010 - val_accuracy: 0.6418
Epoch 19/30
73/73 [==============================] - 414s 6s/step - loss: 1.1215 - accuracy: 0.6712 - val_loss: 1.2761 - val_accuracy: 0.6454
Epoch 20/30
73/73 [==============================] - 414s 6s/step - loss: 1.0848 - accuracy: 0.6809 - val_loss: 1.2918 - val_accuracy: 0.6442
Epoch 21/30
73/73 [==============================] - 413s 6s/step - loss: 1.0276 - accuracy: 0.7013 - val_loss: 1.2581 - val_accuracy: 0.6430
Epoch 22/30
73/73 [==============================] - 415s 6s/step - loss: 1.0985 - accuracy: 0.6854 - val_loss: 1.2626 - val_accuracy: 0.6575
Epoch 23/30
73/73 [==============================] - 413s 6s/step - loss: 1.0621 - accuracy: 0.6949 - val_loss: 1.3168 - val_accuracy: 0.6346
Epoch 24/30
73/73 [==============================] - 415s 6s/step - loss: 1.0718 - accuracy: 0.6869 - val_loss: 1.1658 - val_accuracy: 0.6755
Epoch 25/30
73/73 [==============================] - 419s 6s/step - loss: 1.0368 - accuracy: 0.6957 - val_loss: 1.1962 - val_accuracy: 0.6739
Epoch 26/30
73/73 [==============================] - 419s 6s/step - loss: 1.0231 - accuracy: 0.7067 - val_loss: 1.3491 - val_accuracy: 0.6426
Epoch 27/30
73/73 [==============================] - 434s 6s/step - loss: 1.0520 - accuracy: 0.6919 - val_loss: 1.2039 - val_accuracy: 0.6683
Epoch 28/30
73/73 [==============================] - 417s 6s/step - loss: 0.9810 - accuracy: 0.7151 - val_loss: 1.2047 - val_accuracy: 0.6711
Epoch 29/30
73/73 [==============================] - 436s 6s/step - loss: 0.9915 - accuracy: 0.7140 - val_loss: 1.1737 - val_accuracy: 0.6711
Epoch 30/30
73/73 [==============================] - 424s 6s/step - loss: 1.0006 - accuracy: 0.7087 - val_loss: 1.2213 - val_accuracy: 0.6619
3 – There is nothing specific to say about your model (or your data, which we don't have); regarding the batch size, the general considerations mentioned in the SO thread How to calculate optimal batch size apply. – desertnaut

3 Answers

2 votes

You should only interrupt the training process when the model doesn't "learn" anymore, meaning that loss and accuracy on the validation data stop improving. To do this, you can set an arbitrarily high number of epochs and use tf.keras.callbacks.EarlyStopping (documentation). This interrupts training when a certain condition is met, for instance when val_loss hasn't decreased in 10 epochs.

from tensorflow.keras.callbacks import EarlyStopping

es = EarlyStopping(monitor='val_loss', patience=10)
history = model.fit_generator(..., callbacks=[es])

This will ensure that the learning process isn't interrupted while the model is still learning, and also that the model won't overfit.

A batch size of 32 is standard, but that's a question more relevant for another site, because it's about statistics (and it's very hotly debated).

1 vote

Yes, if you can, go for as large a batch size as you can.

A high batch size almost always results in faster convergence and shorter training time. If you have a GPU with plenty of memory, just go as high as you can.

As for epochs, it is hard to decide. As far as I can see, your model is still improving at epochs 28-29, so you may have to train for more epochs to get a better model. Also look at val_accuracy: it seems to be improving too, which suggests the model needs more training.

You can use ModelCheckpoint to store the model after each epoch, so you keep the best version of your model. https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint
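To illustrate what ModelCheckpoint's save-best-only behaviour buys you, here is a plain-Python sketch (my own, not from the answer) of picking the best epoch, applied to the val_accuracy values from epochs 24-30 in the logs above:

```python
# Sketch of what ModelCheckpoint(monitor='val_accuracy', save_best_only=True)
# effectively does: keep only the epoch with the best monitored value.
def best_epoch(val_accuracies):
    """Return (index, value) of the best validation accuracy."""
    best_i = max(range(len(val_accuracies)), key=lambda i: val_accuracies[i])
    return best_i, val_accuracies[best_i]

# val_accuracy from epochs 24-30 in the question's logs:
accs = [0.6755, 0.6739, 0.6426, 0.6683, 0.6711, 0.6711, 0.6619]
print(best_epoch(accs))  # (0, 0.6755) -> epoch 24 would be the saved model
```

In other words, checkpointing would have kept the epoch-24 weights rather than the weaker epoch-30 ones.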


1 vote

There are three reasons to choose a batch size.

  1. Speed. If you are using a GPU, then larger batches are often nearly as fast to process as smaller batches, so each individual example takes much less time and each epoch finishes faster too.
  2. Regularization. Smaller batches add regularization, similar to increasing dropout, increasing the learning rate, or adding weight decay. Larger batches will reduce regularization.
  3. Memory constraints. This one is a hard limit. At a certain point your GPU just won't be able to fit all the data in memory, and you can't increase batch size any more.

That suggests that larger batch sizes are better until you run out of memory. Unless you are having trouble with overfitting, a larger and still-working batch size will (1) speed up training and (2) allow a larger learning rate, which also speeds up the training process.

That second point comes about because of regularization. If you increase batch size, the reduced regularization gives back some "regularization budget" to spend on an increased learning rate, which will add that regularization back.
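The "regularization budget" idea is often applied via the linear-scaling heuristic (my own framing, not stated in the answer): when the batch size grows by some factor, the learning rate is commonly grown by the same factor to keep the effective noise of the updates comparable. A minimal sketch, where the base values are hypothetical:

```python
# Linear-scaling heuristic: scale the learning rate with the batch size.
def scaled_lr(base_lr, base_batch_size, new_batch_size):
    """Return a learning rate scaled linearly with the batch-size change."""
    return base_lr * (new_batch_size / base_batch_size)

# Hypothetical example: quadrupling the batch from 64 to 256
# suggests quadrupling the learning rate.
print(scaled_lr(0.001, 64, 256))  # 0.004
```

This is a starting heuristic, not a guarantee; very large batches usually still need a warm-up phase or separate tuning.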


Low regularization means that training is very smooth, which means that it is easy for training to converge but also easy for training to overfit.

High regularization means that training is more noisy or difficult, but validation results are better because the noisy training process reduces overfitting and the resulting generalization error.

If you are familiar with the Bias-Variance Tradeoff, adding regularization is a way of adding a bit of bias in order to reduce the variance. Here is one of many good write ups on the subject: Regularization: the path to bias-variance trade-off.


On the broader topic of regularization, training schedules, and hyper-parameter tuning, I highly recommend two papers on the subject by Leslie N. Smith.

The first paper, on Super-Convergence, will also address some of your questions on how many epochs to use.


  • Keep the training schedule as fast as possible for as long as possible while you are working on the model. Faster training means you can try more ideas and tune your hyper-parameters more finely.
  • When you are ready to fine-tune results for some reason (submitting to Kaggle, deploying a model to production) then you can increase epochs and do some final hyper-parameter tuning until validation results stop improving "enough", where "enough" is a combination of your patience and the need for better results.