7
votes

I am training a neural network in batches with the Keras 2.0 package for Python. Below is some information about the data and the training parameters:

  • #samples in train: 414934
  • #features: 590093
  • #classes: 2 (binary classification problem)
  • batch size: 1024
  • #batches: 406 (414934 / 1024 ≈ 405.2, rounded up)
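
As a sanity check on the batch count, here is a minimal sketch. It assumes the final, partial batch is kept rather than dropped; `last_batch_size` is a name introduced only for illustration.

```python
import math

num_samples = 414934
batch_size = 1024

# Round up so the final, smaller batch is still counted as one step.
num_of_batches = math.ceil(num_samples / batch_size)

# Size of that final batch (assumption: the leftover samples are not discarded).
last_batch_size = num_samples - (num_of_batches - 1) * batch_size

print(num_of_batches, last_batch_size)
```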

Below is the training code, followed by part of its log output:

for i in range(epochs):
    print("train_model:: starting epoch {0}/{1}".format(i + 1, epochs))
    model.fit_generator(generator=batch_generator(data_train, target_train, batch_size),
                        steps_per_epoch=num_of_batches,
                        epochs=1,
                        verbose=1)

(partial) Logs:

train_model:: starting epoch 1/3                                                            
Epoch 1/1                                                                                   
  1/406 [..............................] - ETA: 11726s - loss: 0.7993 - acc: 0.5996         
  2/406 [..............................] - ETA: 11237s - loss: 0.7260 - acc: 0.6587         
  3/406 [..............................] - ETA: 14136s - loss: 0.6619 - acc: 0.7279         
404/406 [============================>.] - ETA: 53s - loss: 0.3542 - acc: 0.8917            
405/406 [============================>.] - ETA: 26s - loss: 0.3541 - acc: 0.8917            
406/406 [==============================] - 10798s - loss: 0.3539 - acc: 0.8918              
train_model:: starting epoch 2/3                                                            
Epoch 1/1                                                                                   
  1/406 [..............................] - ETA: 15158s - loss: 0.2152 - acc: 0.9424         
  2/406 [..............................] - ETA: 14774s - loss: 0.2109 - acc: 0.9419         
  3/406 [..............................] - ETA: 16132s - loss: 0.2097 - acc: 0.9408         
404/406 [============================>.] - ETA: 64s - loss: 0.2225 - acc: 0.9329            
405/406 [============================>.] - ETA: 32s - loss: 0.2225 - acc: 0.9329            
406/406 [==============================] - 13127s - loss: 0.2225 - acc: 0.9329              
train_model:: starting epoch 3/3                                                            
Epoch 1/1                                                                                   
  1/406 [..............................] - ETA: 22631s - loss: 0.1145 - acc: 0.9756         
  2/406 [..............................] - ETA: 24469s - loss: 0.1220 - acc: 0.9688         
  3/406 [..............................] - ETA: 23475s - loss: 0.1202 - acc: 0.9691         
404/406 [============================>.] - ETA: 60s - loss: 0.1006 - acc: 0.9745            
405/406 [============================>.] - ETA: 31s - loss: 0.1006 - acc: 0.9745            
406/406 [==============================] - 11147s - loss: 0.1006 - acc: 0.9745    

My question is: what happens after each epoch that improves the accuracy like that? For example, the accuracy at the end of the first epoch is 0.8918, but at the beginning of the second epoch accuracy of 0.9424 is observed. Similarly, the accuracy at the end of the second epoch is 0.9329, but the third epoch starts with accuracy of 0.9756.

I would expect to find an accuracy of ~0.8918 at the beginning of the second epoch, and ~0.9329 at the beginning of the third epoch.

I know that in each batch there is one forward pass and one backward pass of training samples in the batch. Thus, in each epoch there is one forward pass and one backward pass of all training samples.

Also, from Keras documentation:

Epoch: an arbitrary cutoff, generally defined as "one pass over the entire dataset", used to separate training into distinct phases, which is useful for logging and periodic evaluation.

Why is the accuracy improvement within each epoch smaller than the accuracy improvement between the end of epoch X and the beginning of epoch X+1?

2 Answers

15
votes

This has nothing to do with your model or your dataset; the reason for this "jump" lies in how metrics are calculated and displayed in Keras.

As Keras processes batch after batch, it records the accuracy on each one, and what it displays is not the accuracy on the latest processed batch, but the running average over all batches processed so far in the current epoch. And, as the model is being trained, accuracies over successive batches tend to improve.

Now consider: say the first epoch has 50 batches, and the network goes from 0% to 90% accuracy over those 50 batches. At the end of the epoch Keras will then show an accuracy of, e.g., (0% + 0.1% + 0.5% + ... + 90%) / 50, which is obviously much less than 90%! But because your actual accuracy at that point is about 90%, the first batch of the second epoch will show roughly 90%, giving the impression of a sudden "jump" in quality. The same goes for the loss or any other metric.
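
To make the averaging effect concrete, here is a small simulation in plain Python. It is only a sketch: the per-batch accuracies are made up (a linear ramp from 0.0 to 0.9 in the first epoch, then flat at 0.9), and it uses an unweighted mean, whereas Keras weights by batch size, which should only matter for a smaller final batch.

```python
def running_means(values):
    """Return the mean of the values seen so far, after each element."""
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


num_batches = 50

# Hypothetical per-batch accuracies (not taken from the logs above):
# epoch 1 improves linearly from 0.0 to 0.9, epoch 2 stays at 0.9.
epoch1 = [0.9 * i / (num_batches - 1) for i in range(num_batches)]
epoch2 = [0.9] * num_batches

# What a progress bar based on running averages would display.
shown_epoch1 = running_means(epoch1)
shown_epoch2 = running_means(epoch2)

print("epoch 1, last batch, actual:     {:.3f}".format(epoch1[-1]))
print("epoch 1, last batch, displayed:  {:.3f}".format(shown_epoch1[-1]))
print("epoch 2, first batch, displayed: {:.3f}".format(shown_epoch2[0]))
```

In this toy setup the value displayed at the end of epoch 1 is 0.45 even though the last batch scored 0.9, and the first value displayed in epoch 2 is 0.9, because the average restarts at every epoch.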

Now, if you want a more realistic and trustworthy estimate of accuracy, loss, or any other metric you may find yourself using, I would suggest using the validation_data parameter of model.fit[_generator] to provide validation data, which will not be used for training, but only to evaluate the network at the end of each epoch, without averaging over various points in time.
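
A minimal sketch of what that could look like with `fit_generator`. The wrapper `train_with_validation` and the held-out arrays `x_val` / `y_val` are hypothetical names, and the sketch assumes the validation set fits in memory and that the model was compiled with an accuracy metric. In Keras 2.0, `validation_data` can also be a generator, in which case `validation_steps` has to be given as well.

```python
def train_with_validation(model, train_generator, steps_per_epoch, epochs,
                          x_val, y_val):
    """Train from a generator and evaluate on held-out data after each epoch.

    `model` is assumed to be a compiled Keras model; `x_val` and `y_val`
    are assumed to be held-out data never yielded by `train_generator`.
    """
    history = model.fit_generator(
        generator=train_generator,
        steps_per_epoch=steps_per_epoch,
        epochs=epochs,
        verbose=1,
        # Evaluated once per epoch with the weights as they are at that
        # moment, so no averaging over earlier, worse weights is involved.
        validation_data=(x_val, y_val),
    )
    # Per-epoch metrics, including the validation ones (e.g. "val_loss").
    return history.history
```

Passing `epochs` in a single call instead of looping over `epochs=1` calls keeps all per-epoch validation metrics in one history dictionary.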

2
votes

The accuracy at the end of an epoch is the accuracy over the full dataset. The accuracy shown after each batch is the accuracy over all batches that have been used for training up to that moment. It could be the case that your first batch is predicted very well and the following batches have a lower accuracy. In that case the accuracy over your full dataset will be low compared to the accuracy of your first batch.