I am training a neural network in batches with the Keras 2.0 package for Python.
Below is some information about the data and the training parameters:
- #samples in train: 414934
- #features: 590093
- #classes: 2 (binary classification problem)
- batch size: 1024
- #batches: 406 (414934 / 1024 = 405.2, rounded up)
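For reference, the batch count above can be reproduced with a few lines of arithmetic. This is only a minimal sketch: `num_of_batches` is the name used in my training loop, while `num_samples` and `last_batch_size` are names introduced here for illustration.

```python
import math

num_samples = 414934   # samples in the training set
batch_size = 1024

# Round up so the final, partially filled batch is still counted.
num_of_batches = math.ceil(num_samples / batch_size)

# The last batch holds whatever is left after the full batches.
last_batch_size = num_samples - (num_of_batches - 1) * batch_size

print(num_of_batches)   # 406
print(last_batch_size)  # 214
```

So 405 batches are full and the last one contains only 214 samples.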
Below are some logs produced by the following code:
for i in range(epochs):
    print("train_model:: starting epoch {0}/{1}".format(i + 1, epochs))
    model.fit_generator(generator=batch_generator(data_train, target_train, batch_size),
                        steps_per_epoch=num_of_batches,
                        epochs=1,
                        verbose=1)
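The body of `batch_generator` is not shown above. Purely as an illustration of the kind of generator this loop assumes, here is a hypothetical sketch; it assumes `data` and `target` can be sliced by row and that `len(target)` gives the number of samples, and it is not necessarily how the real function is written.

```python
import math


def batch_generator(data, target, batch_size):
    """Hypothetical sketch: yield (features, labels) batches in order, forever."""
    num_samples = len(target)
    num_batches = math.ceil(num_samples / batch_size)
    while True:  # fit_generator expects a generator that loops indefinitely
        for b in range(num_batches):
            start = b * batch_size
            end = min(start + batch_size, num_samples)  # last batch may be short
            yield data[start:end], target[start:end]
```

Note that this sketch does not shuffle, so every epoch would see the batches in the same order.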
(partial) Logs:
train_model:: starting epoch 1/3
Epoch 1/1
1/406 [..............................] - ETA: 11726s - loss: 0.7993 - acc: 0.5996
2/406 [..............................] - ETA: 11237s - loss: 0.7260 - acc: 0.6587
3/406 [..............................] - ETA: 14136s - loss: 0.6619 - acc: 0.7279
404/406 [============================>.] - ETA: 53s - loss: 0.3542 - acc: 0.8917
405/406 [============================>.] - ETA: 26s - loss: 0.3541 - acc: 0.8917
406/406 [==============================] - 10798s - loss: 0.3539 - acc: 0.8918
train_model:: starting epoch 2/3
Epoch 1/1
1/406 [..............................] - ETA: 15158s - loss: 0.2152 - acc: 0.9424
2/406 [..............................] - ETA: 14774s - loss: 0.2109 - acc: 0.9419
3/406 [..............................] - ETA: 16132s - loss: 0.2097 - acc: 0.9408
404/406 [============================>.] - ETA: 64s - loss: 0.2225 - acc: 0.9329
405/406 [============================>.] - ETA: 32s - loss: 0.2225 - acc: 0.9329
406/406 [==============================] - 13127s - loss: 0.2225 - acc: 0.9329
train_model:: starting epoch 3/3
Epoch 1/1
1/406 [..............................] - ETA: 22631s - loss: 0.1145 - acc: 0.9756
2/406 [..............................] - ETA: 24469s - loss: 0.1220 - acc: 0.9688
3/406 [..............................] - ETA: 23475s - loss: 0.1202 - acc: 0.9691
404/406 [============================>.] - ETA: 60s - loss: 0.1006 - acc: 0.9745
405/406 [============================>.] - ETA: 31s - loss: 0.1006 - acc: 0.9745
406/406 [==============================] - 11147s - loss: 0.1006 - acc: 0.9745
My question is: what happens after each epoch that improves the accuracy like that? For example, the accuracy at the end of the first epoch is 0.8918, but at the beginning of the second epoch accuracy of 0.9424 is observed. Similarly, the accuracy at the end of the second epoch is 0.9329, but the third epoch starts with accuracy of 0.9756.
I would expect to find an accuracy of ~0.8918 at the beginning of the second epoch, and ~0.9329 at the beginning of the third epoch.
I know that each batch involves one forward pass and one backward pass over the training samples in that batch. Thus, each epoch makes one forward pass and one backward pass over all training samples.
Also, from Keras documentation:
Epoch: an arbitrary cutoff, generally defined as "one pass over the entire dataset", used to separate training into distinct phases, which is useful for logging and periodic evaluation.
Why is the accuracy improvement within each epoch smaller than the accuracy improvement between the end of epoch X and the beginning of epoch X+1?