When using one of the adaptive optimizers (Adam, etc.), we expect the learning rate to change across successive mini-batches during training within an epoch. But I wonder how the learning rate changes between successive epochs: is it carried over from the previous epoch (the behavior I expect), or re-initialized to the default?
Of course, by "rate" I mean the whole set of variables that the particular optimizer uses to determine the actual weight update with respect to the gradient.
Also, what would happen to the rate if I run training for N epochs, stop, and then continue like this:
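To make that concrete, here is roughly what I mean by that state (a sketch assuming standalone Keras 2 with the TensorFlow backend and an already-compiled model; for Adam, get_weights() should return the iteration counter plus the per-weight moment estimates m and v):

import keras.backend as K

opt = model.optimizer                        # e.g. Adam, after model.compile(...)
print(K.get_value(opt.lr))                   # base learning rate variable
print(K.get_value(opt.iterations))           # global step counter t
print([w.shape for w in opt.get_weights()])  # iterations + m and v for every model weight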
# first stage: epochs 0-19 on the first dataset
model.fit(data1_train_x, data1_train_y,
          initial_epoch=0,
          epochs=20,
          validation_split=0.1,
          batch_size=64,
          callbacks=[tensorboard])

# second stage: continue for epochs 20-39 on the second dataset
model.fit(data2_train_x, data2_train_y,
          initial_epoch=20,
          epochs=40,
          validation_split=0.1,
          batch_size=64,
          callbacks=[tensorboard])
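One quick sanity check I can do right after each of those two calls (again assuming the TF backend) is to read the optimizer's step counter; if the value printed after the second fit continues from where the first one stopped, the Adam state was carried over rather than reset:

import keras.backend as K

# run this right after the first fit, and again after the second one
print('optimizer iterations so far:', K.get_value(model.optimizer.iterations))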
I think I"ll create callback to log the rate after each epoch and plot it, but before I do it, may be someone already has the answers.