Resume Training tf.keras Tensorboard

Question

I encountered some problems when I continued training my model and visualized the progress on tensorboard.

My question is how do I resume training from the same step without specifying any epoch manually? If possible, simply by loading the saved model, it somehow could read the global_step from the optimizer saved and continue training from there.

I have provided some codes below to reproduce similar errors.

import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.models import load_model

mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=10, callbacks=[Tensorboard()])
model.save('./final_model.h5', include_optimizer=True)

del model

model = load_model('./final_model.h5')
model.fit(x_train, y_train, epochs=10, callbacks=[Tensorboard()])

You can run the tensorboard by using the command:

tensorboard --logdir ./logs

Even if you load the model, TensorFlow will treat the metrics from the starting point. Also, this is because epochs start again from 0 and not from where it ended like 8 epochs. — Shubham Panchal

melaanya melaanya · Accepted Answer · 2019-03-06T14:02:42

You can set the parameter initial_epoch in the function model.fit() to the number of the epoch you want your training to start from. Take into account that the model trains until the epoch of index epochs is reached (and not a number of iterations given by epochs). In your example, if you want to train for 10 epochs more, it should be:

model.fit(x_train, y_train, initial_epoch=9, epochs=19, callbacks=[Tensorboard()])

It will allow you to visualise your plots on Tensorboard in a correct manner. More extensive information about these parameters can be found in the docs.

Resume Training tf.keras Tensorboard

3 Answers