save checkpoint with Tensorflow

Question

I have 3 folders for my CNN model which are train_data, val_data, test_data.

when I am training my model, I found that the accuracy may vary and sometimes the last epoch does not show the best accuracy. for example, last epoch accuracy is 71% but I found the better accuracy in the earlier epoch. I want to save the checkpoint of that epoch which has higher accuracy and then use that checkpoint to predict my model on test_data

I trained my model on train_data and predicted on val_data and save the checkpoint of the model like below:

    print("{} Saving checkpoint of model...". format(datetime.now()))
    checkpoint_path = os.path.join(checkpoint_dir, 'model_epoch' + str(epoch) + '.ckpt')
    save_path = saver.save(session, checkpoint_path)

and before starting the tf.Session() I have this line:

saver = tf.train.Saver()

I want to know how can I save the best epoch which has higher accuracy and then use this checkpoint for my test_data?

By the way, seeing your code, you should not add '.ckpt' to your path as you are doing. You should only specify the directory. — nairouz mrabah
A note: from what I understand, by doing this you are actually training on the validation data. The is no guarantee that it will perform better on test data. — geometrikal
@geometrikal, You are right. There is no guarantee that it performs well on my test data but the aim of validation set is to find the best hyperparameters for your model and then use those hyperparameters on the test data. that's why I want to save the best hyperparameters of my model and then use them on test — user2975921
Out of interest, do you use learning rate decay? I have found that can both improve and stabilise accuracy. — geometrikal

David Parks David Parks · Accepted Answer · 2018-05-04T18:33:00

The tf.train.Saver() documentation describes the following:

saver.save(sess, 'my-model', global_step=0) ==> filename: 'my-model-0'
...
saver.save(sess, 'my-model', global_step=1000) ==> filename: 'my-model-1000'

Note that if you pass global_step to the saver you will generate checkpoint files that contain the global step number. I generally save checkpoints every X minutes and then come back and review the results and choose a checkpoint at the appropriate step value. If you're using tensorboard you'll find this intuitive since all your graphs can be displayed by global step as well.

https://www.tensorflow.org/api_docs/python/tf/train/Saver

save checkpoint with Tensorflow

2 Answers