2
votes

I am trying to train a TensorFlow object detection model on a custom dataset in Google Colab. I have a saved model trained for 5000 steps. Is it possible to use the saved model to resume training? I plan to train for another 20000 steps. Training on Colab will take around 36 hours, so I plan to use checkpoints. How can I store the best model checkpoints and use them when the session runs out?

1
There is a pretty good explanation on Google Colab itself; have you tried this? - YashvanderBamel
Hi @YashvanderBamel, thank you for your attention. I am using the TensorFlow Object Detection API for training, so I was looking for options within that API. Could you tell me how to implement this for an object detection model? - shivu kumar
Please post what you have done, any error or problem you are running into, and the expected output or result. As it stands, nobody would be able to answer "How to save checkpoints in Google Colab?" When I trained a model using Google Colab, I was able to follow the procedure I mentioned. - YashvanderBamel

1 Answer

2
votes

To resume training from the weights of a saved checkpoint, edit the fine_tune_checkpoint line in your pipeline.config file: change <path_to_ckpt>/model.ckpt to <path_to_ckpt>/model.ckpt-XXXX, where XXXX is your checkpoint number.
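If you would rather not edit the file by hand after every Colab session, the change above can be scripted. The following is only a rough sketch using the Python standard library. The helper names (latest_checkpoint_prefix, set_fine_tune_checkpoint) are made up for illustration and are not part of the Object Detection API. It also assumes that checkpoints are written as model.ckpt-XXXX.index files and that the path in the config is wrapped in double quotes.

```python
import re


def latest_checkpoint_prefix(filenames, directory):
    """Return '<directory>/model.ckpt-XXXX' for the highest step found.

    Assumption: each checkpoint has an index file named 'model.ckpt-XXXX.index'.
    """
    steps = []
    for name in filenames:
        match = re.fullmatch(r"model\.ckpt-(\d+)\.index", name)
        if match:
            steps.append(int(match.group(1)))
    if not steps:
        raise ValueError("no model.ckpt-XXXX.index files found")
    return f"{directory.rstrip('/')}/model.ckpt-{max(steps)}"


def set_fine_tune_checkpoint(config_text, checkpoint_prefix):
    """Return config_text with the fine_tune_checkpoint path replaced.

    Only the 'fine_tune_checkpoint' field is touched; fields such as
    'fine_tune_checkpoint_type' do not match the pattern.
    """
    pattern = re.compile(r'(fine_tune_checkpoint\s*:\s*)"[^"]*"')
    new_text, count = pattern.subn(
        lambda m: f'{m.group(1)}"{checkpoint_prefix}"', config_text
    )
    if count == 0:
        raise ValueError("no fine_tune_checkpoint line found")
    return new_text
```

In a notebook you would pass os.listdir(ckpt_dir) as filenames, read pipeline.config into a string, call set_fine_tune_checkpoint on it, and write the result back. Keeping the checkpoint directory on mounted Google Drive lets the files survive a session reset.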

As for saving only the best weights, you can refer to this post and/or this GitHub link.