8
votes

I have just started working with TensorFlow in Python. I am trying to train a Single Shot Detector (SSD) on the Pascal VOC dataset. Creating the TFRecords and running evaluation with the trained model VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt produced no errors. However, when I try to train on the Pascal VOC 2007 or 2012 datasets using the pre-trained model ssd_300_vgg.ckpt, I get the following error.

2017-08-25 20:03:03.001268: I tensorflow/core/common_runtime gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M5000M, pci bus id: 0000:01:00.0)
INFO:tensorflow:Error reported to Coordinator: <type 'exceptions.ValueError'>, Can't load save_path when it is None.
Traceback (most recent call last):
  File "train_ssd_network.py", line 391, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train_ssd_network.py", line 387, in main
    sync_optimizer=None)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 738, in train
    master, start_standard_services=False, config=session_config) as sess:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 965, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 793, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 954, in managed_session
    start_standard_services=start_standard_services)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 709, in prepare_or_wait_for_session
    init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 281, in prepare_session
    init_fn(sess)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 660, in callback
    saver.restore(session, model_path)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1558, in restore
    raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.

I am using the following script to fine-tune the model:

DATASET_DIR=./tfrecords
TRAIN_DIR=./logs/
CHECKPOINT_PATH=./checkpoints/ssd_300_vgg.ckpt
python train_ssd_network.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2012 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --save_summaries_secs=60 \
    --save_interval_secs=600 \
    --weight_decay=0.0005 \
    --optimizer=adam \
    --learning_rate=0.001 \
    --batch_size=10

The model ssd_300_vgg.ckpt is stored in ./checkpoints.

Please let me know if anyone has the solution.

5
Would you share the section of code where you save the model? - Cloud Cho

5 Answers

8
votes

Three suggestions:

  • Check the path when restoring the model

    saver = tf.train.import_meta_graph(model_path)

  • Check the path when restoring the checkpoint

    saver.restore(sess, tf.train.latest_checkpoint(cur_dir))

  • Check the parameters when saving the model

    saver = tf.train.Saver(save_relative_paths=True)
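
The error in the question is raised when saver.restore receives None as its path, and tf.train.latest_checkpoint returns None when it finds no checkpoint in the directory it is given. A minimal sketch of a guard that turns this into a clearer failure is shown below. The helper name require_checkpoint is hypothetical, and the function is plain Python, so it does not depend on TensorFlow itself:

```python
import os


def require_checkpoint(save_path, searched_dir):
    """Return save_path unchanged, or raise a descriptive error if it is None.

    save_path is meant to be the value returned by
    tf.train.latest_checkpoint(searched_dir), which is None when no
    checkpoint is found in that directory.
    """
    if save_path is None:
        # List what is actually there to make the wrong path easy to spot.
        if os.path.isdir(searched_dir):
            found = sorted(os.listdir(searched_dir))
        else:
            found = []
        raise IOError(
            "No checkpoint found in %r (directory contents: %s)"
            % (searched_dir, found)
        )
    return save_path
```

With this in place, the restore call from the second suggestion could be written as saver.restore(sess, require_checkpoint(tf.train.latest_checkpoint(cur_dir), cur_dir)).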
    
3
votes

CHECKPOINT_PATH=./checkpoints/ssd_300_vgg.ckpt/ssd_300_vgg.ckpt

2
votes

I was having the same problem even though I was providing the correct path.

I was passing the path like this:

import tensorflow as tf

sess = tf.Session()
saver = tf.train.import_meta_graph('model_dir/model.meta')
saver.restore(sess, tf.train.latest_checkpoint('model_dir/'))

But I was still getting the error, so I opened the checkpoint file as a text file. The path recorded in the checkpoint file was wrong, which is why the model could not be loaded.

So if you are getting the same error, open the checkpoint file and check the path recorded in it.

0
votes

Check your path; you might be pointing to a nonexistent file.

0
votes

What someone reading through these answers might actually want:

saver.restore(sess, tf.train.latest_checkpoint("directorytosavedmodel/./"))

In other words, the ./ refers to the directory where the model is saved. (I was reading through this thread thinking: I just want to restore a model, not save one before I restore.)
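
As far as path resolution goes, a ./ segment simply refers to the same directory, so the argument above normalizes to the directory name itself. A quick check with the standard library, using the placeholder directory name from this answer:

```python
import os.path

path = "directorytosavedmodel/./"
normalized = os.path.normpath(path)
print(normalized)  # directorytosavedmodel
```

So what likely matters is that the argument is the directory holding the checkpoint file, rather than the ./ itself.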