7 votes

I wanted to train on multiple CPUs, so I ran this command:

C:\Users\solution\Desktop\Tensorflow\research>python object_detection/train.py --logtostderr --pipeline_config_path=C:\Users\solution\Desktop\Tensorflow\myFolder\power_drink.config --train_dir=C:\Users\solution\Desktop\Tensorflow\research\object_detection\train --num_clones=2 --clone_on_cpu=True

and I got the following error:

Traceback (most recent call last):
  File "object_detection/train.py", line 169, in <module>
    tf.app.run()
  File "C:\Users\solution\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 124, in run
    _sys.exit(main(argv))
  File "object_detection/train.py", line 165, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\solution\Desktop\Tensorflow\research\object_detection\trainer.py", line 246, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "C:\Users\solution\Desktop\Tensorflow\research\slim\deployment\model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "C:\Users\solution\Desktop\Tensorflow\research\object_detection\trainer.py", line 158, in _create_losses
    train_config.merge_multiple_label_boxes)
ValueError: not enough values to unpack (expected 7, got 0)

If I set num_clones to 1 or omit it, training works normally. I also tried setting --ps_tasks=1, which doesn't help.

Any advice would be appreciated.

2
Probably a Python 3 issue. In models/research/object_detection/utils/learning_schedules.py, change range(num_boundaries) to list(range(num_boundaries)). – Austin
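For context, this is roughly what that edit looks like; the surrounding lines are a sketch reconstructed from that era of learning_schedules.py (inside manual_stepping()) and may differ slightly in your checkout:

# models/research/object_detection/utils/learning_schedules.py (manual_stepping)
# Python 3's range() is a lazy object; wrapping it in list() hands
# TensorFlow a concrete list of indices instead of a range object:
rate_index = tf.reduce_max(tf.where(tf.greater_equal(global_step, boundaries),
                                    list(range(num_boundaries)),
                                    [0] * num_boundaries))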

2 Answers

3 votes

You don't mention which type of model you are training. If, like me, you were using the default model from the TensorFlow Object Detection API example (Faster-RCNN-Inception-V2), then num_clones should equal batch_size. I was using a GPU, but when I went from one clone to two I saw a similar error, and setting batch_size: 2 in the training config file was the solution.
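For reference, a minimal sketch of that change, assuming an otherwise standard pipeline config; the point is simply that batch_size moves in step with --num_clones:

train_config: {
  batch_size: 2  # keep equal to --num_clones (here, 2)
  ...
}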

2 votes

I solved this issue by slightly changing one parameter in my original configuration:

...
train_config: {
  fine_tune_checkpoint: "C:/some_path/model.ckpt"
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 25000
  ...
}
...

Changing the parameter to replicas_to_aggregate: 1 or setting sync_replicas: false both solved the problem for me, since I was training on only one graphics card and did not have any replicas (as you would have when training on a TPU).
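For example, a sketch of the adjusted block, keeping the rest of the config unchanged (either one of the two edits was enough on its own):

...
train_config: {
  fine_tune_checkpoint: "C:/some_path/model.ckpt"
  batch_size: 1
  sync_replicas: false      # disables synchronous replica training, or...
  startup_delay_steps: 0
  replicas_to_aggregate: 1  # ...keep sync_replicas: true and aggregate a single replica
  num_steps: 25000
  ...
}
...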