
I'm using the TensorFlow Object Detection API (models/research/object_detection) for classifying images. My specs are as follows:

  • OS: Ubuntu 18.04 LTS
  • Python: 3.6.7
  • virtualenv version: 16.4.3
  • Pip3 version inside virtualenv: 19.0.3
  • TensorFlow Version: 1.13.1
  • Protoc Version: 3.0.0-9

I'm working in a Windows virtualenv and on Google Colab. This is the command I run and the error message I get:

python3 legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config


INFO:tensorflow:global step 1: loss = 18.5013 (48.934 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "
Traceback (most recent call last):
  File "legacy/train.py", line 184, in <module>
    tf.app.run()
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "legacy/train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "/home/priyank/venv/models-master/research/object_detection/legacy/trainer.py", line 416, in train
    saver=saver)
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 785, in train
    ignore_live_threads=ignore_live_threads)
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 832, in stop
    ignore_live_threads=ignore_live_threads)
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/priyank/venv/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 257, in _run
    enqueue_callable()
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1257, in _single_operation_run
    self._call_tf_sessionrun(None, {}, [], target_list, None)
  File "/home/priyank/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[15,1,1755,2777,3] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
     [[{{node batch}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Answer


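You can try the following fixes:
1. Reduce the image resolution if you are using very high-resolution images. The tensor in the error has shape [15,1,1755,2777,3], i.e. images of roughly 1755 x 2777 pixels.
2. Reduce the batch size (the leading 15 in that shape appears to be the batch size).
3. Check whether another process is using up your memory.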
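To see why fixes 1 and 2 help, here is a rough back-of-the-envelope sketch of how much memory the single tensor named in the error needs. It assumes "type float" means 4-byte float32 and reads the shape as [batch, 1, height, width, channels]; the downscaled and smaller-batch shapes are hypothetical examples, not values from the question's config. It only counts this one tensor, and the input queues may hold several such batches at once, so real usage can be higher.

```python
from functools import reduce
import operator

FLOAT32_BYTES = 4  # assumption: "type float" in the error is 32-bit float


def tensor_bytes(shape, itemsize=FLOAT32_BYTES):
    """Return the bytes needed for a dense tensor with the given shape."""
    return reduce(operator.mul, shape, 1) * itemsize


# Shape reported in the OOM error: [batch, 1, height, width, channels]
original = [15, 1, 1755, 2777, 3]
# Hypothetical: roughly half the height and width, same batch size
downscaled = [15, 1, 878, 1389, 3]
# Hypothetical: original resolution, smaller batch
small_batch = [4, 1, 1755, 2777, 3]

for name, shape in [("original", original),
                    ("downscaled", downscaled),
                    ("small batch", small_batch)]:
    mib = tensor_bytes(shape) / 2**20
    print(f"{name:12s} {shape} -> {mib:,.1f} MiB")
```

For the original shape this comes to roughly 837 MiB for a single batch; halving both image dimensions cuts that by about a factor of four, and a batch of 4 instead of 15 cuts it by a factor of 3.75.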

Could you also please share your config file?