2 votes

I am attempting to perform Multi-GPU training with the TensorFlow Object Detection API.

What I see in nvidia-smi is that only one GPU is actually being utilized. The other three GPUs have the training process loaded onto them, but their memory usage stays around 300 MB and their utilization sits at 0% at all times.
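As a quick sanity check (my addition, not from the original post), it can help to ask TensorFlow directly which GPUs it can see, since a process appearing in nvidia-smi does not prove TensorFlow is using the device. This sketch assumes the TF 2.x API; `count_gpus` is a hypothetical helper shown only to make the device list explicit:

```python
# Hypothetical helper: count GPU entries in a list of device names
# such as '/physical_device:GPU:0'.
def count_gpus(device_names):
    return sum(1 for name in device_names if "GPU" in name)

try:
    import tensorflow as tf  # assumes TF 2.x
    names = [d.name for d in tf.config.list_physical_devices("GPU")]
except ImportError:
    names = []  # TensorFlow not installed in this environment
print("GPUs visible to TensorFlow:", count_gpus(names))
```

If this prints fewer GPUs than nvidia-smi shows, the problem is visibility (e.g. CUDA_VISIBLE_DEVICES), not the distribution strategy.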

I am using the SSD MobileNetV1-based network pretrained on COCO and am then training it on my custom dataset.

I expect that when I provide TensorFlow with more GPUs, the framework will actually use them to speed up training.

If you use tf.estimator, you can get multi-GPU mode; if you use the low-level API, you need to place ops on GPUs manually. I recommend you look at the tf.estimator documentation. – Hamed
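The tf.estimator route the comment suggests can be sketched roughly as below. This is a hedged illustration, not the Object Detection API's actual training code: `model_fn` and `input_fn` are placeholders, and tf.estimator has been deprecated in recent TF releases, so the imports are guarded:

```python
# Sketch of multi-GPU training via tf.estimator (TF 1.13+ / early TF 2.x):
# a MirroredStrategy is handed to RunConfig, which replicates training
# across GPUs. model_fn / input_fn are hypothetical placeholders.
try:
    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()
    run_config = tf.estimator.RunConfig(train_distribute=strategy)
    # estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
    # estimator.train(input_fn=input_fn)
    configured = run_config.train_distribute is strategy
except (ImportError, AttributeError):
    configured = False  # TF missing, or tf.estimator removed in this version
print("RunConfig carries the distribution strategy:", configured)
```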

2 Answers

2 votes

For the TensorFlow 2.2.0 Object Detection API, when you are running model_main_tf2.py, enable this flag:

python model_main_tf2.py --num_workers=2

For any integer value of --num_workers greater than 1, TensorFlow uses all available GPUs. If you want to use only some of the GPUs, you have to edit model_main_tf2.py where it specifies the strategy, while keeping --num_workers at its default of 1. This, for example, uses the first and second GPUs of the machine:

strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
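To make the device-string format explicit, here is a minimal sketch around that strategy line. `gpu_device_names` is a hypothetical helper I am adding for illustration; only the MirroredStrategy call itself comes from the answer:

```python
# Hypothetical helper: build TensorFlow device strings such as '/gpu:0'
# from plain GPU indices.
def gpu_device_names(indices):
    return ["/gpu:%d" % i for i in indices]

# Inside model_main_tf2.py, where the strategy is created, you would
# replace the default with e.g.:
#   import tensorflow as tf
#   strategy = tf.distribute.MirroredStrategy(
#       devices=gpu_device_names([0, 1]))  # first and second GPU
print(gpu_device_names([0, 1]))  # ['/gpu:0', '/gpu:1']
```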
0 votes

Unfortunately, it is not possible at the moment. You can run multi-GPU inference as described here: https://github.com/tensorflow/models/issues/6611#issuecomment-508507576 . But for training purposes it is not possible. I heard that the devs are working on migrating the codebase to Keras, which could enable multi-GPU training.