
I created a multi-GPU training system on TensorFlow 1.2.0 following this tutorial: https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py

Before creating the multi-GPU version, I was able to fit a batch size of 64 on a single GPU. I thought that if I followed the tutorial above, I could push a larger effective batch through data parallelism: 4 GPUs, each with a batch size of 64. However, I am running into out-of-memory issues. I can only use 2 GPUs with a batch size of 64, and only 4 GPUs with a batch size of 32. In either case, using more GPUs gives me the following error:

tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.

From some googling, I found that this error occurs when the GPU runs out of memory. I don't understand why this is happening. I have 8 NVIDIA Titan cards with 12 GB of memory each on my machine. If a batch size of 64 fits on a single GPU, why can't I fit the same batch size of 64 on more than two GPUs? Why is the memory getting saturated? Is there some overhead that grows with the number of GPUs being used?


1 Answer


Maybe it is missing the variable scope definition in front of the loop:

    with tf.variable_scope(tf.get_variable_scope()):
        for i in xrange(FLAGS.num_gpus):
            with tf.device('/gpu:%d' % i):
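For context, below is a minimal sketch of how that scope fits around the tower loop, following the structure of the cifar10 tutorial. The tiny tower_loss model, the flag definitions, and the dummy inputs are only placeholder assumptions to keep the sketch self-contained; average_gradients mirrors the helper in the tutorial. The key line is reuse_variables(): after the first tower builds the model, every later tower reuses the same weights instead of creating its own copy.

    import tensorflow as tf

    FLAGS = tf.app.flags.FLAGS
    tf.app.flags.DEFINE_integer('num_gpus', 2, 'Number of GPU towers to build.')
    tf.app.flags.DEFINE_integer('batch_size', 64, 'Per-GPU batch size.')

    def tower_loss(images, labels):
        # Placeholder model: a single dense layer, just to keep the sketch runnable.
        logits = tf.layers.dense(images, 10, name='logits')
        return tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

    def average_gradients(tower_grads):
        # Average the gradients computed by each tower (same idea as in the tutorial).
        average_grads = []
        for grad_and_vars in zip(*tower_grads):
            grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
            grad = tf.reduce_mean(tf.concat(grads, 0), 0)
            average_grads.append((grad, grad_and_vars[0][1]))
        return average_grads

    with tf.Graph().as_default(), tf.device('/cpu:0'):
        # Variables and gradient averaging live on the CPU; each GPU only holds
        # the activations and gradients for its own slice of the batch.
        opt = tf.train.GradientDescentOptimizer(0.1)
        tower_grads = []
        with tf.variable_scope(tf.get_variable_scope()):
            for i in range(FLAGS.num_gpus):
                with tf.device('/gpu:%d' % i):
                    with tf.name_scope('tower_%d' % i):
                        # Dummy per-GPU input; replace with your input pipeline.
                        images = tf.random_normal([FLAGS.batch_size, 784])
                        labels = tf.zeros([FLAGS.batch_size], dtype=tf.int32)
                        loss = tower_loss(images, labels)
                        # Reuse the same variables for the next tower instead of
                        # creating a new copy of the model on every GPU.
                        tf.get_variable_scope().reuse_variables()
                        tower_grads.append(opt.compute_gradients(loss))
        grads = average_gradients(tower_grads)
        train_op = opt.apply_gradients(grads)

Without shared variables, each tower creates its own copy of the model's parameters, and depending on where those copies end up being placed, memory use can grow as towers are added, which would be consistent with running out of memory only when more GPUs are used.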