
I have a convolutional neural network with two different output streams:

                         input
                           |
                         (...) <-- several convolutional layers
                           |
                       _________
    (several layers)   |       |    (several layers)
    fully-connected    |       |    fully-connected
    output stream 1 -> |       | <- output stream 2

I would like to compute stream 1 on /gpu:0 and stream 2 on /gpu:1. Unfortunately, I was not able to set this up properly.

This attempt:

...placeholders...
...conv layers...

with tf.device("/gpu:0"):
    ...stream 1 layers...
    nn_out_1 = tf.matmul(...)

with tf.device("/gpu:1"):
    ...stream 2 layers...
    nn_out_2 = tf.matmul(...)

This runs dead slow (slower than training on a single GPU) and sometimes produces NaN values in the output. I thought this might be because the with statements were not synchronized properly, so I added control_dependencies and placed the conv layers on /gpu:0 explicitly:

...placeholders...  # x -> input, y -> labels

with tf.device("/gpu:0"):
    with tf.control_dependencies([x, y]):
        ...conv layers...
        h_conv_flat = tf.reshape(h_conv_last, ...)

with tf.device("/gpu:0"):
    with tf.control_dependencies([h_conv_flat]):
        ...stream 1 layers...
        nn_out_1 = tf.matmul(...)

with tf.device("/gpu:1"):
    with tf.control_dependencies([h_conv_flat]):
        ...stream 2 layers...
        nn_out_2 = tf.matmul(...)

...but with this approach the network does not even run. No matter what I tried, it complained about the input placeholder not being fed:

tensorflow.python.framework.errors.InvalidArgumentError:
    You must feed a value for placeholder tensor 'x'
    with dtype float
    [[Node: x = Placeholder[dtype=DT_FLOAT, shape=[],
    _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Without the with statements the network trains on /gpu:0 only and runs fine: it learns reasonable things and produces no errors.

What am I doing wrong? Is TensorFlow not able to split different streams of layers within one network across different GPUs? Do I always have to split the complete network into separate towers?

It can depend on so many different factors. Are they the same GPUs? How big is your data? – fabrizioM
Yes, the two GPUs are the same; they are on one card. It is a dual-GPU NVIDIA Tesla K80. It has 24 GB of VRAM in total, and the data fits completely into the VRAM of one GPU (12 GB). – daniel451
Are you sure that the bottleneck is GPU speed for that computation? It is quite common for the bottleneck to be the bandwidth to/from the GPU rather than the actual calculation, and if you are sending a large tensor to another GPU, that would only make things worse. – Peteris

1 Answer


There is an example of how to use multiple GPUs in one network: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py. You can probably copy that code. You can also do something like this:

import tensorflow as tf

# Creates a graph: one matmul per GPU, summed on the CPU.
c = []
for d in ['/gpu:2', '/gpu:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
    # Sum the per-GPU results on the CPU.
    total = tf.add_n(c)
# Creates a session with log_device_placement set to True,
# so the chosen device for each op is printed at startup.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))
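
Applied to your two-stream case, the placement could look like the minimal sketch below. The layer shapes and variable names here are made up for illustration, not taken from your network. tf.device alone is enough to split the streams, since both streams already have a data dependency on the shared conv output, so no control_dependencies are needed; allow_soft_placement lets ops that cannot run on the requested GPU (such as the placeholder, which the error message shows pinned to cpu:0) fall back to the CPU instead of failing.

import tensorflow as tf

# Hypothetical input shape, for illustration only.
x = tf.placeholder(tf.float32, [None, 28, 28, 1], name="x")

with tf.device("/gpu:0"):
    # Shared convolutional layers.
    w_conv = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
    h_conv = tf.nn.relu(tf.nn.conv2d(x, w_conv,
                                     strides=[1, 1, 1, 1], padding="SAME"))
    h_conv_flat = tf.reshape(h_conv, [-1, 28 * 28 * 32])

with tf.device("/gpu:0"):
    # Output stream 1 (fully-connected), stays on the same GPU.
    w1 = tf.Variable(tf.truncated_normal([28 * 28 * 32, 10], stddev=0.1))
    nn_out_1 = tf.matmul(h_conv_flat, w1)

with tf.device("/gpu:1"):
    # Output stream 2 (fully-connected), on the second GPU; TensorFlow
    # inserts the gpu:0 -> gpu:1 copy of h_conv_flat automatically.
    w2 = tf.Variable(tf.truncated_normal([28 * 28 * 32, 10], stddev=0.1))
    nn_out_2 = tf.matmul(h_conv_flat, w2)

# Soft placement moves ops without a kernel on the requested device
# to the CPU instead of raising a placement error.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                        log_device_placement=True))
sess.run(tf.initialize_all_variables())

Note that h_conv_flat is copied from /gpu:0 to /gpu:1 on every step, which is exactly the transfer bandwidth that Peteris mentions in the comments; if that copy dominates, splitting the streams may stay slower than a single GPU.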

See also: https://www.tensorflow.org/versions/r0.7/how_tos/using_gpu/index.html#using-multiple-gpus

Best Regards