
I have a convolutional neural network with two different output streams:

                         input
                           |
                         (...) <-- several convolutional layers
                           |
                       _________
    (several layers)   |       |    (several layers)
    fully-connected    |       |    fully-connected
    output stream 1 -> |       | <- output stream 2

I would like to compute stream 1 on /gpu:0 and stream 2 on /gpu:1. Unfortunately, I was not able to set this up properly.

This attempt:

...placeholders...
...conv layers...

with tf.device("/gpu:0"):
    ...stream 1 layers...
    nn_out_1 = tf.matmul(...)

with tf.device("/gpu:1"):
    ...stream 2 layers...
    nn_out_2 = tf.matmul(...)

This runs dead slow (slower than training on a single GPU) and sometimes produces NaN values in the output. I thought this might be because the with statements were not synchronized properly, so I added control_dependencies and placed the conv layers on /gpu:0 explicitly:

...placeholders...  # x -> input, y -> labels

with tf.device("/gpu:0"):
    with tf.control_dependencies([x, y]):
        ...conv layers...
        h_conv_flat = tf.reshape(h_conv_last, ...)

with tf.device("/gpu:0"):
    with tf.control_dependencies([h_conv_flat]):
        ...stream 1 layers...
        nn_out_1 = tf.matmul(...)

with tf.device("/gpu:1"):
    with tf.control_dependencies([h_conv_flat]):
        ...stream 2 layers...
        nn_out_2 = tf.matmul(...)

...but with this approach the network does not even run. No matter what I tried, it complained about the input placeholder not being fed:

tensorflow.python.framework.errors.InvalidArgumentError:
    You must feed a value for placeholder tensor 'x'
    with dtype float
    [[Node: x = Placeholder[dtype=DT_FLOAT, shape=[],
    _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Without the with statements the network trains on /gpu:0 only and runs fine: it learns reasonable things and produces no errors.

What am I doing wrong? Is TensorFlow not able to split different streams of layers within one network across different GPUs? Do I always have to split the complete network into separate towers?

It can depend on so many different factors. Are they the same GPUs? How big is your data? – fabrizioM
Yes, the two GPUs are the same; they are on one card. It is a dual-GPU NVIDIA Tesla K80. It has 24 GB of VRAM in total, and the data fits completely into the VRAM of one GPU (12 GB). – daniel451
Are you sure that the bottleneck is GPU speed for that computation? It is quite common for the bottleneck to be the bandwidth to/from the GPU rather than the actual calculation, and if you are sending a large tensor to another GPU, that would only make things worse. – Peteris

1 Answer


There is an example of how to use multiple GPUs in one network: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py. You can probably copy that code. You can also do something like this:

import tensorflow as tf

# Creates a graph: one matmul per GPU, summed on the CPU.
c = []
for d in ['/gpu:2', '/gpu:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
    # Sum the per-GPU results on the CPU.
    total = tf.add_n(c)
# Creates a session with log_device_placement set to True,
# so the chosen device for each op is printed at startup.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))
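
Applied to your two-stream case, the placement could look like the minimal sketch below. The layer shapes and variable names here are made up for illustration, not taken from your network. tf.device alone is enough to split the streams, since both streams already have a data dependency on the shared conv output, so no control_dependencies are needed; allow_soft_placement lets ops that cannot run on the requested GPU (such as the placeholder, which the error message shows pinned to cpu:0) fall back to the CPU instead of failing.

import tensorflow as tf

# Hypothetical input shape, for illustration only.
x = tf.placeholder(tf.float32, [None, 28, 28, 1], name="x")

with tf.device("/gpu:0"):
    # Shared convolutional layers.
    w_conv = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
    h_conv = tf.nn.relu(tf.nn.conv2d(x, w_conv,
                                     strides=[1, 1, 1, 1], padding="SAME"))
    h_conv_flat = tf.reshape(h_conv, [-1, 28 * 28 * 32])

with tf.device("/gpu:0"):
    # Output stream 1 (fully-connected), stays on the same GPU.
    w1 = tf.Variable(tf.truncated_normal([28 * 28 * 32, 10], stddev=0.1))
    nn_out_1 = tf.matmul(h_conv_flat, w1)

with tf.device("/gpu:1"):
    # Output stream 2 (fully-connected), on the second GPU; TensorFlow
    # inserts the gpu:0 -> gpu:1 copy of h_conv_flat automatically.
    w2 = tf.Variable(tf.truncated_normal([28 * 28 * 32, 10], stddev=0.1))
    nn_out_2 = tf.matmul(h_conv_flat, w2)

# Soft placement moves ops without a kernel on the requested device
# to the CPU instead of raising a placement error.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                        log_device_placement=True))
sess.run(tf.initialize_all_variables())

Note that h_conv_flat is copied from /gpu:0 to /gpu:1 on every step, which is exactly the transfer bandwidth that Peteris mentions in the comments; if that copy dominates, splitting the streams may stay slower than a single GPU.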

See also: https://www.tensorflow.org/versions/r0.7/how_tos/using_gpu/index.html#using-multiple-gpus

Best Regards