Model parallelism in TensorFlow multi-gpu training

Question

I am training a model on several GPUs on a single machine using TensorFlow. However, I find that training is much slower than on a single GPU. I am wondering whether TensorFlow executes the sub-models placed on different GPUs in parallel or sequentially. For example:

import tensorflow as tf

x = 5
y = 2
# Pin each op to its own GPU; the two ops do not depend on each other.
with tf.device('/gpu:0'):
    z1 = tf.multiply(x, y)
with tf.device('/gpu:1'):
    z2 = tf.add(x, y)

Is the code inside /gpu:0 and /gpu:1 executed sequentially? If so, how can I make the two parts execute in parallel? Assume the two parts do not depend on each other.

Alexandre Passos · Accepted Answer · 2018-04-02T17:55:07

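In TensorFlow's graph mode, only the ops needed to produce the tensors you fetch are executed. If you fetch only z2, only the second block (inside gpu:1) would execute, since nothing depends on the first block.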
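To make the point concrete, here is a minimal sketch of the question's snippet run in graph mode. It assumes TensorFlow 2.x with the `tf.compat.v1` session API; the names `graph`, `config`, `only_z2`, `z1_val`, and `z2_val` are illustrative and not from the original post, and `allow_soft_placement=True` is added only so the sketch still runs on a machine with fewer than two GPUs. Fetching `z2` alone runs just the add, while fetching both tensors in one `run` call asks the runtime for both blocks, which it is then free to schedule concurrently because neither depends on the other.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # graph/session API (assumption: TensorFlow 2.x is installed)

graph = tf.Graph()
with graph.as_default():
    x = tf.constant(5)
    y = tf.constant(2)
    # Two independent ops, each pinned to its own device as in the question.
    with tf.device('/gpu:0'):
        z1 = tf.multiply(x, y)
    with tf.device('/gpu:1'):
        z2 = tf.add(x, y)

# Soft placement falls back to an available device if a requested GPU is missing.
config = tf1.ConfigProto(allow_soft_placement=True)

with tf1.Session(graph=graph, config=config) as sess:
    # Only z2 is fetched, so only the ops z2 depends on are executed.
    only_z2 = sess.run(z2)
    # Both tensors are fetched in one call, so both blocks are executed.
    z1_val, z2_val = sess.run([z1, z2])

print(only_z2, z1_val, z2_val)
```

Whether the two blocks actually overlap in time depends on the runtime and the hardware. For ops this small, launch and transfer overhead is likely to dominate, so a multi-GPU version can easily be slower than a single-GPU one.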
