I am training a model on several GPUs on a single machine using TensorFlow. However, training is much slower than on a single GPU. I am wondering whether TensorFlow executes the sub-models placed on different GPUs in parallel or sequentially. For example:
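import tensorflow as tf

x = 5
y = 2
# Place the multiplication on the first GPU
with tf.device('/gpu:0'):
    z1 = tf.multiply(x, y)
# Place the addition on the second GPU
with tf.device('/gpu:1'):
    z2 = tf.add(x, y)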
Is the code inside /gpu:0 and /gpu:1 executed sequentially? If so, how can I make the two parts execute in parallel? Assume the two parts do not depend on each other.