
I am trying to convert a model to run with eager execution. However, I have encountered an odd issue: even when tf.linspace and all of its arguments are placed on the GPU, it still appears to require copying to/from CPU memory. Consider this minimal example:

import tensorflow as tf

tfe = tf.contrib.eager
tfe.enable_eager_execution(config=tf.ConfigProto(allow_soft_placement=True,
                                                 log_device_placement=True),
                           device_policy=tfe.DEVICE_PLACEMENT_WARN)

a = tf.constant(7.).gpu()
b = tf.constant(8.).gpu()
c = tf.constant(4).gpu()

with tf.device("/device:GPU:0"):
    print(a)
    print(tf.linspace(a,b,c))

This gives the following warnings:

2018-05-22 23:29:56.401000: W tensorflow/c/eager/c_api.cc:506] before computing LinSpace input #0 was expected to be on /job:localhost/replica:0/task:0/device:CPU:0 but is actually on /job:localhost/replica:0/task:0/device:GPU:0 (operation running on /job:localhost/replica:0/task:0/device:GPU:0). This triggers a copy which can be a performance bottleneck.
2018-05-22 23:29:56.401275: W tensorflow/c/eager/c_api.cc:506] before computing LinSpace input #1 was expected to be on /job:localhost/replica:0/task:0/device:CPU:0 but is actually on /job:localhost/replica:0/task:0/device:GPU:0 (operation running on /job:localhost/replica:0/task:0/device:GPU:0). This triggers a copy which can be a performance bottleneck.
2018-05-22 23:29:56.401534: W tensorflow/c/eager/c_api.cc:506] before computing LinSpace input #2 was expected to be on /job:localhost/replica:0/task:0/device:CPU:0 but is actually on /job:localhost/replica:0/task:0/device:GPU:0 (operation running on /job:localhost/replica:0/task:0/device:GPU:0). This triggers a copy which can be a performance bottleneck.

I am running TensorFlow 1.8 on Python 2.7.


1 Answer


Yes, tf.linspace does require all of its arguments in host (i.e., CPU) memory (in theory this could change in a future version of TensorFlow, but it hasn't so far). This isn't specific to eager execution; it is true for graph execution as well. In both cases, if the input tensors are in GPU memory, they will be copied over to host memory before the operation runs, which is exactly the copy the warnings are telling you about.

Every primitive TensorFlow operation has a "kernel" implementation for each device it can execute on. The kernel registered for LinSpace on the GPU declares that it wants its inputs (start, stop, num) in host memory, so it seems it doesn't actually do its work on the GPU. That is a quirk of the LinSpace operation itself, not something wrong with your code.
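If the copy warnings bother you, you can either leave linspace's small scalar arguments on the CPU (the copy of three scalars is cheap anyway), or build the same sequence yourself out of elementwise ops, assuming those ops have GPU kernels in your build. LinSpace's arithmetic is just start + step * i. Here is that arithmetic sketched in plain NumPy (manual_linspace is my own name, not a TensorFlow API):

```python
import numpy as np

def manual_linspace(start, stop, num):
    """Recreate tf.linspace's arithmetic from elementwise ops.

    The TensorFlow equivalent would be something like
    start + (stop - start) * tf.range(num, dtype=tf.float32) / (num - 1),
    built entirely from elementwise ops on the device of your choice.
    """
    indices = np.arange(num, dtype=np.float32)  # 0, 1, ..., num-1
    return start + (stop - start) * indices / (num - 1)

# Same values as tf.linspace(7., 8., 4): four evenly spaced points from 7 to 8
print(manual_linspace(7.0, 8.0, 4))
```

I haven't benchmarked whether this is actually faster than eating the copy; for three scalars it almost certainly doesn't matter.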

Hope that helps.