
The piece of code below works fine when I use the CPU as the device, but it fails on the GPU. This is the error I am getting:

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Adam/update_Variable/Cast_5': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.

Based on that, I am assuming there are no GPU gradients for nested map_fn calls; is that the case? If so, is there any way to implement the same piece of code so that it works on the GPU while keeping the two nested functions?

Thanks.

import numpy as np
import tensorflow as tf


def loop_inner(x):
    return tf.reduce_sum(tf.square(x))


def loop_outer(x):
    return tf.map_fn(lambda x: loop_inner(x), x)

np.random.seed(10)
io, d, k, m = 2, 4, 3, 2
A = np.random.random((io, d, k, m))

with tf.device('/cpu:0'):

    sess = tf.Session()
    A = tf.Variable(A)
    B = tf.map_fn(lambda x: loop_outer(x), A)

    L = tf.reduce_sum(B)
    optim = tf.train.AdamOptimizer(learning_rate=0.1).minimize(L)

sess.run(tf.global_variables_initializer())

for i in range(1000):
    sess.run(optim)
    print(sess.run(L))
The error message is misleading; the problem is not in the Cast op itself. If you replace the loss with L = tf.reduce_sum(A), the Cast is placed on the GPU fine. Creating the session with sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) would partially work around the problem and help with debugging. I debugged this a bit, but haven't found which op exactly is missing GPU support and thus triggered the colocation constraint. - Yao Zhang
If I replace the loss with L = tf.reduce_sum(A), then I am avoiding B, which is where the nested loop happens, so the error will disappear regardless. I tried allow_soft_placement, and while the error disappears, the whole operation falls back to the CPU, so it doesn't really help. - Vitor Guizilini
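For reference, the nested map_fn in the question reduces each trailing k-by-m slice of A to its sum of squares, so the same B can be computed with a single vectorized reduction, tf.reduce_sum(tf.square(A), axis=[2, 3]), which has GPU kernels and needs no map_fn. A minimal NumPy sketch of that equivalence (using NumPy only to show the math, with the shapes from the question):

```python
import numpy as np

np.random.seed(10)
io, d, k, m = 2, 4, 3, 2
A = np.random.random((io, d, k, m))

# Nested-loop version: mirrors map_fn(loop_outer) over axis 0
# and map_fn(loop_inner) over axis 1.
B_loop = np.array([[np.sum(np.square(A[i, j])) for j in range(d)]
                   for i in range(io)])

# Vectorized version: reduce over the trailing two axes at once.
B_vec = np.sum(np.square(A), axis=(2, 3))

assert np.allclose(B_loop, B_vec)
print(B_vec.shape)  # (2, 4)
```

Whether this helps depends on being able to express loop_inner as vectorized ops; for a plain sum of squares it is a drop-in replacement.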

1 Answer


I don't think this is related to the nesting of map_fn, since a simple non-nested map_fn can cause the same error:

import numpy as np
import tensorflow as tf


def my_fn(x, y):
    return x * y

with tf.device('/gpu:0'):
    # int32 so the arrays match the dtype=tf.int32 declared for map_fn below
    a = np.array([[1, 2, 3], [2, 4, 1], [5, 1, 7]], dtype=np.int32)
    b = np.array([[1, -1, -1], [1, 1, 1], [-1, 1, -1]], dtype=np.int32)
    elems = (a, b)

    sess = tf.Session()
    B = tf.map_fn(lambda x: my_fn(x[0], x[1]), elems, dtype=tf.int32)

sess.run(tf.global_variables_initializer())

print(sess.run(B))

The error is like this:

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'map/TensorArray_2': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/device:GPU:0'
Colocation Debug Info:
Colocation group had the following types and devices:
Mul: CPU
TensorArrayGatherV3: GPU CPU
Range: GPU CPU
TensorArrayWriteV3: CPU
TensorArraySizeV3: GPU CPU
Enter: GPU CPU
TensorArrayV3: CPU
Const: CPU
[[Node: map/TensorArray_2 = TensorArrayV3[clear_after_read=true, dtype=DT_INT32, dynamic_size=false, element_shape=, tensor_array_name=""]]]

If the GPU is changed to the CPU, everything works fine.
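A side note on the example above: since my_fn is just element-wise multiplication, the map_fn can be dropped entirely; the * operator (tf.multiply) has GPU kernels, so a * b on the same arrays avoids the TensorArray colocation problem altogether. A NumPy sketch of that equivalence:

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 4, 1], [5, 1, 7]])
b = np.array([[1, -1, -1], [1, 1, 1], [-1, 1, -1]])

# Row-by-row version: mirrors map_fn(lambda x: my_fn(x[0], x[1]), (a, b)).
B_map = np.array([a[i] * b[i] for i in range(a.shape[0])])

# Vectorized version: one element-wise multiply over the whole arrays.
B_vec = a * b

assert np.array_equal(B_map, B_vec)
print(B_vec)
```

Of course this only works when the mapped function is itself expressible as vectorized ops, but that is often the case for simple element-wise functions.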