I have a multi-output model in Keras (18 outputs, to be precise), with a loss function for each output. I am trying to mimic the Region Proposal Network from Faster R-CNN. Before training, I want to make sure the gradients of my model are in order, using a snippet along these lines:
with tf.GradientTape() as tape:
    loss = RegionProposalNetwork.evaluate(first_batch)[0]  # first entry is the overall loss
    t = tape.watched_variables()
grads = tape.gradient(loss, RegionProposalNetwork.trainable_variables)
print(grads)
The variable first_batch is obtained from a tf.data dataset using the take() function.
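As a minimal sketch of how I grab it (the toy dataset below is just a stand-in for my real input pipeline):

import tensorflow as tf

# stand-in pipeline; the real one yields batches of tf.float32 tensors
dataset = tf.data.Dataset.from_tensor_slices(tf.zeros([32, 4], tf.float32)).batch(8)
first_batch = next(iter(dataset.take(1)))  # a single batch, taken before training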
The returned value loss is an array of size 19, where loss[0] is the sum of all the individual losses, i.e., the overall loss. Before I can print the gradient array, I get the following error message/trace:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1448, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/James/PycharmProjects/Masters/models/MoreTesting.py", line 469, in <module>
grads = tape.gradient(loss, RegionProposalNetwork.trainable_variables)
File "C:\Users\James\Anaconda3\envs\masters\lib\site-packages\tensorflow\python\eager\backprop.py", line 1034, in gradient
if not backprop_util.IsTrainable(t):
File "C:\Users\James\Anaconda3\envs\masters\lib\site-packages\tensorflow\python\eager\backprop_util.py", line 30, in IsTrainable
dtype = dtypes.as_dtype(dtype)
File "C:\Users\James\Anaconda3\envs\masters\lib\site-packages\tensorflow\python\framework\dtypes.py", line 650, in as_dtype
(type_value,))
TypeError: Cannot convert value 29.614826202392578 to a TensorFlow DType.
where the float 29.614826202392578 is the overall loss from this call to the model's evaluate() function. I am not sure what this error means. For reference, all of the input data and intermediate layer results are tensors of tf.float32 values. Any insights appreciated.
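As a sanity check, I printed the dtype of every tensor in the batch, something like:

import tensorflow as tf

# flatten whatever nested structure the batch has and check each tensor
for t in tf.nest.flatten(first_batch):
    print(t.dtype)  # tf.float32 for each element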
Edit: If I try to convert the loss to a tensor using tf.convert_to_tensor, I no longer get the error; however, the returned gradients are all None. I have verified that my model's weights are updated with calls to fit(), so something is wrong.
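For concreteness, the conversion attempt looked roughly like this:

with tf.GradientTape() as tape:
    loss = tf.convert_to_tensor(RegionProposalNetwork.evaluate(first_batch)[0])
grads = tape.gradient(loss, RegionProposalNetwork.trainable_variables)
print(grads)  # every entry is None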