I have a multi-output model in Keras (18 outputs, to be precise), with a loss function for each output. I am trying to mimic the Region Proposal Network from Faster R-CNN. Before training, I want to make sure the gradients of my model are in order, using a snippet along these lines:
with tf.GradientTape() as tape:
    loss = RegionProposalNetwork.evaluate(first_batch)[0]  # first entry is the overall loss
    t = tape.watched_variables()
grads = tape.gradient(loss, RegionProposalNetwork.trainable_variables)
print(grads)
The variable first_batch is obtained from a tf.data dataset using the take() function.
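As a minimal sketch of how I grab it (the toy dataset below is just a stand-in for my real input pipeline):

import tensorflow as tf

# stand-in pipeline; the real one yields batches of tf.float32 tensors
dataset = tf.data.Dataset.from_tensor_slices(tf.zeros([32, 4], tf.float32)).batch(8)
first_batch = next(iter(dataset.take(1)))  # a single batch, taken before training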
The returned value loss is an array of size 19, where loss[0] is the sum of all the individual losses, i.e., the overall loss. Before I can print the gradient array, I get the following error message/trace:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1448, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/James/PycharmProjects/Masters/models/MoreTesting.py", line 469, in <module>
grads = tape.gradient(loss, RegionProposalNetwork.trainable_variables)
File "C:\Users\James\Anaconda3\envs\masters\lib\site-packages\tensorflow\python\eager\backprop.py", line 1034, in gradient
if not backprop_util.IsTrainable(t):
File "C:\Users\James\Anaconda3\envs\masters\lib\site-packages\tensorflow\python\eager\backprop_util.py", line 30, in IsTrainable
dtype = dtypes.as_dtype(dtype)
File "C:\Users\James\Anaconda3\envs\masters\lib\site-packages\tensorflow\python\framework\dtypes.py", line 650, in as_dtype
(type_value,))
TypeError: Cannot convert value 29.614826202392578 to a TensorFlow DType.
where the float 29.614826202392578 is the overall loss from this call to the model's evaluate() function. I am not sure what this error means. For reference, all of the input data and intermediate layer results are tensors of tf.float32 values. Any insights appreciated.
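As a sanity check, I printed the dtype of every tensor in the batch, something like:

import tensorflow as tf

# flatten whatever nested structure the batch has and check each tensor
for t in tf.nest.flatten(first_batch):
    print(t.dtype)  # tf.float32 for each element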
Edit: If I try to convert the loss to a tensor using tf.convert_to_tensor, I no longer get the error; however, the returned gradients are all None. I have verified that my model's weights are updated with calls to fit(), so something is wrong.
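For concreteness, the conversion attempt looked roughly like this:

with tf.GradientTape() as tape:
    loss = tf.convert_to_tensor(RegionProposalNetwork.evaluate(first_batch)[0])
grads = tape.gradient(loss, RegionProposalNetwork.trainable_variables)
print(grads)  # every entry is None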