Exception in Tensorflow function used as Keras custom loss

Question

I am trying to write a Keras 2 LSTM with a custom loss function via Tensorflow:

model.compile(loss=in_top_k_loss, optimizer='rmsprop', metrics=[bin_crossent_true_only, 'binary_crossentropy', 'mean_squared_error', 'accuracy'])

My training set has examples with different lengths along the time dimension, so I use train_on_batch, where each batch contains only instances with the same time dimension. The batch size is 256. The following code throws an exception the first time train_on_batch is called (in the first epoch):

# takes 2 1D arrays of equal length, returns a single value (the negative of my own "precision" measure)
def in_top_k_loss_single(y_true, y_pred):
    y_true_labels = tf.cast(tf.transpose(tf.where(y_true > 0))[0], tf.int32)
    y_pred = tf.reshape(y_pred, [1, tf.shape(y_pred)[0]])
    y_topk_tensor = tf.nn.top_k(y_pred, k=7)
    y_topk_ixs = y_topk_tensor[0][0][:7]
    y_topk = y_topk_tensor[1][0][:7]
    y_topk_len = tf.cast(tf.count_nonzero(y_topk_ixs), tf.int32)
    y_topk = y_topk[:y_topk_len]
    y_topk0 = tf.expand_dims(y_topk, 1)
    y_true_labels0 = tf.expand_dims(y_true_labels, 0)
    re = tf.cast(tf.reduce_any(tf.equal(y_topk0, y_true_labels0), 1), tf.int32) / tf.range(1,y_topk_len+1)
    return (-1) * tf.where(tf.equal(tf.reduce_sum(y_pred), tf.constant(0.0)), tf.constant(0.0), tf.cast(tf.reduce_mean(re),tf.float32))

# takes 2 matrices of equal size,
# applies the function above to y_true[i] and y_pred[i] for each row i,
# returns a single value (mean of all row-wise values)
def in_top_k_loss(y_true, y_pred):
    # if I change `in_top_k_loss_single` to `keras.metrics.binary_crossentropy` (for instance) it runs
    return K.mean(tf.map_fn(lambda x: in_top_k_loss_single(x[0], x[1]), (y_true, y_pred), dtype=tf.float32))

where in_top_k_loss is my custom loss function in the Keras model. These functions seem to work when I test them separately with different inputs (even tricky ones). It seems that only Keras has problems with them; perhaps it expects different data types, shapes, etc.

Following suggestions I found online, I tried changing the batch size, changing the optimizer, and clipping the gradient, all without success. I also tried calling evaluate before train_on_batch, again without success.

The rest of the code works with built-in Keras losses as well as custom losses like this one:

def bin_crossent_true_only(y_true, y_pred):
    return (1 + keras.backend.sum(y_pred)) * keras.metrics.binary_crossentropy(y_true, y_true * y_pred)

The function in_top_k_loss works and returns meaningful results when used in the metrics array. None of the inputs (y_true, y_pred) contain NaN. y_true contains 0s and 1s (zero or more 1s per row, i.e. per instance of the training set).

The exception itself:

Traceback (most recent call last):
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 491, in apply_op
    preferred_dtype=default_dtype)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 702, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 110, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 99, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 360, in make_tensor_proto
    raise ValueError("None values not supported.")
ValueError: None values not supported.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 9, in <module>
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\models.py", line 941, in train_on_batch
    class_weight=class_weight)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 1620, in train_on_batch
    self._make_train_function()
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 1002, in _make_train_function
    self.total_loss)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\optimizers.py", line 210, in get_updates
    new_a = self.rho * a + (1. - self.rho) * K.square(g)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 1225, in square
    return tf.square(x)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\math_ops.py", line 384, in square
    return gen_math_ops.square(x, name=name)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 2733, in square
    result = _op_def_lib.apply_op("Square", x=x, name=name)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 504, in apply_op
    values, as_ref=input_arg.is_ref).dtype.name
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 702, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 110, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 99, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 360, in make_tensor_proto
    raise ValueError("None values not supported.")
ValueError: None values not supported.

mrry · Accepted Answer · 2017-04-21T05:12:01

The optimizers in TensorFlow require that the loss function be differentiable, which is determined by all of the operations between the loss result and the variables in the TensorFlow graph having defined gradients. The tf.where() operation does not have defined gradients, which means that the overall loss function is not differentiable. The result of trying to compute the gradients of a non-differentiable function in TensorFlow is None, which results in the error you are seeing when Keras tries to update the variables.
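
The differentiability requirement can be illustrated without TensorFlow. The following is a minimal plain-Python sketch, not the asker's code: top_k_hit_rate_loss is a hypothetical, simplified stand-in for a top-k precision score, and finite_difference is a hypothetical helper that estimates derivatives numerically. Under these assumptions, the rank-based score does not change when a prediction is nudged slightly, so it gives an optimizer no direction to move in, whereas binary cross-entropy does.

```python
import math


def top_k_hit_rate_loss(y_true, y_pred, k=2):
    """Negative fraction of the k highest-scoring indices that are true labels.

    Hypothetical simplification of a top-k precision score, for illustration only.
    """
    # Indices of the k largest predictions (ties broken by lower index).
    top = sorted(range(len(y_pred)), key=lambda i: (-y_pred[i], i))[:k]
    hits = sum(1 for i in top if y_true[i] > 0)
    return -hits / k


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy, a smooth function of y_pred."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def finite_difference(loss, y_true, y_pred, index, h=1e-4):
    """Central-difference estimate of d(loss) / d(y_pred[index])."""
    up = list(y_pred)
    up[index] += h
    down = list(y_pred)
    down[index] -= h
    return (loss(y_true, up) - loss(y_true, down)) / (2 * h)


y_true = [0, 1, 1, 0]
y_pred = [0.1, 0.9, 0.5, 0.3]

# Nudging a prediction by h does not change the ranking here,
# so the rank-based score stays constant and its slope is zero.
rank_grads = [finite_difference(top_k_hit_rate_loss, y_true, y_pred, i)
              for i in range(len(y_pred))]

# Cross-entropy responds to every small change in y_pred.
bce_grads = [finite_difference(binary_crossentropy, y_true, y_pred, i)
             for i in range(len(y_pred))]

print(rank_grads)
print(bce_grads)
```

The first list printed is all zeros, while every entry of the second is nonzero. A score built from ranks and index comparisons is piecewise constant in y_pred, which is consistent with such a function being usable in the metrics array but not as a loss.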
