
I'm trying to write a custom loss function for Keras with the TensorFlow backend. I get the following error:

ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

from keras import backend as K

def matthews_correlation(y_true, y_pred):
    # Round predictions to hard 0/1 labels.
    y_pred_pos = K.round(K.clip(y_pred, 0, 1))
    y_pred_neg = 1 - y_pred_pos

    y_pos = K.round(K.clip(y_true, 0, 1))
    y_neg = 1 - y_pos

    # Confusion-matrix counts over the whole batch.
    tp = K.sum(y_pos * y_pred_pos)
    tn = K.sum(y_neg * y_pred_neg)

    fp = K.sum(y_neg * y_pred_pos)
    fn = K.sum(y_pos * y_pred_neg)

    # 1 - MCC, so that minimizing the loss maximizes the correlation.
    numerator = (tp * tn - fp * fn)
    denominator = K.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return 1.0 - numerator / (denominator + K.epsilon())

If I use this function as a metric, and not as the loss function, it works. How can I use this function as a loss?
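
For context, a minimal way to reproduce this (the tiny model and random data are just placeholders):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(1, activation='sigmoid', input_dim=4)])

# As a metric this works fine:
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=[matthews_correlation])
model.fit(np.random.rand(8, 4), np.random.randint(0, 2, (8, 1)))

# As the loss, fitting raises the ValueError above:
model.compile(optimizer='adam', loss=matthews_correlation)
model.fit(np.random.rand(8, 4), np.random.randint(0, 2, (8, 1)))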

After removing K.round, I get the following error:

InvalidArgumentError: Can not squeeze dim[0], expected a dimension of 1, got 8 [[{{node loss_9/dense_10_loss/Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1], _device="/job:localhost/replica:0/task:0/device:GPU:0"] (_arg_dense_10_sample_weights_0_2/_2445)]] [[{{node loss_9/add_12/_2467}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6418_loss_9/add_12", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]

K.round is not differentiable. Can you try without rounding, so we can narrow the error surface? – Josef Korbel
@JosefKorbel It won't work. Anything that uses tp and fp counts (such as F-beta) is not a loss and as such won't be viable; you won't be able to use it as a loss. You need to use it as a metric and keep it to get the best iteration. – Frayal
@tag Please update your answer; don't post useful info only in comments. – Frayal
@Alexis Oh, I fear that you're right. It won't be so useful though, as you cannot directly minimize it. – Josef Korbel

2 Answers

5 votes

The answer is: you can't.

Let me explain why. First, we need to define a few things:

  • loss: a loss (or cost) function maps an event or the values of one or more variables onto a real number that intuitively represents some "cost" associated with the event. An optimization problem seeks to minimize a loss function.

  • metric: in mathematics, a metric or distance function defines a distance between each pair of elements of a set.

  • optimizer: a way to optimize (minimize) a cost function.

Now, why can't we use the true positive rate as a loss function? Because you can't minimize it: it is not convex, and you can't define the cost of a prediction individually. As you can see from the definitions, it is a cost that depends on all the answers in the batch to calculate a rate; you can't compute it for a single sample.
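
To make the gradient side of this concrete, here is a quick check (a sketch, assuming the TensorFlow 1.x backend from the question): the Round op has no registered gradient, so TF reports None, which is exactly what triggers the ValueError above.

from keras import backend as K
import tensorflow as tf

y_pred = K.placeholder(shape=(None, 1))
rounded = K.round(K.clip(y_pred, 0, 1))
# Round has no gradient, so the result is [None]:
print(tf.gradients(K.sum(rounded), [y_pred]))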

What can you do?

Use it as a metric together with early stopping: follow the evolution of this metric and keep the model from the best iteration.
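
For illustration, a minimal sketch of that workflow (the toy model, random data, and file name are placeholders; it assumes the matthews_correlation function from the question is in scope):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Toy data, just to make the sketch self-contained.
x_train, y_train = np.random.rand(100, 4), np.random.randint(0, 2, (100, 1))
x_val, y_val = np.random.rand(20, 4), np.random.randint(0, 2, (20, 1))

model = Sequential([Dense(1, activation='sigmoid', input_dim=4)])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=[matthews_correlation])

# Note: the question's function returns 1 - MCC, so smaller is better
# and we monitor it with mode='min'.
callbacks = [
    # Stop once the validation MCC stops improving...
    EarlyStopping(monitor='val_matthews_correlation', mode='min', patience=10),
    # ...and keep the weights of the best iteration.
    ModelCheckpoint('best_model.h5', monitor='val_matthews_correlation',
                    mode='min', save_best_only=True),
]
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=100, callbacks=callbacks)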

3 votes

@Alexis has already answered the question about the error message, but I want to clarify something about loss functions that are derived from metrics:

In general, metrics cannot be used as loss functions, but smoothed versions of metrics, such as the Dice measure (= F1 score; CH Sudre 2017), can often be applied as loss functions. One use case is image segmentation.
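
For illustration, a minimal sketch of such a smoothed loss (a soft Dice loss for binary targets; the smoothing constant of 1.0 is a common but arbitrary choice):

from keras import backend as K

def soft_dice_loss(y_true, y_pred, smooth=1.0):
    # No rounding: the continuous predictions act as "soft" counts,
    # so the expression stays differentiable end to end.
    intersection = K.sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (
        K.sum(y_true) + K.sum(y_pred) + smooth)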

(Please excuse that I have to add this comment as an answer; I do not have enough reputation to add a comment.)