10 votes

I am new to Keras. I need some help in writing a custom loss function in Keras with the TensorFlow backend for the following loss equation.

Loss function

The parameters passed to the loss function are :

  1. y_true would be of shape (batch_size, N, 2). Here, we are passing N (x, y) coordinates in each sample in the batch.
  2. y_pred would be of shape (batch_size, 256, 256, N). Here, we are passing N predicted heatmaps of 256 x 256 pixels in each sample in the batch.

i ∈ [0, 255]

j ∈ [0, 255]

Mn(i, j) represents the value at pixel location (i, j) of the nth predicted heatmap.

The corresponding ground-truth heatmap is Gaussian2D((i, j), y_truen, std), where

std = standard deviation, the same for both dimensions (5 px).

y_truen is the nth (x, y) coordinate. This is the mean of the Gaussian.

For details, please check the L2 loss described in this paper on Human Pose Estimation.
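
For concreteness, here is a small numpy sketch (my own, with a hypothetical joint coordinate) of a single ground-truth heatmap built from the definition above:

import numpy as np

im_height, im_width, sigma = 256, 256, 5.0
y_true_n = np.array([120.0, 80.0])  # hypothetical (x, y) joint coordinate (the mean)

# grid[i, j] holds the (x, y) pixel coordinates, shape (height, width, 2)
xv, yv = np.meshgrid(np.arange(im_width), np.arange(im_height))
grid = np.stack([xv, yv], axis=-1).astype(np.float64)

# Gaussian2D((i, j), y_true_n, sigma) = exp(-||(x, y) - y_true_n||^2 / (2 * sigma^2))
ground_n = np.exp(-0.5 * ((grid - y_true_n) ** 2).sum(axis=-1) / sigma ** 2)

print(ground_n.shape)     # (256, 256)
print(ground_n[80, 120])  # 1.0 -- the heatmap peaks at the joint location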

Note: I included batch_size in the shapes of y_true and y_pred because I assume Keras calls the loss function on the entire batch, not on individual samples in the batch. Correct me if I am wrong.

def l2_loss(y_true, y_pred):
    loss = 0
    n = y_true.shape[0]  # batch size
    for j in range(n):
        for i in range(num_joints):
            # ground-truth Gaussian heatmap for joint i of sample j
            yv, xv = np.meshgrid(np.arange(0, im_height), np.arange(0, im_width))
            z = np.array([xv, yv]).transpose(1, 2, 0)
            ground = np.exp(-0.5*(((z - y_true[j, i, :])**2).sum(axis=2))/(sigma**2))
            # squared difference between ground-truth and predicted heatmap
            loss = loss + np.sum((ground - y_pred[j, :, :, i])**2)
    return loss/num_joints

This is the code I have written so far. I know it won't run as-is, since we can't use numpy ndarrays directly inside a Keras loss function. I also need to eliminate the loops!

3  SO is not a code writing service; please show what you have tried so far and the specific programming issues you are facing... – desertnaut
Yeah, I have added my code, please check! – Alex

3 Answers

11 votes

You can pretty much just translate the numpy functions into Keras backend functions. The only thing to watch out for is setting up the right broadcast shapes.

from keras import backend as K

def l2_loss_keras(y_true, y_pred):
    # set up meshgrid: (height, width, 2)
    meshgrid = K.tf.meshgrid(K.arange(im_height), K.arange(im_width))
    meshgrid = K.cast(K.transpose(K.stack(meshgrid)), K.floatx())

    # set up broadcast shape: (batch_size, height, width, num_joints, 2)
    meshgrid_broadcast = K.expand_dims(K.expand_dims(meshgrid, 0), -2)
    y_true_broadcast = K.expand_dims(K.expand_dims(y_true, 1), 2)
    diff = meshgrid_broadcast - y_true_broadcast

    # compute loss: first sum over (height, width), then take average over num_joints
    ground = K.exp(-0.5 * K.sum(K.square(diff), axis=-1) / sigma ** 2)
    loss = K.sum(K.square(ground - y_pred), axis=[1, 2])
    return K.mean(loss, axis=-1)  # one loss value per sample; Keras averages this over the batch

To verify it:

import numpy as np

def l2_loss_numpy(y_true, y_pred):
    loss = 0
    n = y_true.shape[0]
    for j in range(n):
        for i in range(num_joints):
            yv, xv = np.meshgrid(np.arange(0, im_height), np.arange(0, im_width))
            z = np.stack([xv, yv]).transpose(1, 2, 0)
            ground = np.exp(-0.5*(((z - y_true[j, i, :])**2).sum(axis=2))/(sigma**2))
            loss = loss + np.sum((ground - y_pred[j, :, :, i])**2)
    return loss/num_joints

batch_size = 32
num_joints = 10
sigma = 5
im_width = 256
im_height = 256

y_true = 255 * np.random.rand(batch_size, num_joints, 2)
y_pred = 255 * np.random.rand(batch_size, im_height, im_width, num_joints)

print(l2_loss_numpy(y_true, y_pred))
45448272129.0

print(K.eval(l2_loss_keras(K.variable(y_true), K.variable(y_pred))).sum())
4.5448e+10

The Keras result loses precision under the default dtype float32 (hence the 4.5448e+10 above). If you run it with the dtype set to float64:

y_true = 255 * np.random.rand(batch_size, num_joints, 2)
y_pred = 255 * np.random.rand(batch_size, im_height, im_width, num_joints)

print(l2_loss_numpy(y_true, y_pred))
45460126940.6

print(K.eval(l2_loss_keras(K.variable(y_true), K.variable(y_pred))).sum())
45460126940.6
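
One way to run it in float64 is to switch the Keras backend float type before creating the variables; a minimal sketch, reusing the arrays and the loss function from above:

from keras import backend as K

K.set_floatx('float64')  # K.variable / K.cast now default to float64

print(l2_loss_numpy(y_true, y_pred))
print(K.eval(l2_loss_keras(K.variable(y_true), K.variable(y_pred))).sum())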

EDIT:

It seems that Keras requires y_true and y_pred to have the same number of dimensions. For example, with the following test model:

from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(batch_size, 256, 256, 3)
model = Sequential([Dense(10, input_shape=(256, 256, 3))])
model.compile(loss=l2_loss_keras, optimizer='adam')
model.fit(X, y_true, batch_size=8)

ValueError: Cannot feed value of shape (8, 10, 2) for Tensor 'dense_2_target:0', which has shape '(?, ?, ?, ?)'

To deal with this problem, you can add a dummy dimension with expand_dims before feeding y_true into the model:

def l2_loss_keras(y_true, y_pred):
    ...

    y_true_broadcast = K.expand_dims(y_true, 1)  # change this line

    ...

model.fit(X, np.expand_dims(y_true, axis=1), batch_size=8)

0 votes

Recent versions of Keras actually support losses where y_pred and y_true have different shapes. The built-in loss sparse_categorical_crossentropy is an example of this. The TensorFlow implementation of this loss is here: https://github.com/keras-team/keras/blob/0fc33feb5f4efe3bb823c57a8390f52932a966ab/keras/backend/tensorflow_backend.py#L3570

Notice how it says target: An integer tensor. and not target: A tensor of the same shape as `output`. like the others. I tried this with a custom loss of my own and it seems to work fine.

I'm using Keras 2.2.4.
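
For instance, here is how the built-in sparse loss is fed targets shaped differently from the output (a toy model and toy data of my own, just for illustration):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# y_pred has shape (batch, 10), while y_true is (batch, 1) integer class ids
model = Sequential([Dense(10, activation='softmax', input_shape=(4,))])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

X = np.random.rand(32, 4)
y = np.random.randint(0, 10, size=(32, 1))  # integer labels, not one-hot
model.fit(X, y, epochs=1, verbose=0)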

0 votes

Yu's answer is correct, but I still want to share my experience. Whenever you write a custom loss function, beware of a few things:

  1. At compile time, Keras will not complain about a size mismatch. For example, if y_pred, which comes from the output layer of your model, has a 3-D shape, the y_true placeholder will default to a 3-D shape as well. BUT at run time, i.e. during fit, if you pass the target data as 2-D (which many people do), you might get an error inside your loss function; e.g. if you are calculating sigmoid_crossentropy_with_logits, it will complain. Hence, do pass the targets as 3-D via np.expand_dims (see the small sketch after this list).
  2. Also, in a custom loss make sure you use y_true and y_pred as the argument names (if somebody knows that we can use any other names as arguments, please shout).
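
A tiny sketch of point 1 (the shapes here are hypothetical):

import numpy as np

y_targets = np.random.rand(32, 10)             # 2-D targets: (batch, num_outputs)
y_targets_3d = np.expand_dims(y_targets, -1)   # 3-D targets: (batch, num_outputs, 1)
print(y_targets.shape, y_targets_3d.shape)     # (32, 10) (32, 10, 1)

You would then call model.fit with y_targets_3d so that y_true matches the 3-D y_pred inside the loss.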