Original question
I am trying to design a custom loss function in Keras. The target loss function is similar to "mean_squared_error" in Keras and is described below.
y_true and y_pred have the shape [batch_size, system_size], where system_size is an integer, e.g. system_size = 5. The elements of y_true and y_pred lie in the range [-1, 1]. Before calculating the loss, I need to change the sign of y_pred for each sample, depending on the sign of the element of y_true with the maximum absolute value and the sign of the corresponding element of y_pred. For each sample, I first pick the index of the maximum absolute value in y_true (call it i). If y_pred[:, i] has the same sign as y_true[:, i], the loss is the normal "mean_squared_error". If y_pred[:, i] and y_true[:, i] have different signs, all the elements of this sample in y_pred are multiplied by -1 before the loss is computed.
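For illustration, the rule above could be written in NumPy roughly as follows. This is only a sketch: the name `sign_aligned_mse` is hypothetical, it leaves out any normalization, and it assumes that a product of exactly zero counts as "same sign".

```python
import numpy as np

def sign_aligned_mse(y_true, y_pred):
    """Per-sample MSE after flipping y_pred where its sign at the
    reference index disagrees with y_true (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rows = np.arange(y_true.shape[0])
    # Index i of the largest |y_true| in each sample.
    idx = np.argmax(np.abs(y_true), axis=-1)
    # True where y_true[n, i] and y_pred[n, i] have opposite signs.
    # Assumption: a zero product is treated as "same sign".
    flip = y_true[rows, idx] * y_pred[rows, idx] < 0
    # Multiply the whole sample by -1 where the signs differ.
    sign = np.where(flip, -1.0, 1.0)[:, None]
    return np.mean((y_true - sign * y_pred) ** 2, axis=-1)

y_true = np.array([[0.9, -0.2, 0.1], [-0.8, 0.3, 0.2]])
y_pred = np.array([[-0.9, 0.2, -0.1], [-0.7, 0.1, 0.2]])
loss = sign_aligned_mse(y_true, y_pred)
```

In the first sample the signs at the reference index differ, so y_pred is flipped and matches y_true exactly; in the second sample no flip is applied.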
I tried the following function to define the loss, but it does not work.
def normalized_mse(y_true, y_pred):
    y_pred = K.l2_normalize(y_pred, axis = -1) # normalize the y_pred
    loss_minus = K.square(y_true - y_pred)
    loss_plus = K.square(y_true + y_pred)
    loss = K.mean(tf.where(tf.greater(
        tf.div(y_true[:, K.argmax(K.abs(y_true), axis = -1)],
               y_pred[:, K.argmax(K.abs(y_true), axis = -1)]), 0),
        loss_minus, loss_plus), axis = -1)
    return loss
If I replace "K.argmax(K.abs(y_true), axis = -1)" with an integer, the function works well. It seems that the expression used to pick the index of the maximum absolute value in y_true is the problem.
Have you ever encountered such problems? Could you please give me some advice and guidance on this problem?
Thank you very much.
Thanks to the guidance from @AnnaKrogager, the problem has been solved. As has been pointed out below, K.argmax returns a tensor instead of an integer. According to @AnnaKrogager's answer, I revised the loss function to
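import tensorflow as tf
from keras import backend as K # assumed import, not shown in the original post

def normalized_mse(y_true, y_pred):
    y_pred = K.l2_normalize(y_pred, axis = -1)
    y_true = K.l2_normalize(y_true, axis = -1)
    loss_minus = K.square(y_pred - y_true)
    loss_plus = K.square(y_pred + y_true)
    # index of the maximum absolute value of y_true, one per sample
    index = K.argmax(K.abs(y_true), axis = -1)
    # gather returns a [batch, batch] matrix; its diagonal holds y[n, index[n]]
    y_true_slice = tf.diag_part(tf.gather(y_true, index, axis = 1))
    y_pred_slice = tf.diag_part(tf.gather(y_pred, index, axis = 1))
    # same sign -> loss_minus, opposite sign -> loss_plus
    loss = K.mean(tf.where(tf.greater(tf.div(y_true_slice, y_pred_slice), 0),
                           loss_minus, loss_plus), axis = -1)
    return loss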
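To see why the gather followed by the diagonal picks one element per sample, here is a small NumPy sketch. It uses `np.take` and `np.diagonal` as stand-ins for `tf.gather(..., axis = 1)` and `tf.diag_part`; that correspondence is an assumption made for illustration, and the array values are made up.

```python
import numpy as np

y = np.array([[0.1, -0.9, 0.3],
              [0.7, 0.2, -0.4]])
# One column index per row (largest absolute value in that row).
index = np.argmax(np.abs(y), axis=-1)

# Taking columns `index` from every row yields a [batch, batch] matrix
# whose entry (n, m) is y[n, index[m]].
gathered = np.take(y, index, axis=1)

# Only the diagonal pairs row n with its own index[n].
picked = np.diagonal(gathered)

# Direct per-row indexing gives the same values.
direct = y[np.arange(y.shape[0]), index]
```

Note that the intermediate matrix grows with the square of the batch size, which may matter for large batches.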
In order to verify it, I define another function with numpy
import numpy as np

def normalized_mse_numpy(y_true, y_pred):
    # note: y_true and y_pred are normalized in place
    batch_size = y_true.shape[0]
    sample_size = y_true.shape[1]
    loss = np.zeros((batch_size))
    for i in range(batch_size):
        index = np.argmax(abs(y_true[i, :]))
        y_pred[i, :] = y_pred[i, :]/np.linalg.norm(y_pred[i, :])
        y_true[i, :] = y_true[i, :]/np.linalg.norm(y_true[i, :])
        sign_flag = y_true[i, index] / y_pred[i, index]
        if sign_flag < 0:
            # opposite signs: compare y_true with -y_pred
            for j in range(sample_size):
                loss[i] = loss[i] + (y_true[i, j] + y_pred[i, j])**2
        else:
            for j in range(sample_size):
                loss[i] = loss[i] + (y_true[i, j] - y_pred[i, j])**2
        loss[i] = loss[i] / sample_size
    return loss
SystemSize = 5
batch_size = 10
sample_size = 5
y_true = 100 * np.random.rand(batch_size, sample_size)
y_pred = 100 * np.random.rand(batch_size, sample_size)
numpy_result = normalized_mse_numpy(y_true, y_pred)
keras_result = K.eval(normalized_mse(K.variable(y_true), K.variable(y_pred)))
numpy_result - keras_result
array([ 4.57889131e-08, 1.27995520e-08, 5.66398740e-09, 1.07868497e-08,
4.41975839e-09, 7.89889471e-09, 6.68819598e-09, 1.05113101e-08,
-9.91241045e-09, -1.20345756e-09])
I also benefited from the answer by Yu-Yang to "Implementing custom loss function in keras with different sizes for y_true and y_pred".
Please note that tf.gather() does not support the 'axis' argument in some early versions of TensorFlow, e.g. 1.0.1. It works in 1.11.0. With an older TensorFlow version, you may get the error "gather() got an unexpected keyword argument 'axis'".
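If upgrading is not an option, one possible workaround is to select the element with a one-hot mask instead of a gather along axis 1. The NumPy sketch below only shows the idea; `pick_by_mask` is a hypothetical helper, and building the same mask inside the loss (for example with `K.one_hot`) is an assumption that has not been tested here.

```python
import numpy as np

def pick_by_mask(y, index):
    """Return y[n, index[n]] for every row n using a one-hot mask
    (hypothetical helper, no gather along axis 1 needed)."""
    n_cols = y.shape[1]
    # Row n of the mask is 1 at column index[n] and 0 elsewhere.
    mask = np.eye(n_cols)[index]
    # Everything except the selected element is zeroed out before summing.
    return np.sum(y * mask, axis=-1)

y = np.array([[0.1, -0.9, 0.3],
              [0.7, 0.2, -0.4]])
index = np.argmax(np.abs(y), axis=-1)
picked = pick_by_mask(y, index)
```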