Original question
I am trying to design a custom loss function in Keras. The target loss function is similar to "mean_squared_error" in Keras and is presented below.
y_true and y_pred have the shape [batch_size, system_size], where system_size is an integer, e.g. system_size = 5. The elements of y_true and y_pred lie in the range [-1, 1]. Before calculating the loss, I need to adjust the sign of y_pred for each sample according to the signs of y_true and y_pred at the position of the maximum absolute value of y_true. For each sample, I first pick the index of the maximum absolute value in y_true (call it i). If y_pred[:, i] has the same sign as y_true[:, i], the loss is the normal "mean_squared_error". If y_pred[:, i] has the opposite sign to y_true[:, i], all elements of that sample in y_pred are multiplied by -1 first.
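In other words, for a single sample the intended rule is (plain numpy, just for illustration):

import numpy as np

y_true_row = np.array([ 0.2, -0.9,  0.4])   # max absolute value at index 1
y_pred_row = np.array([-0.1,  0.8, -0.3])
i = np.argmax(np.abs(y_true_row))           # i = 1
if y_true_row[i] * y_pred_row[i] < 0:       # signs differ at index i
    y_pred_row = -y_pred_row                # flip the whole sample
loss_row = np.mean((y_true_row - y_pred_row) ** 2)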
I tried the following function to define the loss; however, it does not work.
import tensorflow as tf
from keras import backend as K

def normalized_mse(y_true, y_pred):
    y_pred = K.l2_normalize(y_pred, axis=-1)  # normalize y_pred
    loss_minus = K.square(y_true - y_pred)
    loss_plus = K.square(y_true + y_pred)
    loss = K.mean(tf.where(tf.greater(
        tf.div(y_true[:, K.argmax(K.abs(y_true), axis=-1)],
               y_pred[:, K.argmax(K.abs(y_true), axis=-1)]), 0),
        loss_minus, loss_plus), axis=-1)
    return loss
If I replace "K.argmax(K.abs(y_true), axis=-1)" with an integer, the function works well. It seems that using this expression to pick the index of the maximum absolute value as a slicing index is problematic.
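For reference, inspecting the expression shows it evaluates to a tensor rather than a Python integer, which is presumably why the slicing fails (TF 1.x style, matching the code above):

idx = K.argmax(K.abs(K.variable([[0.1, -0.9, 0.3]])), axis=-1)
print(type(idx))    # a Tensor object, not int
print(K.eval(idx))  # [1]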
Have you ever encountered such a problem? Could you please give me some advice and guidance?
Thank you very much.
Elvin
Solved
Thanks to the guidance from @AnnaKrogager, the problem has been solved. As pointed out below, K.argmax returns a tensor instead of an integer. Following @AnnaKrogager's answer, I revised the loss function to:
def normalized_mse(y_true, y_pred):
    y_pred = K.l2_normalize(y_pred, axis=-1)
    y_true = K.l2_normalize(y_true, axis=-1)
    loss_minus = K.square(y_pred - y_true)
    loss_plus = K.square(y_pred + y_true)
    # index of the maximum absolute value in each sample of y_true
    index = K.argmax(K.abs(y_true), axis=-1)
    # gathering along axis 1 yields a [batch, batch] matrix whose diagonal
    # holds, for each sample, the element at that sample's own index
    y_true_slice = tf.diag_part(tf.gather(y_true, index, axis=1))
    y_pred_slice = tf.diag_part(tf.gather(y_pred, index, axis=1))
    loss = K.mean(tf.where(tf.greater(tf.div(y_true_slice, y_pred_slice), 0),
                           loss_minus, loss_plus), axis=-1)
    return loss
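Note that tf.gather(y_true, index, axis=1) builds a [batch_size, batch_size] matrix, and tf.diag_part then extracts the wanted per-sample elements from its diagonal. For large batches, one possible alternative (a sketch, not part of @AnnaKrogager's answer) is to gather each element directly with tf.gather_nd, avoiding the batch x batch intermediate:

rows = tf.range(tf.shape(y_true)[0])              # row number of each sample
idx2d = tf.stack([rows, tf.cast(index, tf.int32)], axis=1)
y_true_slice = tf.gather_nd(y_true, idx2d)
y_pred_slice = tf.gather_nd(y_pred, idx2d)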
To verify it, I defined another function with numpy:
import numpy as np

def normalized_mse_numpy(y_true, y_pred):
    # note: normalizes y_true and y_pred in place
    batch_size = y_true.shape[0]
    sample_size = y_true.shape[1]
    loss = np.zeros(batch_size)
    for i in range(batch_size):
        index = np.argmax(abs(y_true[i, :]))
        y_pred[i, :] = y_pred[i, :] / np.linalg.norm(y_pred[i, :])
        y_true[i, :] = y_true[i, :] / np.linalg.norm(y_true[i, :])
        sign_flag = y_true[i, index] / y_pred[i, index]
        if sign_flag < 0:
            for j in range(sample_size):
                loss[i] = loss[i] + (y_true[i, j] + y_pred[i, j]) ** 2
        else:
            for j in range(sample_size):
                loss[i] = loss[i] + (y_true[i, j] - y_pred[i, j]) ** 2
        loss[i] = loss[i] / sample_size
    return loss
batch_size = 10
sample_size = 5
y_true = 100 * np.random.rand(batch_size, sample_size)
y_pred = 100 * np.random.rand(batch_size, sample_size)
numpy_result = normalized_mse_numpy(y_true, y_pred)
keras_result = K.eval(normalized_mse(K.variable(y_true), K.variable(y_pred)))
print(numpy_result.sum())
0.9979743490342015
print(keras_result.sum())
0.9979742
numpy_result - keras_result
array([ 4.57889131e-08, 1.27995520e-08, 5.66398740e-09, 1.07868497e-08,
4.41975839e-09, 7.89889471e-09, 6.68819598e-09, 1.05113101e-08,
-9.91241045e-09, -1.20345756e-09])
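The agreement can also be checked automatically; the tolerance below is chosen to cover the float32 round-off visible in the differences above:

np.testing.assert_allclose(numpy_result, keras_result, atol=1e-6)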
I also benefited from the answer by Yu-Yang in "Implementing custom loss function in keras with different sizes for y_true and y_pred".
Please note that tf.gather() does not support the 'axis' argument in some early TensorFlow versions, e.g. 1.0.1; it works in 1.11.0. With an older version you may get the error "gather() got an unexpected keyword argument 'axis'".
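If you are stuck on such an old version, one possible workaround (a sketch, untested on 1.0.1) is to transpose first, since tf.gather without 'axis' gathers along axis 0:

# equivalent to tf.gather(y_true, index, axis=1) on old TF versions:
# transpose to [system_size, batch], gather rows, take the diagonal
y_true_slice = tf.diag_part(tf.gather(tf.transpose(y_true), index))
y_pred_slice = tf.diag_part(tf.gather(tf.transpose(y_pred), index))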