1
votes

Did anyone find a convincing solution to make custom_binarycrossentropy work?

I tried every approach I could think of (even making the whole training set the same size as the batch size, to eliminate the dependence on global averaging during batch-wise processing), but I still see a significant difference between my binary cross-entropy implementation and the one from Keras (specified with loss = 'binary_crossentropy').

My custom binary cross-entropy code is as follows:

from keras import backend as K

_EPSILON = K.epsilon()

def _loss_tensor(y_true, y_pred):
    y_pred = K.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
    out = y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred)
    return -K.mean(out)  # single scalar averaged over everything

def _loss_tensor2(y_true, y_pred):
    y_pred = K.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
    out = -(y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred))
    return out  # element-wise loss, no reduction

def _loss_tensor3(y_true, y_pred):
    # Delegate directly to the backend implementation
    loss1 = K.binary_crossentropy(y_true, y_pred)
    return loss1

None of these methods works. It doesn't work even if I apply K.mean() before I return the result from the custom loss function.

I am not able to understand what is special about using loss = 'binary_crossentropy'. When I use my custom loss function, training suffers and it does not work as expected.

I need a custom loss function so I can manipulate the loss depending on the error and penalize a certain type of classification error more heavily.
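For the kind of asymmetric penalty described above, one common pattern is to weight the positive-class term of the cross-entropy. The sketch below recreates the arithmetic in NumPy so it can be checked by hand; the function name `weighted_bce` and the parameter `fn_weight` are hypothetical, and in a real Keras loss the same expressions would use `K.*` ops instead of `np.*`:

```python
import numpy as np

def weighted_bce(y_true, y_pred, fn_weight=5.0, eps=1e-7):
    """Binary cross-entropy that penalizes missed positives fn_weight times more.

    NumPy sketch only; swap np.* for K.* to use it as a Keras loss.
    """
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Weighting the y_true term makes false negatives cost more than false positives.
    per_element = -(fn_weight * y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))
    # Keras-style reduction: mean over the last axis gives one loss per sample.
    return per_element.mean(axis=-1)

y_true = np.array([[1.0, 0.0]])
y_pred = np.array([[0.3, 0.3]])
# With fn_weight > 1, the under-confident positive dominates the loss.
```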


2 Answers

0
votes

I have found a way to satisfy this requirement and posted it here: https://github.com/keras-team/keras/issues/4108

However, why the built-in function performs significantly differently from the explicit formula is unknown. I would expect it is mainly due to the handling of the upper and lower bounds of the probability values in y_pred.

0
votes
def custom_binary_loss(y_true, y_pred): 
    # https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/backend.py#L4826
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    
    term_0 = (1 - y_true) * K.log(1 - y_pred + K.epsilon())  # Cancels out when target is 1 
    term_1 = y_true * K.log(y_pred + K.epsilon()) # Cancels out when target is 0

    return -K.mean(term_0 + term_1, axis=1)  # one loss value per sample
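The axis=1 reduction is the key difference from the attempts in the question: it returns one loss value per sample, which is the shape Keras expects from a loss function, rather than a single scalar over the whole batch. A NumPy recreation of the same arithmetic, assuming 2-D (batch, outputs) arrays:

```python
import numpy as np

eps = 1e-7

def custom_binary_loss_np(y_true, y_pred):
    # Same formula as the Keras snippet above, recreated with NumPy.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    term_0 = (1 - y_true) * np.log(1 - y_pred + eps)  # cancels when target is 1
    term_1 = y_true * np.log(y_pred + eps)            # cancels when target is 0
    return -np.mean(term_0 + term_1, axis=1)          # one loss value per sample

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
per_sample = custom_binary_loss_np(y_true, y_pred)    # shape (2,)
```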