3 votes

I am trying to mimic a PyTorch neural network in Keras.

I am confident that my Keras version of the network is very close to the PyTorch one, but during training I see that the loss values of the PyTorch network are much lower than those of the Keras network. I wonder whether this is because I have not copied the PyTorch network correctly, or whether the loss computation differs between the two frameworks.

PyTorch loss definition:

import torch.nn as nn
import torch.optim as optim
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9, weight_decay=5e-4)

Keras loss definition:

from keras import optimizers
sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
resnet.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['categorical_accuracy'])

Note that all the layers in the Keras network use L2 regularization, kernel_regularizer=regularizers.l2(5e-4), and I used he_uniform initialization, which I believe is the default in PyTorch, according to the source code.
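
For reference, here is a sketch of one layer configured this way (assuming TensorFlow-backed Keras; the layer shape is made up for illustration). Note that PyTorch's default for conv/linear layers is kaiming_uniform_ with a=sqrt(5), which is similar to but not identical to Keras' he_uniform:

from tensorflow.keras import layers, regularizers

# He-uniform kernel init plus an L2 kernel penalty, as described above.
conv = layers.Conv2D(
    64, 3, padding='same',
    kernel_initializer='he_uniform',
    kernel_regularizer=regularizers.l2(5e-4),
)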

The batch size for the two networks is the same: 128.

In the PyTorch version, the loss starts around 4.1209 and decreases to around 0.5. In Keras it starts around 30 and decreases to 2.5.


2 Answers

7 votes

PyTorch's CrossEntropyLoss accepts unnormalized scores (logits) for each class, i.e., not probabilities (source). Keras' categorical_crossentropy defaults to from_logits=False, which means it assumes y_pred contains probabilities rather than raw scores (source).

In PyTorch, if you use CrossEntropyLoss, you should not add a softmax/sigmoid layer at the end of the network. In Keras you may add one or not, but you must set from_logits accordingly.
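
A minimal sketch of the two conventions side by side (assuming TensorFlow-backed Keras; the scores and target are made up for illustration):

import numpy as np
import torch
import torch.nn as nn
import tensorflow as tf

# Identical raw scores (logits) and the same target in both frameworks.
logits = np.array([[2.0, 1.0, 0.1]], dtype=np.float32)
target_index = np.array([0], dtype=np.int64)                   # PyTorch wants class indices
target_onehot = np.array([[1.0, 0.0, 0.0]], dtype=np.float32)  # Keras wants one-hot vectors

# PyTorch: CrossEntropyLoss applies log-softmax internally, so it takes raw logits.
pt_loss = nn.CrossEntropyLoss()(torch.from_numpy(logits), torch.from_numpy(target_index))

# Keras: from_logits=True makes it equivalent; the default from_logits=False
# would wrongly treat these raw scores as probabilities.
k_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(target_onehot, logits)

print(pt_loss.item(), k_loss.numpy())  # both ~0.417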

2 votes

In my case, the reason the displayed losses of the two models were different is that Keras prints the sum of the cross-entropy loss and the regularization term, whereas the PyTorch model prints only the categorical cross-entropy.
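
A small sketch that makes this bookkeeping visible (assuming TensorFlow-backed Keras; the toy model and data are made up for illustration):

import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Tiny model with the same style of L2 penalty as in the question.
model = tf.keras.Sequential([
    layers.Dense(3, activation='softmax',
                 kernel_regularizer=regularizers.l2(5e-4), input_shape=(4,)),
])
model.compile(loss='categorical_crossentropy', optimizer='sgd')

x = tf.random.normal((8, 4))
y = tf.one_hot(tf.random.uniform((8,), maxval=3, dtype=tf.int32), 3)

probs = model(x)
data_loss = float(tf.reduce_mean(
    tf.keras.losses.categorical_crossentropy(y, probs)))
reg_loss = float(tf.add_n(model.losses))    # the L2 penalty terms
reported = model.evaluate(x, y, verbose=0)  # the loss Keras prints in fit()

# reported ≈ data_loss + reg_loss; PyTorch's weight_decay is applied in the
# optimizer step and never appears in the printed CrossEntropyLoss value.
print(data_loss, reg_loss, reported)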