3 votes

I am trying to mimic a PyTorch neural network in Keras.

I am confident that my Keras version of the network is very close to the PyTorch one, but during training I see that the loss values of the PyTorch network are much lower than those of the Keras network. I wonder whether this is because I have not copied the PyTorch network correctly, or whether the loss computation differs between the two frameworks.

PyTorch loss definition:

import torch.nn as nn
import torch.optim as optim
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9, weight_decay=5e-4)

Keras loss definition:

from keras import optimizers
sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
resnet.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['categorical_accuracy'])

Note that all the layers in the Keras network use L2 regularization, kernel_regularizer=regularizers.l2(5e-4), and I used he_uniform initialization, which I believe is the default in PyTorch, according to the source code.
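
For reference, here is a sketch of one layer configured this way (assuming TensorFlow-backed Keras; the layer shape is made up for illustration). Note that PyTorch's default for conv/linear layers is kaiming_uniform_ with a=sqrt(5), which is similar to but not identical to Keras' he_uniform:

from tensorflow.keras import layers, regularizers

# He-uniform kernel init plus an L2 kernel penalty, as described above.
conv = layers.Conv2D(
    64, 3, padding='same',
    kernel_initializer='he_uniform',
    kernel_regularizer=regularizers.l2(5e-4),
)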

The batch size for the two networks is the same: 128.

In the PyTorch version, the loss starts around 4.1209 and decreases to around 0.5. In Keras it starts around 30 and decreases to 2.5.


2 Answers

7 votes

PyTorch's CrossEntropyLoss accepts unnormalized scores (logits) for each class, i.e., not probabilities (source). Keras' categorical_crossentropy defaults to from_logits=False, which means it assumes y_pred contains probabilities rather than raw scores (source).

In PyTorch, if you use CrossEntropyLoss, you should not add a softmax/sigmoid layer at the end of the network. In Keras you may add one or not, but you must set from_logits accordingly.
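
A minimal sketch of the two conventions side by side (assuming TensorFlow-backed Keras; the scores and target are made up for illustration):

import numpy as np
import torch
import torch.nn as nn
import tensorflow as tf

# Identical raw scores (logits) and the same target in both frameworks.
logits = np.array([[2.0, 1.0, 0.1]], dtype=np.float32)
target_index = np.array([0], dtype=np.int64)                   # PyTorch wants class indices
target_onehot = np.array([[1.0, 0.0, 0.0]], dtype=np.float32)  # Keras wants one-hot vectors

# PyTorch: CrossEntropyLoss applies log-softmax internally, so it takes raw logits.
pt_loss = nn.CrossEntropyLoss()(torch.from_numpy(logits), torch.from_numpy(target_index))

# Keras: from_logits=True makes it equivalent; the default from_logits=False
# would wrongly treat these raw scores as probabilities.
k_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(target_onehot, logits)

print(pt_loss.item(), k_loss.numpy())  # both ~0.417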

2 votes

In my case, the reason the displayed losses of the two models were different is that Keras prints the sum of the cross-entropy loss and the regularization term, whereas the PyTorch model prints only the categorical cross-entropy.
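
A small sketch that makes this bookkeeping visible (assuming TensorFlow-backed Keras; the toy model and data are made up for illustration):

import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Tiny model with the same style of L2 penalty as in the question.
model = tf.keras.Sequential([
    layers.Dense(3, activation='softmax',
                 kernel_regularizer=regularizers.l2(5e-4), input_shape=(4,)),
])
model.compile(loss='categorical_crossentropy', optimizer='sgd')

x = tf.random.normal((8, 4))
y = tf.one_hot(tf.random.uniform((8,), maxval=3, dtype=tf.int32), 3)

probs = model(x)
data_loss = float(tf.reduce_mean(
    tf.keras.losses.categorical_crossentropy(y, probs)))
reg_loss = float(tf.add_n(model.losses))    # the L2 penalty terms
reported = model.evaluate(x, y, verbose=0)  # the loss Keras prints in fit()

# reported ≈ data_loss + reg_loss; PyTorch's weight_decay is applied in the
# optimizer step and never appears in the printed CrossEntropyLoss value.
print(data_loss, reg_loss, reported)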