I am trying to reproduce a PyTorch neural network in Keras.
I am confident that my Keras version is very close to the PyTorch one, but during training the loss values of the PyTorch network are much lower than those of the Keras network. I wonder whether this is because I have not copied the PyTorch network into Keras correctly, or because the loss is computed differently in the two frameworks.
PyTorch loss definition:
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9, weight_decay=5e-4)
Keras loss definition:
sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
resnet.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['categorical_accuracy'])
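To make the comparison concrete, here is a minimal pure-Python sketch of how I understand the two loss functions. It assumes that nn.CrossEntropyLoss takes raw logits and applies log-softmax internally, while 'categorical_crossentropy' in its default form takes softmax probabilities and one-hot targets. The helper names and the toy values are hypothetical and are not taken from either network.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_from_logits(logits, target_index):
    # PyTorch-style: log-softmax followed by negative log-likelihood,
    # computed directly on raw logits with an integer class index.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

def ce_from_probs(probs, one_hot):
    # Keras-style: -sum(y * log(p)) on probabilities and a one-hot target.
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))

# Hypothetical toy sample with three classes, true class 0.
logits = [2.0, 0.5, -1.0]
loss_logits = ce_from_logits(logits, 0)
loss_probs = ce_from_probs(softmax(logits), [1.0, 0.0, 0.0])
print(loss_logits, loss_probs)
```

If this understanding is right, the two values agree as long as the Keras model ends in a softmax layer and the PyTorch model outputs raw logits, so the cross-entropy term alone should not explain the gap.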
Note that all the layers in the Keras network use L2 regularization, kernel_regularizer=regularizers.l2(5e-4). I also used he_uniform initialization, which I believe is the default in PyTorch according to the source code.
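One thing I suspect matters for the reported numbers: as far as I understand, Keras adds the kernel_regularizer penalties to the loss value it prints, whereas weight_decay in optim.SGD only modifies the gradient inside the optimizer step and never shows up in the value returned by the loss function. The sketch below illustrates that assumption in pure Python, with hypothetical toy logits and weights.

```python
import math

def cross_entropy(logits, target_index):
    # Softmax cross-entropy for one sample, computed from raw logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

def l2_penalty(weights, coeff):
    # regularizers.l2(coeff) style penalty: coeff * sum(w^2).
    return coeff * sum(w * w for w in weights)

# Hypothetical toy values, for illustration only.
logits = [2.0, 0.5, -1.0]
target = 0
weights = [0.3, -1.2, 0.8, 2.0]

data_loss = cross_entropy(logits, target)
penalty = l2_penalty(weights, 5e-4)

# PyTorch-style report: data term only (weight decay lives in the optimizer).
pytorch_style_reported = data_loss
# Keras-style report: data term plus the summed regularization penalties.
keras_style_reported = data_loss + penalty

print(pytorch_style_reported, keras_style_reported)
```

With millions of parameters in a ResNet, the summed penalty could be large even with a coefficient of 5e-4, which would inflate the Keras number without the cross-entropy itself being worse. Note also that l2(5e-4) has gradient 2 * 5e-4 * w, while weight_decay=5e-4 contributes 5e-4 * w, so the two settings are not exactly equivalent.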
The batch size is the same for both networks: 128.
In the PyTorch version, the loss starts around 4.1209 and decreases to around 0.5. In Keras, it starts around 30 and decreases to 2.5.