Why does Keras behave better than PyTorch under the same network configuration?

Question

Recently, I compared a Keras implementation and a PyTorch implementation of UNet++ on the same dataset. With Keras, the loss decreases steadily and the accuracy after 10 epochs is higher, while with PyTorch the loss decreases unevenly and the accuracy after 10 epochs is lower. Has anyone run into this problem, and does anyone know why it happens?

The end of the PyTorch training log looks like this:

2019-12-15 18:14:20 Epoch:9 Iter: 1214/1219 loss:0.464673 acc:0.581713

2019-12-15 18:14:21 Epoch:9 Iter: 1215/1219 loss:0.450462 acc:0.584101

2019-12-15 18:14:21 Epoch:9 Iter: 1216/1219 loss:0.744811 acc:0.293406

2019-12-15 18:14:22 Epoch:9 Iter: 1217/1219 loss:0.387612 acc:0.735630

2019-12-15 18:14:23 Epoch:9 Iter: 1218/1219 loss:0.767146 acc:0.364759

The end of the Keras training log looks like this:

685/690 [============================>.] - ETA: 2s - loss: 0.4940 - acc: 0.7309

686/690 [============================>.] - ETA: 1s - loss: 0.4941 - acc: 0.7306

687/690 [============================>.] - ETA: 1s - loss: 0.4939 - acc: 0.7308

688/690 [============================>.] - ETA: 0s - loss: 0.4942 - acc: 0.7303

689/690 [============================>.] - ETA: 0s - loss: 0.4943 - acc: 0.7302
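One thing worth ruling out before comparing these two logs: the Keras progress bar reports the loss and accuracy averaged over all batches seen so far in the current epoch, whereas the PyTorch log above prints each batch's own value, so the PyTorch numbers will look much noisier even if training behaves the same. The runs also report different numbers of steps per epoch (1219 vs 690), which may indicate different batch sizes. Below is a minimal sketch of a Keras-style running mean that can be applied to the PyTorch per-batch values; `RunningMean` is a hypothetical helper, not a Keras or PyTorch API.

```python
class RunningMean:
    """Sample-weighted mean of per-batch values, reset at the start of each epoch.

    Hypothetical helper illustrating how a Keras-style progress bar smooths
    the numbers it prints; not part of Keras or PyTorch.
    """

    def __init__(self):
        self.reset()

    def reset(self):
        # Call once at the start of every epoch.
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # value is the mean over a batch of n samples.
        self.total += value * n
        self.count += n
        return self.result()

    def result(self):
        return self.total / self.count if self.count else 0.0


# The last five per-batch losses from the PyTorch log above.
batch_losses = [0.464673, 0.450462, 0.744811, 0.387612, 0.767146]
meter = RunningMean()
for loss in batch_losses:
    print(f"batch loss {loss:.4f}  running mean {meter.update(loss):.4f}")
```

Accumulated over all 1219 batches of an epoch, the final running mean is the figure to compare with the Keras progress bar; the five values here only show the smoothing effect.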

I don't know, but I also have better results with Keras, although PyTorch is waaaay faster. — Daniel Möller
In my test, Keras is both faster and better than PyTorch, which is a bit weird, as PyTorch has always been reported to be faster than Keras. Maybe some internal optimization in Keras? — daifeng
Maybe you're not using PyTorch in the best way? All the comparisons I did showed PyTorch running at least twice as fast. — Daniel Möller
Could you share a PyTorch training example? I have run my test many times (using torch.backends.cudnn.benchmark=True, num_workers>0, pin_memory=True, etc.); however, the PyTorch version is always much slower. — daifeng
Did you move your tensors to CUDA? (You must do that explicitly, otherwise everything runs on the CPU.) Do you iterate over anything other than epochs and batches? (Iterating over tensors is always a bad idea.) Do you set your model to eval mode before the evaluation step? (This avoids unnecessary backpropagation procedures.) — Daniel Möller
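A minimal sketch of a PyTorch training/evaluation loop that applies the settings mentioned in these comments: `cudnn.benchmark`, pinned memory, worker processes, explicit `.to(device)`, and evaluation mode. The function names (`make_loader`, `train_one_epoch`, `evaluate`) are hypothetical, and the model, data, and loss in the usage part are toy placeholders, not UNet++. Note that `model.eval()` only switches layers such as dropout and batch norm to inference behavior; it is `torch.no_grad()` that stops autograd from recording operations during evaluation.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Model and every batch must be moved to the GPU explicitly.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Let cuDNN benchmark and cache the fastest convolution algorithms (helps with fixed input sizes).
torch.backends.cudnn.benchmark = True


def make_loader(dataset, batch_size, shuffle, num_workers=0):
    # num_workers > 0 loads batches in background processes;
    # pinned host memory speeds up host-to-GPU copies (only useful with CUDA).
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=num_workers,
        pin_memory=torch.cuda.is_available(),
    )


def train_one_epoch(model, loader, optimizer, loss_fn):
    model.train()  # dropout active, batch-norm statistics updated
    total, count = 0.0, 0
    for x, y in loader:  # loop over batches only, never over tensor elements
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        total += loss.item() * x.size(0)
        count += x.size(0)
    return total / count  # epoch mean, comparable to the Keras progress bar


@torch.no_grad()  # no autograd graph is recorded during evaluation
def evaluate(model, loader, loss_fn):
    model.eval()  # dropout off, batch norm uses running statistics
    total, count = 0.0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        total += loss_fn(model(x), y).item() * x.size(0)
        count += x.size(0)
    return total / count


if __name__ == "__main__":
    # Toy stand-in data and model; replace with the real UNet++ and dataset.
    torch.manual_seed(0)
    xs = torch.randn(64, 1, 8, 8)
    ys = (xs > 0).float()
    data = TensorDataset(xs, ys)
    net = nn.Conv2d(1, 1, 3, padding=1).to(device)
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    train_loader = make_loader(data, batch_size=16, shuffle=True)
    val_loader = make_loader(data, batch_size=16, shuffle=False)
    for epoch in range(3):
        train_loss = train_one_epoch(net, train_loader, opt, loss_fn)
        val_loss = evaluate(net, val_loader, loss_fn)
        print(f"epoch {epoch}: train {train_loss:.4f}  val {val_loss:.4f}")
```

`num_workers` defaults to 0 here so the sketch runs anywhere; raise it to load batches in parallel. On Windows and macOS, worker processes re-import the main module, so loaders with `num_workers > 0` should be created under the `if __name__ == "__main__":` guard as above.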

Separius · Accepted Answer · 2019-12-16T10:20:03

Well, it's pretty hard to say without any code snippets. That being said, in general, initialization is way more important than you might think. I'm sure the default initialization in PyTorch is different from the one in Keras, and I had similar issues in the past.
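To rule initialization out, the PyTorch layers can be re-initialized with Keras's defaults. This is a sketch assuming the model only uses `Conv2d`, `ConvTranspose2d` and `Linear` layers and that the Keras model keeps the default `kernel_initializer="glorot_uniform"` and `bias_initializer="zeros"` (check it for non-default initializers). PyTorch's own default for these layers draws weights from U(-1/sqrt(fan_in), 1/sqrt(fan_in)) and also gives non-zero biases. The helper names are hypothetical.

```python
import math

import torch
from torch import nn


def pytorch_default_limit(fan_in):
    # nn.Conv2d / nn.Linear use kaiming_uniform_(a=sqrt(5)) by default, which
    # works out to U(-1/sqrt(fan_in), 1/sqrt(fan_in)) for the weights.
    return 1.0 / math.sqrt(fan_in)


def keras_glorot_limit(fan_in, fan_out):
    # Keras' default glorot_uniform draws from U(-limit, limit).
    return math.sqrt(6.0 / (fan_in + fan_out))


def init_like_keras(module):
    """Re-initialize a layer Keras-style: Glorot-uniform weights, zero bias."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)


# 3x3 convolution, 64 -> 64 channels: fan_in = fan_out = 64 * 3 * 3.
fan = 64 * 3 * 3
print(pytorch_default_limit(fan), keras_glorot_limit(fan, fan))

# Stand-in network; model.apply visits every submodule recursively.
torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 64, 3, padding=1)
)
model.apply(init_like_keras)
```

When fan_in equals fan_out, the Glorot limit is sqrt(3) times the PyTorch default limit, so the same architecture starts from noticeably smaller weights in PyTorch.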

Another thing to check is the optimizer parameters: make sure that you are not only using the same optimizer (SGD, Adam, ...) but also the same parameters (lr, beta, momentum, ...).
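Even when the names match, some defaults differ. This sketch assumes Keras 2.x / tf.keras defaults (Adam `epsilon=1e-7`; BatchNormalization `momentum=0.99`, `epsilon=1e-3`) versus PyTorch's (Adam `eps=1e-8`; BatchNorm `momentum=0.1`, `eps=1e-5`). The two frameworks also define batch-norm momentum in opposite directions. The helper names are hypothetical; verify the values against the Keras version actually used.

```python
import torch
from torch import nn

# Assumed Keras defaults, written out so they can be mirrored in PyTorch.
KERAS_ADAM = {"lr": 1e-3, "beta_1": 0.9, "beta_2": 0.999, "epsilon": 1e-7}
KERAS_BN = {"momentum": 0.99, "epsilon": 1e-3}


def adam_like_keras(params, lr=KERAS_ADAM["lr"]):
    # PyTorch's Adam defaults to eps=1e-8; Keras uses 1e-7.
    return torch.optim.Adam(
        params,
        lr=lr,
        betas=(KERAS_ADAM["beta_1"], KERAS_ADAM["beta_2"]),
        eps=KERAS_ADAM["epsilon"],
    )


def batchnorm_like_keras(num_features):
    # PyTorch's momentum weights the *new* batch statistic, Keras' weights the
    # *old* running value, so Keras' 0.99 becomes 1 - 0.99 = 0.01 here.
    return nn.BatchNorm2d(
        num_features,
        eps=KERAS_BN["epsilon"],
        momentum=1.0 - KERAS_BN["momentum"],
    )


model = nn.Sequential(nn.Conv2d(3, 8, 3), batchnorm_like_keras(8), nn.ReLU())
optimizer = adam_like_keras(model.parameters())
```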
