I am trying to implement an End-To-End Memory Network (MemN2N) using PyTorch and the bAbI dataset. The network architecture is:
```
MemN2N (
  (embedding_A): Embedding(85, 120, padding_idx=0)
  (embedding_B): Embedding(85, 120, padding_idx=0)
  (embedding_C): Embedding(85, 120, padding_idx=0)
  (match): Softmax ()
)
```
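For readers unfamiliar with the architecture, here is a minimal single-hop sketch of a module with the same three embeddings and softmax. It is not the code from the question. The following are assumptions made for illustration: the `forward(story, query)` signature and tensor shapes, bag-of-words sentence encoding (word embeddings summed), a single memory hop, and, because the printout shows no output layer, answer logits obtained by reusing `embedding_C`'s weight matrix as the output projection.

```python
import torch
import torch.nn as nn


class MemN2N(nn.Module):
    """Single-hop memory network sketch.

    Assumptions (not from the question): bag-of-words encoding, one hop,
    and embedding_C's weights reused as the output projection.
    """

    def __init__(self, vocab_size=85, embed_size=120):
        super().__init__()
        self.embedding_A = nn.Embedding(vocab_size, embed_size, padding_idx=0)  # memory keys m_i
        self.embedding_B = nn.Embedding(vocab_size, embed_size, padding_idx=0)  # question u
        self.embedding_C = nn.Embedding(vocab_size, embed_size, padding_idx=0)  # memory values c_i
        self.match = nn.Softmax(dim=1)  # attention over memory slots

    def forward(self, story, query):
        # story: (batch, n_sentences, sentence_len) word ids; query: (batch, query_len)
        m = self.embedding_A(story).sum(dim=2)             # (batch, n_sentences, embed)
        c = self.embedding_C(story).sum(dim=2)             # (batch, n_sentences, embed)
        u = self.embedding_B(query).sum(dim=1)             # (batch, embed)
        scores = torch.bmm(m, u.unsqueeze(2)).squeeze(2)   # (batch, n_sentences)
        p = self.match(scores)                             # attention weights summing to 1
        o = (p.unsqueeze(2) * c).sum(dim=1)                # weighted memory readout, (batch, embed)
        # Raw logits over the vocabulary; nn.CrossEntropyLoss applies log-softmax itself.
        return (o + u) @ self.embedding_C.weight.t()       # (batch, vocab)


torch.manual_seed(0)
model = MemN2N()
story = torch.randint(0, 85, (4, 10, 6))  # 4 stories, 10 sentences, 6 words each
query = torch.randint(0, 85, (4, 5))      # 4 questions, 5 words each
logits = model(story, query)
print(logits.shape)  # torch.Size([4, 85])
```

Returning raw logits rather than probabilities is deliberate: `nn.CrossEntropyLoss` expects unnormalized scores, so applying a softmax before it would distort the loss.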
Here 85 is the vocabulary size and 120 is the embedding size. The loss function is cross-entropy and the optimizer is RMSprop. The results are:
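One way to make sure the training and test losses really are computed the same way is a loop like the sketch below, using `nn.CrossEntropyLoss` and `torch.optim.RMSprop`. The helper names `train_epoch` and `evaluate`, the learning rate of 0.01, and the toy `nn.Linear` model on synthetic data are hypothetical stand-ins chosen so the example runs on its own; a MemN2N model would take `(story, query)` inputs instead of a single tensor.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_epoch(model, loader, criterion, optimizer):
    """One pass over the training set; returns the mean per-sample loss."""
    model.train()
    total, n = 0.0, 0
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)  # mean loss over this batch
        loss.backward()
        optimizer.step()
        total += loss.item() * y.size(0)  # weight by batch size
        n += y.size(0)
    return total / n


@torch.no_grad()
def evaluate(model, loader, criterion):
    """Mean per-sample loss and accuracy, using the same criterion as training."""
    model.eval()
    total, correct, n = 0.0, 0, 0
    for x, y in loader:
        logits = model(x)
        total += criterion(logits, y).item() * y.size(0)
        correct += (logits.argmax(dim=1) == y).sum().item()
        n += y.size(0)
    return total / n, correct / n


# Toy usage: synthetic, linearly learnable data (hypothetical, not bAbI).
torch.manual_seed(0)
X = torch.randn(200, 10)
y = (X @ torch.randn(10, 4)).argmax(dim=1)
train_loader = DataLoader(TensorDataset(X[:150], y[:150]), batch_size=32, shuffle=True)
test_loader = DataLoader(TensorDataset(X[150:], y[150:]), batch_size=32)

model = nn.Linear(10, 4)  # stand-in for the MemN2N model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)  # lr is an assumption

initial_loss, _ = evaluate(model, train_loader, criterion)
for epoch in range(1, 21):
    train_loss = train_epoch(model, train_loader, criterion, optimizer)
    test_loss, test_acc = evaluate(model, test_loader, criterion)
    if epoch % 10 == 0:
        print(epoch, round(train_loss, 4), round(test_loss, 4), test_acc)
```

Both functions weight each batch's mean loss by its size, so the results are per-sample averages even when the last batch is smaller. One difference remains: the training loss is accumulated while the weights are still changing during the epoch, whereas the test loss is measured after the epoch.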
```
Epoch  Train loss  Test loss  Train acc  Test acc
   10  0.608       11.213     1.0        0.99
   20  0.027       11.193     1.0        0.99
   30  0.0017      11.740     1.0        0.99
   40  0.0006      12.190     1.0        0.99
   50  5.597e-05   12.319     1.0        0.99
   60  3.366e-05   12.379     1.0        0.99
   70  2.72e-05    12.361     1.0        0.99
   80  2.64e-05    12.333     1.0        0.99
   90  2.63e-05    12.329     1.0        0.99
  100  2.63e-05    12.329     1.0        0.99
  110  2.63e-05    12.329     1.0        0.99
  120  2.63e-05    12.329     1.0        0.99
```
Final accuracy: 1.0 on the training set and 0.999 on the test set.
I know the accuracy is good, but I am puzzled by the behaviour of the test loss: while the training loss decreases, the test loss increases, even though both losses are computed in the same way. Shouldn't the test loss decrease too? I used Task 1 for this example, but the behaviour is the same on other tasks.
Does anyone have an idea what causes this behaviour?
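As a purely illustrative worked example (made-up numbers, not data from this run and not a diagnosis of it), the sketch below shows how mean cross-entropy can rise while accuracy stays constant: correctly classified samples become more confident and contribute almost nothing, while a sample that stays misclassified contributes `-log(p)` for an ever smaller true-class probability `p`. The function name `mean_cross_entropy` and the snapshot values are hypothetical.

```python
import math


def mean_cross_entropy(true_class_probs):
    # Per-sample cross-entropy is -log(probability given to the true class);
    # the reported loss is the mean over the dataset.
    return sum(-math.log(p) for p in true_class_probs) / len(true_class_probs)


# Made-up snapshots of a 100-sample test set at three points in training:
# (true-class probability of the 99 correctly classified samples,
#  true-class probability of the 1 sample that stays misclassified).
snapshots = [(0.90, 1e-2), (0.99, 1e-4), (0.999, 1e-8)]

accuracy = 99 / 100  # argmax predictions never change, so accuracy is fixed
losses = []
for p_correct, p_wrong in snapshots:
    loss = mean_cross_entropy([p_correct] * 99 + [p_wrong])
    losses.append(loss)
    print(f"p_correct={p_correct}, p_wrong={p_wrong:g}: mean CE={loss:.4f}, accuracy={accuracy}")
```

The mean loss goes from about 0.150 to 0.102 and then up to 0.185, while accuracy stays at 0.99: once the confident predictions stop contributing, the single wrong prediction dominates the average.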