I am facing a problem in TensorFlow. I have a network consisting of 6 convolutional layers, each with batch normalization; the last convolution is followed by average pooling so that the output shape is Nx1x1xC. The goal is to classify an image into one category. Everything looks fine during training:
- training samples: about 150000
- validation samples during training: about 12000
I trained for 50000 iterations in total with a mini-batch size of 6.
- During training, the training loss decreases steadily (from about 2.6 at the beginning to about 0.3 at iteration 50000).
- The validation accuracy increases and saturates after about 40000 iterations (from 60% at the beginning to 72% at iteration 50000).
BUT when I use the learned weights from iteration 50000 to test on the same validation samples, the overall accuracy is only about 40%. I have searched for others who have faced similar problems. Some said that the decay of the moving average in batch normalization may be the cause.
The default decay in tf.contrib.layers.batch_norm is 0.999. I then trained with decay values of 0.9, 0.99, and 0.999; the resulting overall accuracy on the validation samples at test time was 70%, 30%, and 39%, respectively. Although a decay of 0.9 gives the best result, it is still lower than the validation accuracy observed during training.
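To make the role of the decay concrete, here is a minimal sketch of the moving-average update as it is commonly described for batch normalization: moving = decay * moving + (1 - decay) * batch_statistic. It is plain Python, not TensorFlow code. The function name `ema_after`, the starting value of 0, and the constant batch statistic of 1.0 are assumptions made purely for illustration.

```python
def ema_after(decay, steps, batch_value=1.0, init=0.0):
    """Apply moving = decay * moving + (1 - decay) * batch_value `steps` times.

    Hypothetical helper: assumes the batch statistic is constant and that
    the update actually runs on every training step.
    """
    moving = init
    for _ in range(steps):
        moving = decay * moving + (1.0 - decay) * batch_value
    return moving


# How close the moving average gets to the true value (1.0) for each decay.
for decay in (0.9, 0.99, 0.999):
    for steps in (100, 1000, 50000):
        print(decay, steps, round(ema_after(decay, steps), 4))
```

In this simplified setting, a decay of 0.999 is still far from the true value after 100 updates, but after 50000 updates all three decay values have effectively converged. So, assuming the moving averages really are updated on every iteration, the start-up lag alone may not explain the gap I am seeing.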
Has anyone encountered a similar problem, and do you have any idea what the cause could be?
Best wishes,