2 votes

I get high classification accuracy on training but low accuracy on validation, even though I am using the same dataset for both. The problem only appeared after I added batch normalization. Am I implementing it correctly?

Code using batch normalization:

from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Convolution2D, BatchNormalization, MaxPooling2D, Flatten, Dense
from keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard

# Rescale pixel values to [0, 1] and stream images from the directory
train_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
    directory='../ImageFilter/Images/',
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True)

model = Sequential()

# Convolution on the input, followed by batch normalization and pooling
model.add(Convolution2D(16,
                        kernel_size=(3, 3),
                        strides=(2, 2),
                        activation='relu',
                        input_shape=(img_rows, img_cols, 3)))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(2, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

epochs = 100
patience = 6
n_images = 91
file_path = 'imageFilterCNN.hdf5'

# Save the best weights and stop early based on validation accuracy
checkpointer = ModelCheckpoint(file_path, monitor='val_acc', verbose=0, save_best_only=True)
earlystop = EarlyStopping(monitor='val_acc', patience=patience, verbose=0, mode='auto')
tboard = TensorBoard('./logs')

# The training generator is also passed as the validation data
model.fit_generator(
    train_generator,
    steps_per_epoch=n_images // batch_size,
    epochs=epochs,
    callbacks=[checkpointer, earlystop, tboard],
    validation_data=train_generator,
    validation_steps=n_images // batch_size)

Output:

Epoch 15/100
11/11 [==============================] - 2s - loss: 0.0092 - acc: 1.0000 - val_loss: 3.0321 - val_acc: 0.5568

Comments:

And what is weird about these results? Training accuracy is always going to be better than testing accuracy; do you have any reason to expect generalization to be simple? – lejlot

I'm testing on the same dataset it is training on, so the results should be fairly similar, which they are not. – mcudic

1 Answer

0 votes

You are applying batch normalization to the first (input) layer, which is most likely a mistake. Why would you do this? Your inputs are images, and you know very well how to normalize them; in fact, that is exactly what the rescale=1. / 255 in your first line already does. It makes no sense to apply another normalization on top of that.

Batch normalization is applied to hidden layers so that their activations do not grow too large or too small during training. Unlike the input, hidden activations have no simple, universal normalization, which is why Sergey Ioffe and Christian Szegedy introduced this dedicated layer.
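
As a rough illustration of that placement, here is a minimal sketch (not the asker's exact model; the second convolution block, layer sizes, and image size are assumptions added for the example) that keeps the 1/255 rescaling as the only input normalization and applies BatchNormalization only after hidden layers:

from keras.models import Sequential
from keras.layers import Convolution2D, BatchNormalization, MaxPooling2D, Flatten, Dense

img_rows, img_cols = 64, 64  # placeholder image size (assumption)

model = Sequential()

# First convolution works directly on the rescaled input -- no BatchNormalization here
model.add(Convolution2D(16,
                        kernel_size=(3, 3),
                        strides=(2, 2),
                        activation='relu',
                        input_shape=(img_rows, img_cols, 3)))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

# A hidden convolution block is where batch normalization is useful
model.add(Convolution2D(32, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(2, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

With this layout, the rescaling handles the input range and batch normalization only stabilizes the hidden activations.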