Why is validation accuracy higher than training accuracy at the first epoch?

Question

I'm working on a 5-class video classification problem using a TimeDistributed CNN + RNN model. The training set contains 70 videos per class, each with 20 frames. The validation and test sets each contain 15 videos per class, also with 20 frames each. So, in total, I'm working with 500 videos. The batch size is 64. I compiled the model with the RMSprop optimizer and categorical cross-entropy loss.

I trained the model for 65 epochs. Strangely, the validation accuracy is higher than the training accuracy at the first epoch. For the remaining epochs, the curves look satisfactory.

My model is:

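from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (BatchNormalization, Conv2D, Dense, Flatten,
                                     LSTM, MaxPooling2D, TimeDistributed)

# (frames, height, width, channels) for each video
input_shape = (20, 128, 128, 3)

model = Sequential()
model.add(BatchNormalization(input_shape=input_shape))

# Per-frame CNN feature extractor, applied to each of the 20 frames
model.add(TimeDistributed(Conv2D(32, (3, 3), strides=(1, 1), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
model.add(TimeDistributed(Conv2D(64, (3, 3), strides=(1, 1), activation='relu', padding='same')))
model.add(TimeDistributed(Conv2D(128, (3, 3), strides=(1, 1), activation='relu', padding='same')))
model.add(TimeDistributed(Conv2D(128, (3, 3), strides=(1, 1), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
model.add(TimeDistributed(Conv2D(256, (3, 3), strides=(1, 1), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D((2, 2))))

# Flatten each frame's feature map so the LSTM receives one vector per frame
model.add(TimeDistributed(Flatten()))

model.add(LSTM(256, activation='relu', return_sequences=False))
model.add(Dense(128, activation='relu'))
model.add(Dense(5, activation='softmax'))

# Optimizer and loss as described above
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])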

Can anyone tell me why the validation accuracy is higher than the training accuracy at the first epoch?

ErikXIII · Accepted Answer · 2020-07-13T06:39:27

My guess is that, because you only have 5 equally sized classes, always predicting the same class for every sample already gives you an accuracy of 20%. You are at around 32% now, so only slightly better than that.
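To make the 20% figure concrete, here is a minimal sketch of that constant-guess baseline. The function name `constant_guess_accuracy` is just illustrative, and the label list is rebuilt from the split sizes given in the question (5 classes, 15 validation videos each), not from the real data.

```python
from collections import Counter


def constant_guess_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Validation split as described in the question: 5 classes, 15 videos each.
val_labels = [c for c in range(5) for _ in range(15)]
baseline = constant_guess_accuracy(val_labels)
print(baseline)  # 0.2
```

Any early-epoch accuracy should be read against this baseline rather than against zero.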

I usually don't look at the accuracy of the first epochs, as the model is still really bad at that point. (I actually remove the first N epochs from the plot, in this case maybe 20 to 30, to show the performance better.)
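Trimming the early epochs could look roughly like this. It is only a sketch: it assumes the curves are stored as a dict of lists, like the `history` attribute of the object returned by Keras `model.fit`, and the key names and numbers below are made up for illustration. `drop_warmup` is a hypothetical helper, not a Keras function.

```python
def drop_warmup(history, skip):
    """Remove the first `skip` epochs from every curve in a history dict.

    Returns the 1-based epoch numbers that remain, plus the trimmed curves.
    """
    n_total = len(next(iter(history.values())))
    trimmed = {name: values[skip:] for name, values in history.items()}
    epochs = list(range(skip + 1, n_total + 1))
    return epochs, trimmed


# Made-up numbers, only to show the shape of the data.
history = {
    "accuracy": [0.25, 0.40, 0.55, 0.70],
    "val_accuracy": [0.32, 0.38, 0.50, 0.62],
}
epochs, trimmed = drop_warmup(history, skip=2)
print(epochs)               # [3, 4]
print(trimmed["accuracy"])  # [0.55, 0.7]
```

The returned `epochs` list can be used as the x-axis when plotting each trimmed curve, so the epoch numbers on the plot still match the training log.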

Check the confusion matrix after the first epoch; you will probably find that the model is only good at a few classes.
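A small sketch of that check, in plain Python so it runs without extra libraries. The label lists are hypothetical; in practice `y_true` would be the integer validation labels and `y_pred` the argmax of the model's softmax output after the first epoch.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


# Hypothetical predictions from an undertrained model that mostly outputs class 0.
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 0, 1, 0, 0, 3, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=5)
for row in cm:
    print(row)
print(per_class_recall(cm))  # [1.0, 0.5, 0.0, 0.5, 0.0]
```

In this made-up example the overall accuracy is 40%, but two of the five classes are never predicted correctly, which is the kind of pattern to look for early in training.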
