I'm working on video classification with 5 classes, using a TimeDistributed CNN + RNN model. For each class, the training set contains 70 videos, and the validation and test sets contain 15 videos each; every video has 20 frames. So, in total, I'm working with 500 videos. I use a batch size of 64, and I compiled the model with the RMSprop optimizer and categorical cross-entropy loss.
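As a quick sanity check on the numbers above, the split sizes and the number of training steps per epoch can be computed directly. This is a minimal sketch assuming the data are passed to `model.fit` as in-memory arrays, in which case Keras counts a final partial batch as one step; the constant names are hypothetical.

```python
import math

NUM_CLASSES = 5
VIDEOS_PER_CLASS = {"train": 70, "validation": 15, "test": 15}
FRAMES_PER_VIDEO = 20
BATCH_SIZE = 64

# Videos per split across all classes
totals = {split: n * NUM_CLASSES for split, n in VIDEOS_PER_CLASS.items()}
total_videos = sum(totals.values())
total_frames = total_videos * FRAMES_PER_VIDEO

# A final partial batch still counts as one step, hence the ceiling division
train_steps = math.ceil(totals["train"] / BATCH_SIZE)
last_batch = totals["train"] - (train_steps - 1) * BATCH_SIZE

print(totals)                   # {'train': 350, 'validation': 75, 'test': 75}
print(total_videos)             # 500
print(train_steps, last_batch)  # 6 30
```

So each training epoch consists of five full batches of 64 videos plus one batch of 30.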
I've trained the model for 65 epochs, but I noticed something strange: in the first epoch, validation accuracy is higher than training accuracy. For the remaining epochs, however, the curves look much more satisfactory.
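To pin down exactly which epochs show this, the per-epoch metrics recorded by Keras can be compared directly. A minimal sketch, assuming the model was compiled with `metrics=['accuracy']` so that the keys are `accuracy` and `val_accuracy` (older standalone Keras versions use `acc` and `val_acc`); the helper name `epochs_where_val_exceeds_train` is hypothetical.

```python
def epochs_where_val_exceeds_train(history, train_key="accuracy", val_key="val_accuracy"):
    """Return 1-based epoch numbers where validation accuracy > training accuracy.

    `history` is a dict of per-epoch metric lists, such as the `.history`
    attribute of the History object returned by Keras `model.fit`.
    """
    train_acc = history[train_key]
    val_acc = history[val_key]
    # Pair the two curves epoch by epoch and keep the epochs where validation is ahead
    return [
        epoch
        for epoch, (tr, va) in enumerate(zip(train_acc, val_acc), start=1)
        if va > tr
    ]
```

It would be called on the dictionary produced by training, e.g. `epochs_where_val_exceeds_train(model.fit(...).history)`.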
My model is:
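from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, BatchNormalization, TimeDistributed,
                                     Conv2D, MaxPooling2D, Flatten, LSTM, Dense)

input_shape = (20, 128, 128, 3)  # (frames, height, width, channels)

model = Sequential()
model.add(Input(shape=input_shape))
model.add(BatchNormalization())
# Per-frame CNN feature extractor, applied to each of the 20 frames
model.add(TimeDistributed(Conv2D(32, (3, 3), strides=(1, 1), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
model.add(TimeDistributed(Conv2D(64, (3, 3), strides=(1, 1), activation='relu', padding='same')))
model.add(TimeDistributed(Conv2D(128, (3, 3), strides=(1, 1), activation='relu', padding='same')))
model.add(TimeDistributed(Conv2D(128, (3, 3), strides=(1, 1), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
model.add(TimeDistributed(Conv2D(256, (3, 3), strides=(1, 1), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
model.add(TimeDistributed(Flatten()))
# LSTM over the sequence of per-frame feature vectors; returns only the last output
model.add(LSTM(256, activation='relu', return_sequences=False))
model.add(Dense(128, activation='relu'))
model.add(Dense(5, activation='softmax'))  # one probability per class

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])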
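The shapes flowing through this stack can be traced by hand, which shows how large the per-frame feature vector reaching the LSTM is. The sketch below is plain Python, assuming the layer list mirrors the model above and that `MaxPooling2D((2, 2))` uses its default stride (equal to the pool size) and `'valid'` padding; `per_frame_shape` and `lstm_param_count` are hypothetical helpers, not Keras APIs.

```python
# TimeDistributed CNN layers in order: ("conv", filters) or ("pool", None)
CNN_STACK = [("conv", 32), ("pool", None), ("conv", 64), ("conv", 128),
             ("conv", 128), ("pool", None), ("conv", 256), ("pool", None)]


def per_frame_shape(height, width, channels, stack=CNN_STACK):
    """Trace (height, width, channels) of a single frame through the CNN stack."""
    for kind, filters in stack:
        if kind == "conv":
            # padding='same' with stride 1 keeps height and width; only channels change
            channels = filters
        else:
            # 2x2 pooling with stride 2 and 'valid' padding halves each side (floored)
            height, width = height // 2, width // 2
    return height, width, channels


def lstm_param_count(input_dim, units):
    """Keras LSTM parameters: 4 gates, each with input kernel, recurrent kernel and bias."""
    return 4 * (input_dim * units + units * units + units)


h, w, c = per_frame_shape(128, 128, 3)
features_per_frame = h * w * c  # length of each frame's TimeDistributed(Flatten()) output

print((h, w, c))                                  # (16, 16, 256)
print(features_per_frame)                         # 65536
print(lstm_param_count(features_per_frame, 256))  # 67372032
```

`model.summary()` on the real model should report the same output shapes and LSTM parameter count, which makes it a useful cross-check.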
Can anyone tell me why validation accuracy is higher than training accuracy in the first epoch?