I have a very basic multiclass CNN model for classifying vehicles into 4 classes [pickup, sedan, suv, van] that I have written using TensorFlow 2.0's tf.keras:
import tensorflow as tf

# Images are planar / channels-first, shape (3, 128, 128)
cfg_data_fmt = 'channels_first'
he_initialiser = tf.keras.initializers.VarianceScaling()

model = tf.keras.Sequential()
# Block 1: two 32-filter 3x3 convs + 2x2 max pool
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3,3), input_shape=(3,128,128), activation='relu', padding='same', data_format=cfg_data_fmt, kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu', padding='same', data_format=cfg_data_fmt, kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format=cfg_data_fmt))
# Block 2: two 64-filter 3x3 convs + 2x2 max pool
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu', padding='same', data_format=cfg_data_fmt, kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu', padding='same', data_format=cfg_data_fmt, kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format=cfg_data_fmt))
# Block 3: two 128-filter 3x3 convs + 2x2 max pool
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu', padding='same', data_format=cfg_data_fmt, kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu', padding='same', data_format=cfg_data_fmt, kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format=cfg_data_fmt))
# Classifier head: flatten, two 128-unit dense layers, 4-way softmax output
model.add(tf.keras.layers.Flatten(data_format=cfg_data_fmt))
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Dense(4, activation='softmax', kernel_initializer=he_initialiser))
I use the following configuration for training (a sketch of the corresponding compile call follows the list):
- Image size: 3x128x128 (planar data)
- Number of epochs: 45
- Batch size: 32
- Loss function: tf.keras.losses.CategoricalCrossentropy(from_logits=True)
- Optimizer: tf.optimizers.Adam
- training data size: 67.5% of all data
- validation data size: 12.5% of all data
- test data size: 20% of all data
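For reference, this is roughly how the model is compiled with the settings above. The per-class precision metrics are my assumption of how those values would typically be tracked (via the class_id argument of tf.keras.metrics.Precision); the exact metric setup in my code may differ slightly:

# Minimal compile sketch matching the configuration above
# (per-class precision via class_id is an assumption, not the exact original code)
model.compile(
    optimizer=tf.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'] + [tf.keras.metrics.Precision(class_id=i, name='precision_' + name)
                            for i, name in enumerate(['pickup', 'sedan', 'suv', 'van'])]
)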
I have an unbalanced dataset, which has the following distribution:
pickups: 1202
sedans: 1954
suvs: 2510
vans: 196
For this reason I have used class weights to mitigate this imbalance (see the sketch after the list for how they are derived):
pickup_weight: 4.87
sedan_weight: 3.0
suv_weight: 2.33
van_weight: 30.0
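These weights are consistent with weighting each class by total_samples / class_count; a minimal sketch of that computation (variable names are mine, not from my actual code):

counts = {'pickup': 1202, 'sedan': 1954, 'suv': 2510, 'van': 196}
total = sum(counts.values())  # 5862
# Class index order matches the model output: [pickup, sedan, suv, van]
class_weight = {i: total / counts[name] for i, name in enumerate(['pickup', 'sedan', 'suv', 'van'])}
# -> {0: ~4.88, 1: 3.0, 2: ~2.34, 3: ~29.9}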
This seems like a small dataset, but I am using it for fine-tuning: I first train the model on a larger dataset of 16k images of the same classes, although those images show the vehicles from different angles than the images in my fine-tuning dataset. The fine-tuning step itself is sketched below.
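Roughly, the fine-tuning step looks like this. The weight file path and the x/y arrays are placeholders for my actual data pipeline, and the class_weight dict is the one from the sketch above, so treat this as a sketch of the workflow rather than the exact code:

# Reuse the weights learned on the 16k-image dataset, then fine-tune
# on the smaller dataset with class weighting (paths/arrays are placeholders)
model.load_weights('pretrained_16k_weights.h5')
history = model.fit(
    x_train, y_train,                # fine-tune images and one-hot labels
    batch_size=32,
    epochs=45,
    validation_data=(x_val, y_val),
    class_weight=class_weight,
)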
Now the questions that I'm having stem from the following observations:
At the end of the final epoch, the results returned by model.fit gave:
- training accuracy of 0.9229
- training loss of 3.5055
- validation accuracy of 0.7906
- validation loss of 0.9382
- training precision for class pickup of 0.9186
- training precision for class sedan of 0.9384
- training precision for class suv of 0.9196
- training precision for class van of 0.8378
- validation precision for class pickup of 0.7805
- validation precision for class sedan of 0.8026
- validation precision for class suv of 0.8029
- validation precision for class van of 0.4615
The results returned by model.evaluate on my hold-out test set after training gave accuracy and loss values similar to the corresponding validation values from the last epoch, and the per-class precision values were also nearly identical to the corresponding validation precisions.
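For reference, the evaluation call is essentially the following, assuming labeled_ds_test is the hold-out test set as an unbatched tf.data.Dataset of (image, one-hot label) pairs (which is also what the confusion-matrix code further down assumes):

# Evaluate on the held-out test set; returns the loss plus the compiled metrics
test_results = model.evaluate(labeled_ds_test.batch(32), verbose=1)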
The validation accuracy is lower than the training accuracy, but still high enough that I don't believe there is an overfitting problem; the model seems to generalize.
My first question is how can the validation loss be so much lower than the training loss?
Furthermore, when I created a confusion matrix using:
import numpy as np

# Materialise the test set (images and one-hot labels) as NumPy arrays,
# run predictions, and build the confusion matrix from the argmax class indices
test_images = np.array([x[0].numpy() for x in list(labeled_ds_test)])
test_labels = np.array([x[1].numpy() for x in list(labeled_ds_test)])
test_predictions = model.predict(test_images, batch_size=32)
print(tf.math.confusion_matrix(tf.argmax(test_labels, 1), tf.argmax(test_predictions, 1)))
The results I got back were:
tf.Tensor(
[[ 42  85 109   3]
 [ 72 137 177   4]
 [ 91 171 228  11]
 [  9  12  16   1]], shape=(4, 4), dtype=int32)
This shows an accuracy of only about 35% (408 correct predictions on the diagonal out of 1168 test samples)!!
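That 35% figure is just the diagonal of the matrix divided by the total; a quick sketch of the check (variable names are mine):

cm = np.array([[ 42,  85, 109,   3],
               [ 72, 137, 177,   4],
               [ 91, 171, 228,  11],
               [  9,  12,  16,   1]])
# Correct predictions lie on the diagonal: 408 of 1168 samples ~= 0.349
accuracy_from_cm = np.trace(cm) / cm.sum()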
My second question is therefore this: how can the accuracy derived from model.predict be so low, when during training and evaluation the metrics seemed to indicate that my model was quite precise in its predictions?
Am I using the predict method wrong, or is my theoretical understanding of what should happen completely off?
I am at a bit of a loss here and would greatly appreciate any feedback. Thanks for reading this.