0 votes

I have a very basic multiclass CNN model for classifying vehicles into 4 classes [pickup, sedan, suv, van], written using TensorFlow 2.0's tf.keras:

import tensorflow as tf

cfg_data_fmt = 'channels_first'  # every layer takes channels-first (3, 128, 128) input
he_initialiser = tf.keras.initializers.VarianceScaling()
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3,3), input_shape=(3,128,128), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format=cfg_data_fmt))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format=cfg_data_fmt))
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format='channels_first'))
model.add(tf.keras.layers.Flatten(data_format='channels_first'))
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Dense(4, activation='softmax', kernel_initializer=he_initialiser))

I use the following configuration for training:

  • Image size: 3x128x128 (planar data)
  • Number of epochs: 45
  • Batch size: 32
  • Loss function: tf.keras.losses.CategoricalCrossentropy(from_logits=True)
  • Optimizer: tf.optimizers.Adam
  • Training data size: 67.5% of all data
  • Validation data size: 12.5% of all data
  • Test data size: 20% of all data
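
Put together, the compile and fit calls look roughly like this (train_images, train_labels, val_images, val_labels and class_weights are placeholders for my actual data pipeline):

model.compile(
    optimizer=tf.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])  # plus per-class tf.keras.metrics.Precision(class_id=i) metrics in my real run

model.fit(
    train_images, train_labels,                # 67.5% of the data
    batch_size=32,
    epochs=45,
    validation_data=(val_images, val_labels),  # 12.5% of the data
    class_weight=class_weights)                # the per-class weights listed below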

I have an unbalanced dataset, which has the following distribution:

pickups: 1202
sedans: 1954
suvs: 2510
vans: 196

For this reason I have used class weights to mitigate this imbalance:

pickup_weight: 4.87
sedan_weight: 3.0
suv_weight: 2.33
van_weight: 30.0
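
These come out to roughly total_samples / class_count; a small sketch of that derivation (the class-to-index order here is just illustrative):

counts = {'pickup': 1202, 'sedan': 1954, 'suv': 2510, 'van': 196}
total = sum(counts.values())  # 5862 images in the fine-tuning dataset
class_weights = {i: total / n for i, n in enumerate(counts.values())}
print(class_weights)  # approximately {0: 4.88, 1: 3.0, 2: 2.34, 3: 29.9}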

This seems like a small dataset, but I am using it only for fine-tuning: I first train the model on a larger dataset of 16k images of these classes, although the vehicles in that dataset are photographed from different angles than those in my fine-tuning dataset.
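
The fine-tuning step just continues training from the weights learned on the larger dataset, something like the following (the checkpoint path is a placeholder; in practice I may simply keep training the same model object):

model.load_weights('pretrained_vehicles.h5')  # weights from the 16k-image training run
# ...then the same compile/fit calls as above are run on the smaller fine-tuning dataset.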

Now the questions that I'm having stem from the following observations:

At the end of the final epoch, the results returned by model.fit gave:

  • training accuracy of 0.9229
  • training loss of 3.5055
  • validation accuracy of 0.7906
  • validation loss of 0.9382
  • training precision for class pickup of 0.9186
  • training precision for class sedan of 0.9384
  • training precision for class suv of 0.9196
  • training precision for class van of 0.8378
  • validation precision for class pickup of 0.7805
  • validation precision for class sedan of 0.8026
  • validation precision for class suv of 0.8029
  • validation precision for class van of 0.4615

The results returned by model.evaluate on my hold-out test set after training gave accuracy and loss values similar to the corresponding validation values from the last epoch, and the per-class precision values were also nearly identical to the corresponding validation precisions.
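
The hold-out evaluation itself is just the following (assuming labeled_ds_test yields batched (image, one-hot label) pairs):

results = model.evaluate(labeled_ds_test, verbose=1)
print(results)  # loss, accuracy, and the per-class precision values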

The lower, but still high enough, validation accuracy leads me to believe there is no overfitting problem as the model can generalize.

My first question is how can the validation loss be so much lower than the training loss?

Furthermore, when I created a confusion matrix using:

test_images = np.array([x[0].numpy() for x in list(labeled_ds_test)])
test_labels = np.array([x[1].numpy() for x in list(labeled_ds_test)])
test_predictions = model.predict(test_images, batch_size=32)
print(tf.math.confusion_matrix(tf.argmax(test_labels, 1), tf.argmax(test_predictions, 1)))

The results I got back were:

tf.Tensor(
[[ 42  85 109   3]
 [ 72 137 177   4]
 [ 91 171 228  11]
 [  9  12  16   1]], shape=(4, 4), dtype=int32)

This shows an accuracy of only 35%!!
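
For reference, that figure is simply the diagonal of the matrix divided by its total:

import numpy as np

cm = np.array([[ 42,  85, 109,   3],
               [ 72, 137, 177,   4],
               [ 91, 171, 228,  11],
               [  9,  12,  16,   1]])
print(np.trace(cm) / cm.sum())  # 408 / 1168 ≈ 0.349, i.e. about 35%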

My second question is therefore this: how can the accuracy given by model.predict be so small when during training and evaluation the values seemed to indicate that my model was quite precise with its predictions?

Am I using the predict method wrong or is my theoretical understanding of what's expected to happen completely off?
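
The only sanity check I can think of is to recompute accuracy directly from the predict() outputs and compare it with model.evaluate on the exact same arrays:

import numpy as np

pred_acc = np.mean(np.argmax(test_predictions, axis=1) == np.argmax(test_labels, axis=1))
print('accuracy from predict():', pred_acc)
print(model.evaluate(test_images, test_labels, batch_size=32, verbose=0))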

I am at a bit of a loss here and would greatly appreciate any feedback. Thanks for reading this.

When training accuracy is high and prediction accuracy is low, that's a sure sign of overfitting. I recommend looking into the causes of and solutions for overfitting. – gallen

3 Answers

0 votes

I agree with @gallen. There are several reasons that can cause overfitting and several methods for preventing it. One good solution is adding dropout between layers; see this Stack Overflow answer and this Towards Data Science article.
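
A minimal sketch of what that could look like in this model (the 0.5 rate and placing the dropout after the dense layers are just illustrative choices):

model.add(tf.keras.layers.Flatten(data_format='channels_first'))
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Dropout(0.5))   # randomly zeroes 50% of activations during training only
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(4, activation='softmax', kernel_initializer=he_initialiser))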

0 votes

There is overfitting, of course, but let's answer the questions.

For the first question, the small size of the validation set plays a role in why its loss is lower than the training loss, since the loss aggregates the differences between y_true and y_pred over the samples.

As for the second question: how can the test accuracy be lower than expected even when the validation results show no sign of overfitting?

The distribution of the validation set must be the same as that of the test set, otherwise the validation results are misleading.

So my advice is to check the distributions of the train, validation and test datasets separately, and make sure that they are the same.
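
A quick sketch of such a check, assuming the labels of each split are available as one-hot arrays (the variable names are placeholders):

import numpy as np

def class_distribution(one_hot_labels):
    counts = np.bincount(np.argmax(one_hot_labels, axis=1), minlength=4)
    return counts / counts.sum()

for name, labels in [('train', train_labels), ('val', val_labels), ('test', test_labels)]:
    print(name, class_distribution(labels))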

0 votes

You need to divide your dataset properly, for example 70% training and 30% validation, and then check your model on a new set of data as test data. This might be helpful, as machine learning is all about trial and error.
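
If it helps, a stratified split keeps the class proportions equal in both parts; a sketch using scikit-learn (images and labels stand in for your full arrays, with labels one-hot encoded):

from sklearn.model_selection import train_test_split

train_x, val_x, train_y, val_y = train_test_split(
    images, labels,
    test_size=0.3,                    # 70% training / 30% validation
    stratify=labels.argmax(axis=1),   # keep the class proportions identical in both splits
    random_state=42)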