Not able to find a proper CNN

Question

I am using Keras with TensorFlow in Colab and working on the oxford_flowers102 dataset. The task is image classification with many categories (102) and few images per class. I tried building different neural networks, from simple to more complex, with and without image augmentation, dropout, hyperparameter tuning, and adjustments to batch size, optimizer, and image resize dimensions. However, I was not able to find a CNN that gives me an acceptable val_accuracy and, ultimately, a good test accuracy. So far the best val_accuracy I have reached is a poor 0.3x. I am pretty sure better results are possible; I just have not found the right CNN setup. My code so far:

import tensorflow as tf
from keras.models import Model
import tensorflow_datasets as tfds
import tensorflow_hub as hub

# update colab tensorflow_datasets to current version 3.2.0, 
# otherwise tfds.load will lead to error when trying to load oxford_flowers102 dataset

!pip install tensorflow_datasets --upgrade

# restart runtime

oxford, info = tfds.load("oxford_flowers102", with_info=True, as_supervised=True)

train_data=oxford['train']
test_data=oxford['test']
validation_data=oxford['validation']

IMG_SIZE = 224

def format_example(image, label):
  image = tf.cast(image, tf.float32)
  image = image*1/255.0
  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
  return image, label

train = train_data.map(format_example)
validation = validation_data.map(format_example)
test = test_data.map(format_example)

BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 1000

train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
test_batches = test.batch(BATCH_SIZE)
validation_batches = validation.batch(BATCH_SIZE)

First model I tried:

model = tf.keras.Sequential([
      tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
      tf.keras.layers.MaxPooling2D(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(64, activation='relu'),
      tf.keras.layers.Dense(102)
  ])

model.compile(optimizer='adam',
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=['accuracy'])

history = model.fit(train_batches, validation_data=validation_batches, epochs=20)

Epoch 20/20 32/32 [==============================] - 4s 127ms/step - loss: 2.9830 - accuracy: 0.2686 - val_loss: 4.8426 - val_accuracy: 0.0637

When I run it for more epochs, it overfits: val_loss goes up and val_accuracy does not improve.

Second model (very simple one):

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE, 3)),
  tf.keras.layers.Dense(128,activation='relu'),
  tf.keras.layers.Dense(102)
])

model.compile(optimizer='adam',
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=['accuracy'])

history = model.fit(train_batches, validation_data=validation_batches, epochs=20)

Does not work at all, loss stays at 4.6250.

Third model:

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(102)
])

base_learning_rate = 0.0001

# `learning_rate` replaces the deprecated `lr` argument
model.compile(optimizer=tf.optimizers.RMSprop(learning_rate=base_learning_rate),
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
  metrics=['accuracy'])

history = model.fit(train_batches, validation_data=validation_batches, epochs=20)

Model overfits. Val_accuracy not above 0.15.

I added dropout layers to this model (trying different rates) and also adjusted the kernels, but saw no real improvement. I also tried the Adam optimizer.

Fourth model:

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(128, (3,3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(256, (3,3), activation='relu'),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(102)
])

model.compile(optimizer='adam',
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=['accuracy'])

history = model.fit(train_batches, validation_data=validation_batches, epochs=20)

Same problem again, no good val_accuracy. Also tried it with RMSprop optimizer. Not able to get a val_accuracy higher than 0.2.

Fifth model:

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(64, (2,2), activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(102)
])

base_learning_rate = 0.001

model.compile(optimizer=tf.optimizers.RMSprop(learning_rate=base_learning_rate),
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
  metrics=['accuracy'])

history = model.fit(train_batches, validation_data=validation_batches, epochs=250)

val_accuracy at the highest around 0.3x. Also tried it with adam.

When I tried transfer learning with MobileNet, I immediately got 0.7x within 10 epochs. So I wonder why I am not able to get close to this with a self-built CNN. I do not expect 0.8 or to beat MobileNet. But where is my mistake? What would a self-built CNN look like that reaches, let's say, 0.6-0.7 val_accuracy?

Why do you think that you need to get higher accuracy? Moreover, what do you observe in the training and validation loss? — Ashwin Geet D'Sa
My models are not the best. With better models it is possible to get much higher accuracy. I am asking for someone who can show me a better model architecture. In quite a few of my runs the training loss goes down while the validation loss goes up, due to overfitting, as the number of images per class is quite low. — BertHobe
Here are lots of model architectures that achieve > 99% accuracy on this dataset. Many are based on ResNets, which are fairly straightforward to implement from scratch if you want to do that. paperswithcode.com/sota/… As noted in the short descriptions, most but not all use transfer learning to achieve these results — DerekG
I do not want to use transfer learning nor make use of these sophisticated models. Nor did I ask for 0.99 accuracy models. I asked for better architecture of my models to get 0.6x-0.7x as I am sure that someone into this topic can quite easily show me a model with which I can get to 0.6x-0.7x. — BertHobe
There are several models there that achieve good performance (>95%) without transfer learning with affiliated papers. — DerekG

DerekG · Accepted Answer · 2020-09-17T16:53:39

It's not entirely clear from your question: are you concerned that your model architecture is inferior to that of, say, MobileNet, or that your performance is not comparable to that of transfer learning with MobileNet?

In response to the first, in general, the popular architectures such as ResNet, MobileNet, AlexNet are very cleverly crafted networks and so are likely to better represent data than a hand-defined network unless you do something very clever yourself.

In response to the second, the more complex a model gets, the more data it needs to train well without underfitting or overfitting. This poses a problem on datasets such as yours (with a few thousand images) because it is difficult for a complex CNN to learn meaningful rules (kernels) for extracting information from images in general without instead learning rules for memorizing the limited set of training inputs. In summary, you want a larger model to make more accurate predictions, but this in turn requires more data, which sometimes you don't have. I suspect that if you trained an untrained MobileNet and your untrained network on the oxford_flowers102 dataset, you'd see similarly poor performance from both.
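To make the capacity-versus-data point concrete, here is a rough sketch that counts by hand the trainable parameters of the third model from the question. It assumes the Keras defaults that model relies on (stride-1 'valid' convolutions, 2x2 max pooling with floor division), and it takes the training-set size from the log in the question (32 steps of batch size 32). The helper names are made up for this illustration.

```python
def conv_out(size, kernel):
    """Spatial size after a stride-1 'valid' convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units

def third_model_params(img_size=224, n_classes=102):
    """Conv(64) -> pool -> Conv(64) -> pool -> Flatten -> Dense(128) -> Dense(102)."""
    size, channels, total = img_size, 3, 0
    for out_ch in (64, 64):
        total += conv_params(channels, out_ch, 3)
        size = pool_out(conv_out(size, 3))
        channels = out_ch
    flat_units = size * size * channels  # width of the Flatten output
    total += dense_params(flat_units, 128)
    total += dense_params(128, n_classes)
    return flat_units, total

flat_units, n_params = third_model_params()
max_train_images = 32 * 32  # upper bound implied by 32 steps of batch size 32
print(f"flatten width: {flat_units:,}")
print(f"trainable parameters: {n_params:,}")
print(f"parameters per training image: at least {n_params // max_train_images:,}")
```

Under these assumptions nearly all of the roughly 24 million parameters sit in the first Dense layer after Flatten, which is the part of the network most able to memorize a training set of about a thousand images.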

Enter transfer learning. By pretraining relatively large models on relatively huge datasets (most are pretrained on ImageNet, which has millions of images), the model is able to learn to extract relevant information from arbitrary images much better than it could from a smaller dataset alone. These general rules for feature extraction apply to your smaller dataset as well, so with just a bit of fine-tuning the transfer learning model will likely far outperform any model trained solely on your dataset.
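As a back-of-the-envelope illustration of why this needs so little data, consider the simplest variant of transfer learning, where the pretrained backbone is frozen and only a new classification layer is trained. The sketch below counts the weights that remain trainable in that case. The feature width of 1280 is an assumption: it is the pooled output size of MobileNetV2 at width 1.0, and the question only says "Mobilenet", so the actual variant may differ. The function name is hypothetical.

```python
def head_params(feature_dim, n_classes):
    """Weights plus biases of a single Dense layer on top of frozen features."""
    return feature_dim * n_classes + n_classes

# Assumption: 1280-dimensional pooled features (MobileNetV2, width 1.0).
FEATURE_DIM = 1280
N_CLASSES = 102  # oxford_flowers102

trainable = head_params(FEATURE_DIM, N_CLASSES)
print(f"trainable parameters with a frozen backbone: {trainable:,}")
```

With the backbone frozen, the small training set only has to fit this one layer, while the feature extractor's weights were already learned on ImageNet. Unfreezing some backbone layers for fine-tuning increases the trainable count, which is why fine-tuning is usually done with a low learning rate.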
