0 votes

I am attempting to run a crowd estimation model that classifies images into three broad categories depending on how many people are in them. 1200 images are used for training, with 20% used for validation. I used sentdex's tutorial on YouTube as a reference for loading the image data into the model: I load the images as a zip file, extract it, and categorise the images based on the folders they are in.

My issue is that whenever I attempt to train the model, the loss and validation loss are always 0, so the model does not really train and the validation accuracy remains the same throughout, as seen here. How can I get the loss to actually change? Is there something I am doing wrong in my implementation?

So far, what I have attempted is:

  1. I tried adding a third convolutional layer, with little effect.
  2. I tried changing the last Dense layer to model.add(Dense(3)), but I got an error saying "Shapes (None, 1) and (None, 3) are incompatible".
  3. I tried using a lower learning rate (0.001?), but the model ended up returning a validation accuracy of 0.
  4. Changing the optimizer did not seem to make any difference.

Below is a snippet of my code so far showing my model attempt:

import keras
import keras.backend as K
from datetime import datetime
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, Dropout, Flatten, Dense

logdir = "logs/scalars/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)

X = X/255.0

model = Sequential()
model.add(Conv2D(64, (3,3), input_shape = X.shape[1:])) #[1:] to skip the -1
model.add(Activation("relu"))
model.add(Conv2D(64, (3,3), input_shape = X.shape[1:])) #[1:] to skip the -1
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(128, (3,3)))
model.add(Activation('relu'))
model.add(Conv2D(128, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Dropout(0.2))

model.add(Flatten()) 

model.add(Dense(128))
model.add(Activation('relu'))

#fully connected layer 
model.add(Dense(1))
model.add(Activation('softmax'))

opt = keras.optimizers.Adam(lr=0.01)

model.compile(loss='categorical_crossentropy', 
              optimizer = opt,
              metrics=['accuracy']) 

model.fit(x_train, y_train, batch_size = 100, epochs = 30, validation_data = (x_val, y_val),  callbacks=[tensorboard_callback], shuffle=True)

The full code can be found on Colab here.

You cannot use softmax with a single neuron, as it produces a constant 1.0 output. If you want binary classification you need sigmoid activation and binary crossentropy loss. – Dr. Snoopy
Another issue is that you forgot to one-hot encode your labels, which would fix the shape error you got when using 3 output neurons (which is the correct number). – Dr. Snoopy
Thank you so much for the replies! I have adjusted the code accordingly and the loss values show up now! However, the validation accuracy remains the same throughout training, and the model does not seem to be learning much (if anything) at all. I have tried switching between different optimisers and adding more layers, but to no avail. Are there any options I can take to try and get the accuracy to improve? – anikus
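The one-hot encoding mentioned in the comments can be sketched with NumPy; keras.utils.to_categorical(y, num_classes=3) produces the same array. The label values below are made up for illustration:

```python
import numpy as np

# Integer class labels for 5 images, e.g. 0 = fewest people, 2 = most (assumed layout)
y = np.array([0, 2, 1, 1, 0])

# One-hot encode: row i is all zeros except a 1 at column y[i]
y_onehot = np.eye(3)[y]

print(y_onehot)
```

With labels in this shape, a Dense(3) output layer and categorical_crossentropy are compatible.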

2 Answers

0 votes

Your final layer contains a single node, so you are outputting only a single number. However, you need to output 3 numbers because you have 3 classes. Each of those outputs corresponds to the unnormalized probability of that class; after softmax, you get a normalized probability distribution.
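That normalization step can be sketched in NumPy (the three logit values below are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

logits = np.array([2.0, 0.5, -1.0])  # three unnormalized class scores
probs = softmax(logits)
print(probs, probs.sum())  # the probabilities sum to 1.0
```

With a single output neuron, this normalization divides one exponential by itself, which is why the softmax output is constantly 1.0.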

0 votes

You have 3 classes. Therefore change the code

model.add(Dense(1))

to

model.add(Dense(3))

Now if you use your model to make predictions on an image, model.predict will produce a list of 3 probability values that sum to 1.0. The model's actual prediction is the index in the list with the highest probability value. For example, if the output of model.predict is [.1, .7, .2], index 1, with probability value .7, is the predicted class. Your classes are indexed as 0, 1, 2, so the model predicts class 1. You can get the predicted class index from the output of model.predict with

class_index = np.argmax(model.predict(...))

Somewhere you probably have a list of classes, something like

classes=['less_than 5', 'between 5 and 10', 'more than 10']

so the actual predicted class name is

predicted_class=classes[class_index]
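Putting those pieces together (the probability vector below stands in for a real model.predict output on one image):

```python
import numpy as np

classes = ['less_than 5', 'between 5 and 10', 'more than 10']

# Stand-in for model.predict(...) on a single image: three class probabilities
probs = np.array([0.1, 0.7, 0.2])

class_index = np.argmax(probs)          # index of the highest probability
predicted_class = classes[class_index]  # map the index back to a class name
print(predicted_class)
```

Note that model.predict on a batch returns a 2-D array, so for a single image you would take np.argmax along the last axis or index row 0 first.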