What I am doing
I am training a Sequential() convolutional neural network (CNN) using Keras with tensorflow-gpu as backend for image recognition. I have 3 classes to classify.
What I am using
Ubuntu 16.04
PyCharm Community 2018.1.4
--> Python 3.5
Keras 2.2.0
Tensorflow-GPU 1.8.0
60000 Training images, 100x100 pixels (3 color-channels) ("training_set")
20000 Evaluation images, same dimensions ("evaluation_set") (evaluation set for testing different hyperparameters)
20000 Test images, same dimensions ("test_set") (test set for final test of accuracy)
What is working
I'm training the network with a batch_size of 50 over 20 epochs (after 20 epochs my loss stagnates). I use a dropout of 0.25, and shuffle is set to True.
Architecture:
- Convolution2D
- MaxPooling2D
- Convolution2D
- MaxPooling2D
- Flatten
- Dense(100)
- Dense(3)
What worries me
During training I get a training_accuracy of about 0.9983, during evaluation my evaluation_accuracy is 0.9994 which seems reasonable. But when looking at individual prediction scores I discover many images with a prediction of
[0. 0. 1.]
(for classes 1, 2 and 3), among others which match my expectation, e.g.
[1.28186484e-26 6.89246145e-21 1.00000000e+00]
I am strictly separating my datasets (train, evaluation, test; see above), so no individual image is in more than one dataset. But I created my datasets by taking a frame every second from about 70 different video files, so there is not much variance between individual images coming from the same video file.
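Since near-duplicate frames from the same video are the likely source of leakage between the sets, one common fix is to split by video rather than by image, so that all frames from one video land in the same set. A minimal sketch (the `video_ids` array, one id per image, is a hypothetical addition — the code below does not track it):

```python
import numpy as np

def split_by_video(video_ids, eval_frac=0.25, seed=0):
    # video_ids: one id per image, identifying the source video file
    rng = np.random.RandomState(seed)
    videos = np.unique(video_ids)
    rng.shuffle(videos)
    n_eval = int(len(videos) * eval_frac)
    # every frame of a held-out video goes to the evaluation set
    eval_mask = np.isin(video_ids, videos[:n_eval])
    return ~eval_mask, eval_mask  # boolean masks over the images

# hypothetical layout: 70 videos, ~860 frames each
ids = np.repeat(np.arange(70), 860)
train_mask, eval_mask = split_by_video(ids)
```

With this kind of split, a gap between training and evaluation accuracy becomes a much more honest signal of overfitting.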
Is it possible that a score of [0. 0. 1.] is due to rounding? But then why do other scores come out as [... ... 1.0000000e+00] (which I assume is due to rounding)? Do I have a problem with overfitting here? Should I be worried at all?
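For what it's worth, an exact [0. 0. 1.] can come straight from float32 underflow rather than from any printing or rounding step: once the winning logit leads a losing one by roughly 90 or more, exp() of the losing entry underflows to exactly 0.0 in single precision. This plain-NumPy sketch (not the training code, just the softmax arithmetic) reproduces both kinds of output:

```python
import numpy as np

def softmax32(logits):
    # softmax in float32, the precision Keras uses by default
    z = np.asarray(logits, dtype=np.float32)
    z = z - z.max()          # standard stabilization: winner becomes 0
    e = np.exp(z)            # exp(-120) underflows to exactly 0.0 in float32
    return e / e.sum()

saturated = softmax32([0.0, 10.0, 120.0])
print(saturated)   # exactly [0. 0. 1.] -- losing classes underflowed

moderate = softmax32([0.0, 5.0, 20.0])
print(moderate)    # tiny but nonzero probabilities for the losing classes
```

So [0. 0. 1.] and [1.28e-26 6.89e-21 1.00e+00] differ only in how far the logits are spread apart, which is consistent with a very confident (possibly overconfident) network rather than a numerical bug.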
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

def create_model(training_input):
    # training_input is a numpy.array containing the training data
    model = Sequential()
    model.add(Conv2D(32, (3, 3), padding="same", name="firstconv2D",
                     input_shape=training_input.shape[1:], activation="relu",
                     data_format="channels_last"))
    model.add(MaxPooling2D(data_format="channels_last", name="firstmaxpool"))
    model.add(Dropout(0.25, name="firstdropout"))
    model.add(Conv2D(32, (3, 3), padding="same", name="secondconv2D",
                     activation="relu", data_format="channels_last"))
    model.add(MaxPooling2D(data_format="channels_last", name="secondmaxpool"))
    model.add(Dropout(0.25, name="seconddropout"))
    model.add(Flatten(name="Flattenlayerfirst"))
    model.add(Dense(100, activation="relu", name="firstDenseLayer"))
    model.add(Dropout(0.25, name="thirddropout"))
    model.add(Dense(3, activation="softmax", name="secondDenseLayer"))
    model.compile(optimizer='RMSprop', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

def train(input_training, labels_training):
    # input_training is a numpy.array containing the training data,
    # labels_training a numpy.array with the corresponding one-hot labels
    model = create_model(input_training)
    history = model.fit(input_training, labels_training, epochs=20, shuffle=True, batch_size=50)
    return model, history