
I am using ImageDataGenerator(validation_split).flow_from_directory(subset) for my training and validation sets. So the training and validation data get their own generators.

After training, I ran model.evaluate() on my validation generator and got about 75% accuracy. However, when I ran model.predict() on that same validation generator and computed accuracy by hand, it fell to about 1%.

The model is a multiclass CNN compiled with categorical cross-entropy loss and the accuracy metric, which should default to categorical accuracy. # Edit: changed to categorical_accuracy explicitly anyway.

# Compile

learning_rate = tf.keras.optimizers.schedules.PolynomialDecay(initial_learning_rate=initial_lr,
                                                              decay_steps=steps,
                                                              end_learning_rate=end_lr)

model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate),
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
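For reference, PolynomialDecay with the default power=1.0 interpolates linearly from the initial to the end learning rate over decay_steps, clamping afterwards. A minimal sketch of that formula (the rates and step counts below are illustrative, not the question's actual values):

```python
def polynomial_decay(step, initial_lr, end_lr, decay_steps, power=1.0):
    """Replicates tf.keras.optimizers.schedules.PolynomialDecay (no cycling)."""
    step = min(step, decay_steps)  # the schedule clamps past decay_steps
    fraction = 1.0 - step / decay_steps
    return (initial_lr - end_lr) * fraction ** power + end_lr

# Illustrative values: decay from 1e-3 to 1e-5 over 1000 steps
print(polynomial_decay(0, 1e-3, 1e-5, 1000))     # 0.001
print(polynomial_decay(500, 1e-3, 1e-5, 1000))   # halfway: 0.000505
print(polynomial_decay(2000, 1e-3, 1e-5, 1000))  # clamped to end_lr
```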

# Validation set evaluation

val_loss, val_accuracy = model.evaluate(val_generator,
                                        steps=int(val_size/bs)+1)
print('Accuracy: {}'.format(val_accuracy))
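As an aside, `int(val_size/bs)+1` yields one extra step whenever val_size is an exact multiple of bs, whereas `math.ceil` covers both cases. A small check (the sizes below are made up):

```python
import math

val_size, bs = 1000, 50  # hypothetical sizes; 1000 divides evenly by 50

steps_plus_one = int(val_size / bs) + 1  # 21: one batch too many
steps_ceil = math.ceil(val_size / bs)    # 20: exactly covers the data
print(steps_plus_one, steps_ceil)

# With a non-divisible size, the two formulas agree
val_size = 1001
print(int(val_size / bs) + 1, math.ceil(val_size / bs))  # 21 21
```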

# Validation set predict

y_val = val_generator.classes

pred = model.predict(val_generator,
                     verbose=1,
                     steps=int(val_size/bs)+1)

accuracy_TTA = np.mean(np.equal(y_val, np.argmax(pred, axis=-1)))
print('Accuracy: {}'.format(accuracy_TTA))
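A common cause of exactly this symptom is that flow_from_directory shuffles batches by default, so generator.classes (which is in directory order) no longer lines up with the order in which samples reach model.predict, and the computed accuracy collapses to roughly chance. A small NumPy simulation of that mismatch (the class and sample counts are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_samples = 20, 1000

# Ground-truth labels in directory order, as generator.classes would give them
y_true = rng.integers(0, n_classes, size=n_samples)

# Pretend the model is perfect: one-hot "predictions" for each sample
pred = np.eye(n_classes)[y_true]

# Accuracy when prediction order matches label order
aligned = np.mean(np.equal(y_true, np.argmax(pred, axis=-1)))

# Accuracy when the generator shuffled the samples before predict
perm = rng.permutation(n_samples)
shuffled = np.mean(np.equal(y_true, np.argmax(pred[perm], axis=-1)))

print(aligned)   # 1.0 — a "perfect" model scores perfectly when aligned
print(shuffled)  # roughly 1/n_classes once the order is scrambled
```

Passing shuffle=False to flow_from_directory for the validation subset, and calling reset() on the generator before predict, keeps the two orders aligned.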
Could you update your question with the model.compile() statement, in particular the losses and metrics? – strider0160
@strider0160 done – Isaac Ng
Not sure how your generator works, but are you sure that the classes between y_val and the images used in the model.predict line up correctly? – M Z
@MZ The y_val values are all in order by class. Anyways, I don't get the error anymore. I think it has something to do with having two separate generator instances. – Isaac Ng
@MZ Each ImageDataGenerator feeds train_gen and val_gen respectively. They get the same seed but different subset parameters, 'training' and 'validation'. The training one gets all the augmentations while the validation one just gets the rescale parameter. Might have to do with the augmentations or the separate instances. – Isaac Ng

1 Answer


The problem of differing accuracy values from model.evaluate and model.predict seems to be solved by creating separate ImageDataGenerator() instances for training and validation, but with the same seed.

Also, after a KeyboardInterrupt during training or after loading a checkpoint, the generator instance should be reinitialised, as the problem can reappear otherwise.
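The "same seed" point can be illustrated without Keras: two independently seeded RNGs produce the same permutation, so two generator instances given the same seed partition the files identically. A toy sketch of that idea (the filenames, split fraction, and split logic here are invented for illustration, not Keras's actual implementation):

```python
import numpy as np

files = [f"img_{i:03d}.png" for i in range(10)]
validation_split = 0.2

def split(seed):
    """Shuffle deterministically, then carve off the last 20% as validation."""
    order = np.random.RandomState(seed).permutation(len(files))
    cut = int(len(files) * (1 - validation_split))
    train = [files[i] for i in order[:cut]]
    val = [files[i] for i in order[cut:]]
    return train, val

# Two separate "instances" seeded identically agree on the split...
assert split(42) == split(42)
# ...while different seeds generally partition the files differently
print(split(42)[1])
print(split(7)[1])
```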