When I calculate the accuracy manually after predict_generator(), I end up with a different accuracy than the one reported by evaluate_generator().
Not sure if it's relevant, but shuffle = True is set in the flow_from_directory() call inside the DataGenerator class.
idg_train and idg_test are ImageDataGenerator objects.
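In case it helps, here is roughly how the generators are set up; the paths, preprocessing arguments and image sizes below are placeholders, not my actual values:

from keras.preprocessing.image import ImageDataGenerator

# Placeholder setup -- the real preprocessing arguments differ
idg_train = ImageDataGenerator(rescale = 1./255, validation_split = 0.2)
idg_test = ImageDataGenerator(rescale = 1./255)

# Inside DataGenerator the ImageDataGenerator is turned into an iterator,
# roughly like this (directory and target size are placeholders):
# generator = idg_train.flow_from_directory('data/train',
#                                           target_size = (128, 128),
#                                           class_mode = 'categorical',
#                                           subset = 'training',
#                                           shuffle = True)  # shuffle is True here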
# TensorFlow, Keras and NumPy
from tensorflow import keras
from keras.optimizers import Adam
from keras.losses import categorical_crossentropy
import numpy as np
# Own libraries
from DataManipulation import create_dataset, DataGenerator
from ModelZoo import variable_conv_layers
# Data Generation
train_gen = DataGenerator(generator = idg_train, subset = 'training', **params)
val_gen = DataGenerator(generator = idg_train, subset = 'validation', **params)
test_gen = DataGenerator(generator = idg_test, **params)
y_true = test_gen.generator.classes
# Model preparation
model = variable_conv_layers(**model_params) # Creates the model
model.compile(optimizer = Adam(lr = 1e-4),
              loss = categorical_crossentropy,
              metrics = ['accuracy'])
# Training
model.fit_generator(train_gen,
                    epochs = 1,
                    validation_data = val_gen,
                    workers = 8,
                    use_multiprocessing = True,
                    shuffle = True)
# Prediction
scores = model.predict_generator(test_gen,
                                 workers = 8,
                                 use_multiprocessing = True)
pred = np.argmax(scores, axis = -1)[:len(test_gen.generator.classes)]
acc = np.mean(pred == y_true)
print("%s: %1.3e" % ("Manual accuracy", acc))
print("Evaluated [loss, accuracy]:", model.evaluate_generator(test_gen,
workers = 8,
use_multiprocessing = True)
This prints the following:
Manual accuracy: 1.497e-01
Evaluated [loss, accuracy]: [0.308414297710572, 0.9838169642857143]
Clearly, the manually calculated accuracy is different from the one returned by evaluate_generator(). I've looked at this for hours on end and have no idea where the issue might be.
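Since I suspect the shuffling might matter, I put together a small standalone check (dummy arrays and a plain ImageDataGenerator.flow() call rather than my DataGenerator class, so everything here is only for illustration). With shuffle = True the labels come out of the generator in a different order than the labels stored on it, which would make an element-wise comparison like the one above meaningless:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

# Ten dummy "images" with known labels 0..9
x = np.random.rand(10, 8, 8, 3)
y = np.arange(10)

gen = ImageDataGenerator().flow(x, y, batch_size = 5, shuffle = True)

# Labels in their original (stored) order
print("stored order:", y)

# Labels in the order the batches are actually yielded
batch_labels = np.concatenate([gen[i][1] for i in range(len(gen))])
print("batch order: ", batch_labels)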
Thanks in advance!
Edit: Additionally, I tried creating a confusion matrix using sklearn.metrics.confusion_matrix(y_true, pred), which yields the following array:
[[407 0 70 1 8 1 0 57 0]
[413 0 74 15 0 16 1 32 0]
[230 0 40 0 0 4 4 32 0]
[239 0 40 0 0 2 2 36 0]
[282 0 34 0 0 7 1 39 0]
[296 0 37 0 3 4 0 40 0]
[377 0 39 2 8 8 0 42 0]
[183 0 28 4 6 4 0 19 0]
[283 0 46 6 5 6 0 33 0]]
For some reason, it seems to predict the vast majority of samples as class 0 when simply using np.argmax(scores, axis = -1).
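For reference, this is how the matrix above was produced, reusing y_true and pred from the code block above; the column sums just make the imbalance towards class 0 explicit:

import numpy as np
from sklearn.metrics import confusion_matrix

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, pred)
print(cm)

# How often each class is predicted in total -- almost everything lands in column 0
print(cm.sum(axis = 0))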