When I calculate the accuracy manually after predict_generator(), I end up with a different accuracy than the one reported by evaluate_generator().
Not sure if it's relevant, but shuffle = True is set in the flow_from_directory() call inside the DataGenerator class.
idg_train and idg_test are ImageDataGenerator objects.
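In case it helps, here is roughly how the generators are set up; the paths, preprocessing arguments and image sizes below are placeholders, not my actual values:

from keras.preprocessing.image import ImageDataGenerator

# Placeholder setup -- the real preprocessing arguments differ
idg_train = ImageDataGenerator(rescale = 1./255, validation_split = 0.2)
idg_test = ImageDataGenerator(rescale = 1./255)

# Inside DataGenerator the ImageDataGenerator is turned into an iterator,
# roughly like this (directory and target size are placeholders):
# generator = idg_train.flow_from_directory('data/train',
#                                           target_size = (128, 128),
#                                           class_mode = 'categorical',
#                                           subset = 'training',
#                                           shuffle = True)  # shuffle is True here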
# TensorFlow, Keras and NumPy
from tensorflow import keras
from keras.optimizers import Adam
from keras.losses import categorical_crossentropy
import numpy as np
# Own libraries
from DataManipulation import create_dataset, DataGenerator
from ModelZoo import variable_conv_layers
# Data Generation
train_gen = DataGenerator(generator = idg_train, subset = 'training', **params)
val_gen = DataGenerator(generator = idg_train, subset = 'validation', **params)
test_gen = DataGenerator(generator = idg_test, **params)
y_true = test_gen.generator.classes
# Model preparation
model = variable_conv_layers(**model_params) # Creates the model
model.compile(optimizer = Adam(lr = 1e-4),
              loss = categorical_crossentropy,
              metrics = ['accuracy'])
# Training
model.fit_generator(train_gen,
                    epochs = 1,
                    validation_data = val_gen,
                    workers = 8,
                    use_multiprocessing = True,
                    shuffle = True)
# Prediction
scores = model.predict_generator(test_gen,
                                 workers = 8,
                                 use_multiprocessing = True)
pred = np.argmax(scores, axis = -1)[:len(test_gen.generator.classes)]
acc = np.mean(pred == y_true)
print("%s: %1.3e" % ("Manual accuracy", acc))
print("Evaluated [loss, accuracy]:", model.evaluate_generator(test_gen,
workers = 8,
use_multiprocessing = True)
This prints the following:
Manual accuracy: 1.497e-01
Evaluated [loss, accuracy]: [0.308414297710572, 0.9838169642857143]
Clearly, the manually calculated accuracy is different from the one returned by evaluate_generator(). I've looked at this for hours on end and have no idea where the issue might be.
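Since I suspect the shuffling might matter, I put together a small standalone check (dummy arrays and a plain ImageDataGenerator.flow() call rather than my DataGenerator class, so everything here is only for illustration). With shuffle = True the labels come out of the generator in a different order than the labels stored on it, which would make an element-wise comparison like the one above meaningless:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

# Ten dummy "images" with known labels 0..9
x = np.random.rand(10, 8, 8, 3)
y = np.arange(10)

gen = ImageDataGenerator().flow(x, y, batch_size = 5, shuffle = True)

# Labels in their original (stored) order
print("stored order:", y)

# Labels in the order the batches are actually yielded
batch_labels = np.concatenate([gen[i][1] for i in range(len(gen))])
print("batch order: ", batch_labels)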
Thanks in advance!
Edit: Additionally, I tried creating a confusion matrix using sklearn.metrics.confusion_matrix(y_true, pred), which yields the following array:
[[407 0 70 1 8 1 0 57 0]
[413 0 74 15 0 16 1 32 0]
[230 0 40 0 0 4 4 32 0]
[239 0 40 0 0 2 2 36 0]
[282 0 34 0 0 7 1 39 0]
[296 0 37 0 3 4 0 40 0]
[377 0 39 2 8 8 0 42 0]
[183 0 28 4 6 4 0 19 0]
[283 0 46 6 5 6 0 33 0]]
For some reason, it seems to predict the vast majority of samples as class 0 when simply using np.argmax(scores, axis = -1).
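For reference, this is how the matrix above was produced, reusing y_true and pred from the code block above; the column sums just make the imbalance towards class 0 explicit:

import numpy as np
from sklearn.metrics import confusion_matrix

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, pred)
print(cm)

# How often each class is predicted in total -- almost everything lands in column 0
print(cm.sum(axis = 0))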