It worked fine when I used fit, but when I switched to fit_generator I ran into a problem. I use a callback to compute the confusion matrix on the validation set at the end of each training epoch. However, the accuracy obtained from the confusion matrix and the validation accuracy reported by Keras differ.
My code is below.
metrics = Valid_checker(model_name, args.patience, (x_valid, y_valid), x_length_valid)

model.compile(optimizer=optimizers.RMSprop(lr=args.lr),
              loss=[first_loss],
              loss_weights=[1.],
              metrics={'capsnet': 'accuracy'})

callback_list = [lr_decay, metrics]

model.fit_generator(
    no_decoder_generator(x_train, y_train),
    steps_per_epoch=len(x_train),
    epochs=args.epochs,
    validation_data=no_decoder_generator(x_valid, y_valid),
    validation_steps=len(x_valid),
    callbacks=callback_list,
    # class_weight=class_weights,
    verbose=1)
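As mentioned further down, the plain fit path works for me. For reference, that call looks roughly like this (a minimal sketch only, assuming x_train and x_valid already carry the trailing channel axis that the generators add with np.expand_dims, and using batch_size=1 to match the generators):

model.fit(x_train, y_train,
          batch_size=1,                        # same effective batch size as the generators
          epochs=args.epochs,
          validation_data=(x_valid, y_valid),
          callbacks=callback_list,
          verbose=1)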
Valid_checker is my callback class and no_decoder_generator is my generator; the batch size for both training and validation is 1. The Valid_checker class is shown below.
class Valid_checker(keras.callbacks.Callback):
    def __init__(self, model_name, patience, val_data, x_length):
        super().__init__()
        self.best_score = 0
        self.patience = patience
        self.current_patience = 0
        self.model_name = model_name
        self.validation_data = val_data
        self.x_length = x_length

    def on_epoch_end(self, epoch, logs={}):
        X_val, y_val = self.validation_data
        if args.decoder == 1:
            y_predict, x_predict = model.predict_generator(no_decoder_generator(X_val, y_val), steps=len(X_val))
            y_predict = np.asarray(y_predict)
            x_predict = np.asarray(x_predict)
        else:
            y_predict = np.asarray(model.predict_generator(predict_generator(X_val), steps=len(X_val)))
        y_val, y_predict = get_utterence_label_pred(y_val, y_predict, self.x_length)
        cnf_matrix = get_accuracy_and_cnf_matrix(y_val, y_predict)[1]
        val_acc_custom = get_accuracy_and_cnf_matrix(y_val, y_predict)[0]
        war = val_acc_custom[0]
        uar = val_acc_custom[1]
        score = round(0.2*war + 0.8*uar, 2)

        # custom ModelCheckpoint & early stopping by using UAR
        loss_message = 'loss: %s - acc: %s - val_loss: %s - val_acc: %s' % (
            round(logs.get('loss'), 4), round(logs.get('acc'), 4),
            round(logs.get('val_loss'), 4), round(logs.get('val_acc'), 4))
        log('[Epoch %03d/%03d]' % (epoch+1, args.epochs))
        log(loss_message)
        log('Confusion matrix:')
        log('%s' % cnf_matrix)
        log('Valid [WAR] [UAR] [Custom] : %s [%s]' % (val_acc_custom, score))

        if score > self.best_score:
            model.save_weights(model_name)
            log('Epoch %05d: val_uar_acc improved from %s to %s saving model to %s'
                % (epoch+1, self.best_score, score, self.model_name))
            self.best_score = score
            self.current_patience = 0
        else:
            self.current_patience += 1
            # early stopping
            if self.current_patience == (self.patience-1):
                self.model.stop_training = True
                log('Epoch %05d: early stopping' % (epoch + 1))
        return
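To see where the mismatch might come from, one sanity check is to predict directly on the in-memory validation arrays, bypassing the generators entirely (a minimal sketch only, assuming the single-output model without the decoder and one-hot labels; sklearn is used here just for the comparison):

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Predict on the raw arrays so the prediction order is guaranteed
# to match the order of y_valid.
x_in = np.expand_dims(x_valid, -1)          # add the channel axis, as the generators do
y_prob = model.predict(x_in, batch_size=1)  # same batch size as the generators

y_pred = np.argmax(y_prob, axis=-1)
y_true = np.argmax(y_valid, axis=-1)

print('segment-level accuracy:', accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))

If this segment-level accuracy matches the val_acc that Keras reports, the difference would be introduced later, for example by the prediction order of the generator or by the utterance-level grouping in get_utterence_label_pred.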
The WAR value should be equal to the val_acc reported by Keras, but the two values differ. Why does this happen? I have confirmed that there is no problem with get_utterence_label_pred and get_accuracy_and_cnf_matrix, and everything works well when I use the fit function.
My generators are below.
def predict_generator(x):
    while True:
        for index in range(len(x)):
            feature = x[index]
            feature = np.expand_dims(x[index], -1)
            feature = np.expand_dims(feature, 0)  # make (1, input_height, input_width, 1)
            yield (feature)

def no_decoder_generator(x, y):
    while True:
        indexes = np.arange(len(x))
        np.random.shuffle(indexes)
        for index in indexes:
            feature = x[index]
            feature = np.expand_dims(x[index], -1)
            feature = np.expand_dims(feature, 0)  # make (1, input_height, input_width, 1)
            label = y[index]
            label = np.expand_dims(label, 0)
            yield (feature, label)
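One detail worth noting: no_decoder_generator reshuffles the indexes on every pass, while the y_val held by the callback stays in its original order, so if the args.decoder == 1 branch is used for prediction, the returned predictions are not guaranteed to line up with y_val. A non-shuffling variant for validation/prediction could look like this (a sketch, not part of the original code):

def ordered_no_decoder_generator(x, y):
    # Same output format as no_decoder_generator, but without shuffling,
    # so the k-th prediction corresponds to y[k].
    while True:
        for index in range(len(x)):
            feature = np.expand_dims(np.expand_dims(x[index], -1), 0)  # (1, input_height, input_width, 1)
            label = np.expand_dims(y[index], 0)
            yield (feature, label)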
Epoch 1/70
1858/1858 [==============================] - 558s 300ms/step - loss: 1.0708 - acc: 0.5684 - val_loss: 0.9087 - val_acc: 0.6244
[Epoch 001/070]
loss: 1.0708 - acc: 0.5684 - val_loss: 0.9087 - val_acc: 0.6244
Confusion matrix:
[[  0.    28.    68.     4.  ]
 [  0.    13.33  80.     6.67]
 [  0.96   2.88  64.42  31.73]
 [  0.     0.     3.28  96.72]]
Valid [WAR] [UAR] [Custom] : [62.44 43.62] [47.38]

Epoch 2/70
1858/1858 [==============================] - 262s 141ms/step - loss: 0.9526 - acc: 0.6254 - val_loss: 1.1998 - val_acc: 0.4537
[Epoch 002/070]
loss: 0.9526 - acc: 0.6254 - val_loss: 1.1998 - val_acc: 0.4537
Confusion matrix:
[[ 36.    12.    24.    28.  ]
 [ 20.     0.    46.67  33.33]
 [  4.81   0.96  24.04  70.19]
 [  0.     0.     0.   100.  ]]
Valid [WAR] [UAR] [Custom] : [46.34 40.01] [41.28]
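For reference, the [Custom] values in the log are consistent with the weighting used in the callback: for epoch 1, 0.2 * 62.44 + 0.8 * 43.62 ≈ 47.38, and for epoch 2, 0.2 * 46.34 + 0.8 * 40.01 ≈ 41.28.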
x_valid is already preprocessed data. Or are you talking about batch normalization? – Jeonghwa Yoo